https://drive.google.com/file/d/1q44Ky0wavt93sxeyHfxkHSftkGeeNlm6
https://github.com/nsodhi-cmu/lock-free_segtrees
Neal and Myles both worked largely on both all parts of the project. Neal focused slightly more on the fine-grained implementation, the pointer logic, and the lock-free range query and range update functions. Myles focused slightly more on the atomic primitives, lock-free stack, and the test case suite. In terms of the paper, Neal wrote more of the background and implementation and Myles wrote more of the graphs and results.
Currently, we have implemented coarse-grained lock-based segment trees and fine-grained lock-based segment trees. We have also implemented lazy propagation for both, enabling efficient range updates. To elaborate, we have structured our implementation using inheritance, with abstract classes and subclasses that allow us to easily swap between segment tree variants for performance comparisons.
For the fine-grained lock-based segment tree, we explored both DFS-based and BFS-based approaches. Ultimately, we chose the BFS-based approach because it processes queries layer by layer, offering a potential O(log(n))
speedup. Additionally, we added the ability for users to pass their own associative and commutative functions, along with a base value and batch function. This enhancement supports testing more complex workflows.
So far, we have focused on verifying correctness with simple test cases, supported by a correctness-checking friend class. While testing, we experimented with both Cilk and pthreads, ultimately deciding against the new Coroutines feature in C++23 due to performance issues. Currently, we are working on the lock-free segment tree. Our approach involves using queues for each node with a “trickle-down” parallelism strategy, which shows great promise.
We have deviated slightly from our original plan. Initially, allowing users to provide custom associative and commutative functions was considered a stretch goal. However, we prioritized this feature because it enables testing of more complex workflows. Consequently, we have focused less on creating diverse workflows and more on ensuring the correctness of user-provided associative functions, such as addition, min, max, and multiplication. Our current stretch goal is to test and analyze performance on the PSC machines.
Week | Plan |
---|---|
Dec. 2 | Finish implementing lock-free segment trees. Continue creating workflow test cases. |
Dec. 9 | Complete workflow test cases. Test results on the GHC machines, and if possible, the PSC machines. Work on the final report. |
For the poster session, we will visually showcase the performance characteristics of three synchronization methods for concurrent segment trees: coarse-grained locking, fine-grained locking, and lock-free. Our graphs will compare metrics such as throughput, latency, and scalability (e.g., speedup with increasing threads) for each implementation. Additionally, we will include comparative charts showing performance across various workload types, such as query-heavy vs. update-heavy scenarios.
Key trade-offs will be highlighted, including how coarse-grained locking is simpler but performs poorly under contention, while fine-grained and lock-free approaches offer better performance but are more complex. While a live demo is an option, we believe that focusing on broad performance trends will provide more meaningful insights.
Currently, we do not have any graphs or performance results, as our efforts have been focused on correctness rather than speed comparisons of various workflows.
We have not faced any major issues so far. Initially, we planned to use C++23 coroutines but found performance limitations, leading us to switch to pthreads and Cilk. Additionally, while developing the fine-grained lock-based tree, we revised our implementation to include a “batch function” for applying user-defined associative functions to multiple inputs at once, improving performance and flexibility.
https://docs.google.com/document/d/10HlfDoYp-oIcfUaGHnBqaWVo2FK7PKdBONChBe72MeU
We implement a coarse-grained lock-based, fine-grained lock-based, and lock-free segment tree data structure along with the lazy propagation algorithm. We then compare their performance on various real-life applications/workloads of segment trees.
A segment tree is a data structure commonly used in algorithms for efficiently performing range queries and updates on arrays. Typical applications include range sum queries, range minimum or maximum queries, and dynamic interval updates. Traditional implementations rely on a mutable array or tree structure, often protected by locks when used in a multithreaded environment. However, locking can introduce contention and overhead, limiting performance in scenarios with high-frequency updates and queries.
This project focuses on designing a lock-free segment tree with lazy propagation that allows multiple threads to perform queries and updates (including range updates) without locking. The lock-free segment tree will be implemented with atomic operations, ensuring that multiple threads can make progress without blocking each other. One large area for parallelization is allowing queries/updates on non-overlapping segments of the segment tree to execute concurrently. We aim to first focus on addition, and if time permits, we will investigate further associative operations.
When implementing a segment tree, after updating a leaf, all of the ancestors that store information about the ranges that contain this leaf must also be updated. In a not thread-safe implementation, if a query happens while an update happens, this could lead to inconsistent results, as some of the ancestors may have been updated but not all.
Furthermore, the lazy propagation algorithm for updating a range of values requires a separate "lazy" array or structure to store updates that need to be applied later. When eventually applying them, we also must ensure that concurrent operations do not cause each other to produce inconsistent results.
We plan on implementing the data structures from scratch in C++. We plan on using the GHC machines for testing, and, if time permits, the PSC machines. The Cilk and Pthreads documentation will be helpful since we plan on using a fork-join programming model to create parallel test cases. If it comes up, we may look to papers that implement similar data structures without locks for inspiration to our own lock-free segment tree.