Lock-Free Segment Tree and Lazy Propagation

Final Paper Link

https://drive.google.com/file/d/1q44Ky0wavt93sxeyHfxkHSftkGeeNlm6

Repository Link

https://github.com/nsodhi-cmu/lock-free_segtrees

Credits

Neal and Myles both worked largely on both all parts of the project. Neal focused slightly more on the fine-grained implementation, the pointer logic, and the lock-free range query and range update functions. Myles focused slightly more on the atomic primitives, lock-free stack, and the test case suite. In terms of the paper, Neal wrote more of the background and implementation and Myles wrote more of the graphs and results.

---

Progress Summary

Currently, we have implemented coarse-grained lock-based segment trees and fine-grained lock-based segment trees. We have also implemented lazy propagation for both, enabling efficient range updates. To elaborate, we have structured our implementation using inheritance, with abstract classes and subclasses that allow us to easily swap between segment tree variants for performance comparisons.

For the fine-grained lock-based segment tree, we explored both DFS-based and BFS-based approaches. Ultimately, we chose the BFS-based approach because it processes queries layer by layer, offering a potential O(log(n)) speedup. Additionally, we added the ability for users to pass their own associative and commutative functions, along with a base value and batch function. This enhancement supports testing more complex workflows.

So far, we have focused on verifying correctness with simple test cases, supported by a correctness-checking friend class. While testing, we experimented with both Cilk and pthreads, ultimately deciding against the new Coroutines feature in C++23 due to performance issues. Currently, we are working on the lock-free segment tree. Our approach involves using queues for each node with a �trickle-down� parallelism strategy, which shows great promise.

Goal Updates

We have deviated slightly from our original plan. Initially, allowing users to provide custom associative and commutative functions was considered a stretch goal. However, we prioritized this feature because it enables testing of more complex workflows. Consequently, we have focused less on creating diverse workflows and more on ensuring the correctness of user-provided associative functions, such as addition, min, max, and multiplication. Our current stretch goal is to test and analyze performance on the PSC machines.

Updated Schedule

Week	Plan
Dec. 2	Finish implementing lock-free segment trees. Continue creating workflow test cases.
Dec. 9	Complete workflow test cases. Test results on the GHC machines, and if possible, the PSC machines. Work on the final report.

Poster Session Updates

For the poster session, we will visually showcase the performance characteristics of three synchronization methods for concurrent segment trees: coarse-grained locking, fine-grained locking, and lock-free. Our graphs will compare metrics such as throughput, latency, and scalability (e.g., speedup with increasing threads) for each implementation. Additionally, we will include comparative charts showing performance across various workload types, such as query-heavy vs. update-heavy scenarios.

Key trade-offs will be highlighted, including how coarse-grained locking is simpler but performs poorly under contention, while fine-grained and lock-free approaches offer better performance but are more complex. While a live demo is an option, we believe that focusing on broad performance trends will provide more meaningful insights.

Preliminary Results

Currently, we do not have any graphs or performance results, as our efforts have been focused on correctness rather than speed comparisons of various workflows.

Challenges Encountered

We have not faced any major issues so far. Initially, we planned to use C++23 coroutines but found performance limitations, leading us to switch to pthreads and Cilk. Additionally, while developing the fine-grained lock-based tree, we revised our implementation to include a �batch function� for applying user-defined associative functions to multiple inputs at once, improving performance and flexibility.

---

URL

https://docs.google.com/document/d/10HlfDoYp-oIcfUaGHnBqaWVo2FK7PKdBONChBe72MeU

Summary

We implement a coarse-grained lock-based, fine-grained lock-based, and lock-free segment tree data structure along with the lazy propagation algorithm. We then compare their performance on various real-life applications/workloads of segment trees.

Background

A segment tree is a data structure commonly used in algorithms for efficiently performing range queries and updates on arrays. Typical applications include range sum queries, range minimum or maximum queries, and dynamic interval updates. Traditional implementations rely on a mutable array or tree structure, often protected by locks when used in a multithreaded environment. However, locking can introduce contention and overhead, limiting performance in scenarios with high-frequency updates and queries.

This project focuses on designing a lock-free segment tree with lazy propagation that allows multiple threads to perform queries and updates (including range updates) without locking. The lock-free segment tree will be implemented with atomic operations, ensuring that multiple threads can make progress without blocking each other. One large area for parallelization is allowing queries/updates on non-overlapping segments of the segment tree to execute concurrently. We aim to first focus on addition, and if time permits, we will investigate further associative operations.

Challenge

When implementing a segment tree, after updating a leaf, all of the ancestors that store information about the ranges that contain this leaf must also be updated. In a not thread-safe implementation, if a query happens while an update happens, this could lead to inconsistent results, as some of the ancestors may have been updated but not all.

Furthermore, the lazy propagation algorithm for updating a range of values requires a separate "lazy" array or structure to store updates that need to be applied later. When eventually applying them, we also must ensure that concurrent operations do not cause each other to produce inconsistent results.

Resources

We plan on implementing the data structures from scratch in C++. We plan on using the GHC machines for testing, and, if time permits, the PSC machines. The Cilk and Pthreads documentation will be helpful since we plan on using a fork-join programming model to create parallel test cases. If it comes up, we may look to papers that implement similar data structures without locks for inspiration to our own lock-free segment tree.

Goals and Deliverables

Plan to Achieve:

Coarse-grained lock-based segment tree with lazy propagation
Fine-grained lock-based segment tree with lazy propagation
Lock-free segment tree with lazy propagation
Analysis of the performance of these implementations against different workloads, sizes, and parallel conditions

Hope to Achieve:

Allow the user to pass an arbitrary associative function to create a "custom" segment tree
Use PSC machines for testing