Final Report
Lock-Free Extendible Hash Table in Database Management Systems: Comparison and Benchmark
Yicheng Zhang, Ruijie Zhai
Carnegie Mellon University, 15418 project report
Benchmark for DBMS: https://github.com/yicheng4/lock_free_htable_bustub
Development Codebase: https://github.com/ChaosZhai/lock-free-eht
SUMMARY
In this project, we developed a lock-free extendible hash table for database management systems, addressing the limitations of traditional locking mechanisms. Our algorithm merges a lock-free linked list with an expandable bucket-pointer array and introduces dummy nodes for seamless, lock-free operations. We benchmarked it against coarse-locked and fine-grained-locked counterparts in the BusTub DBMS (https://github.com/cmu-db/bustub) under OLAP and OLTP workloads: the lock-free version demonstrated superior throughput, achieving up to 239x speedup for reads and 28x for writes in OLAP scenarios, and 79x for reads and 12x for writes in OLTP scenarios. This marked improvement highlights the lock-free method's potential to significantly enhance DBMS performance in high-concurrency environments.
BACKGROUND
Database Management Systems (DBMS) are crucial for storing, retrieving, and managing data efficiently. They support various workloads, notably Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). OLTP systems are optimized for transaction-oriented tasks that typically require speedy query processing and involve numerous short transactions like inserting, deleting, or updating small amounts of data. In contrast, OLAP systems are designed for analysis and reporting functions, handling complex queries that read large volumes of data.
The performance of a DBMS is highly dependent on its ability to manage data access efficiently, especially under different mixes of read and write operations. As such, the ability of a system to scale with varying counts of readers and writers without a significant drop in performance is vital.
Dependencies arise when multiple operations need to access or modify the same data simultaneously. In terms of parallelism, there’s a high degree of data-parallelism in DBMS operations as many independent read and write operations can occur simultaneously, provided they are not on the same data item. There is also a locality of reference, as certain data items may be accessed more frequently than others, and access patterns may be predictable to some extent.
An extendible hash table (EHT) is a dynamic data structure that plays a critical role in this regard. It allows a hash table to grow incrementally, which is particularly beneficial in environments where the dataset size changes over time. EHTs can handle frequent insertions and deletions while maintaining efficient search capabilities. This adaptability makes the EHT an important element in both OLTP and OLAP systems, where it can optimize data retrieval and modification operations.
However, traditional EHT implementations often use locking mechanisms to ensure data integrity under concurrent access, which can lead to bottlenecks as the number of concurrent operations increases. A lock-free EHT promises to mitigate these issues, offering a way to improve throughput and scalability in multi-threaded DBMS environments. This is especially critical as DBMS workloads become more concurrent, with varied reader and writer counts, necessitating data structures that can keep pace with the demand for fast, concurrent data access without compromising consistency.
Challenges for Lock-Free Extendible Hashing
Lock-free extendible hashing is challenging due to the complexities involved in resizing the hash table. This operation necessitates relocating several items from old buckets to new ones, which traditionally involves locking to maintain data integrity. However, lock-free implementations aim to allow these operations to happen concurrently without locks.
The difficulty with lock-free hashing arises during the resizing process. The atomicity required to move elements between buckets is not achievable with a single Compare-And-Swap (CAS) operation. An atomic move operation must remove an item from one linked list and insert it into another without losing the element or introducing duplicates. Ensuring this without locks requires complex coordination between threads.
Lock-free techniques need to provide the broader atomicity that enables operations to be completed without traditional locks. This often means that threads must “help” one another to complete operations that they did not initiate. “Helping” introduces its own set of challenges: it requires threads to maintain state information and monitor the progress of other operations, leading to potential redundancies and increased overhead.
The core problem is to achieve this broader atomicity while avoiding the pitfalls of “replication management” and maintaining the constant time performance expected of hashing algorithms. This requires a sophisticated balance between independence of operations to maximize parallelism and the coordination between operations to ensure consistency and avoid performance degradation due to overheads from “helping” mechanisms.
APPROACH
Basic Extendible Hashing structure (Coarse & Fine-Grained)
Extendible hashing is an advanced technique in database management systems (DBMS) designed to enhance the efficiency of data retrieval, especially in scenarios where data grows dynamically. Unlike traditional chained hashing, where the length of the chains increases indefinitely leading to performance degradation, extendible hashing introduces a more dynamic approach to manage these chains.
The primary mechanism of extendible hashing revolves around splitting buckets, a process triggered when they reach their capacity. In conventional hashing, when a bucket is full, a new element is simply added to a chain, which can grow endlessly and slow down data access. Extendible hashing, on the other hand, allows multiple slot locations in the hash table to point to the same bucket chain. This design reduces the need for long chains and improves data access times.
A critical aspect of extendible hashing is its approach to re-balancing the hash table. This process involves moving entries within buckets during a split and increasing the number of bits examined to locate entries in the hash table. The beauty of this method lies in its efficiency: only the data within the buckets of the split chain are moved, leaving all other buckets untouched. This selective relocation minimizes the overhead associated with reorganizing the hash table.
The DBMS achieves this efficiency through the use of two key metrics: global and local depth bit counts. These counts determine the number of bits required to find buckets in the slot array, guiding the DBMS in efficiently locating and accessing data. When a bucket becomes full, the DBMS undertakes the task of splitting this bucket and reshuffling its elements. This process is where the local and global depth counts come into play.
If the local depth of the split bucket is less than the global depth, the DBMS simply adds the new bucket to the existing slot array, making this a relatively straightforward process. However, if the local depth is equal to the global depth, the DBMS must undertake a more complex operation. It doubles the size of the slot array to make room for the new bucket and increments the global depth counter. This expansion allows the hash table to accommodate the growing data while maintaining efficient access times.
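As a concrete illustration, here is a minimal single-threaded C++ sketch of this split decision for the baseline (locked) structure. The types, the low-order-bit slot indexing, and the omitted entry redistribution are our simplifications, not BusTub's actual implementation.

    #include <cstddef>
    #include <vector>

    struct Bucket {
      int local_depth;
      // key/value entries omitted
    };

    struct Directory {
      int global_depth = 1;
      std::vector<Bucket *> slots;  // size is always 1 << global_depth

      void SplitBucket(size_t slot_idx) {
        Bucket *old_bucket = slots[slot_idx];
        if (old_bucket->local_depth == global_depth) {
          // Local depth caught up with global depth: double the slot array
          // (each new slot initially aliases the slot sharing its low bits)
          // and bump the global depth.
          size_t old_size = slots.size();
          slots.resize(old_size * 2);
          for (size_t i = 0; i < old_size; ++i) slots[old_size + i] = slots[i];
          ++global_depth;
        }
        // Now local_depth < global_depth: create a sibling bucket at depth + 1
        // and redirect every aliasing slot whose new distinguishing bit is set.
        Bucket *sibling = new Bucket{old_bucket->local_depth + 1};
        old_bucket->local_depth += 1;
        size_t high_bit = size_t{1} << (old_bucket->local_depth - 1);
        for (size_t i = 0; i < slots.size(); ++i) {
          if (slots[i] == old_bucket && (i & high_bit) != 0) slots[i] = sibling;
        }
        // Rehashing the old bucket's entries across the two buckets is omitted.
      }
    };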
Figure: Example of a basic extendible hash table. Source: https://15445.courses.cs.cmu.edu/fall2023/
As an example, the global depth has a value of 2. This global depth represents the number of bits used to generate an index for the hash table. With two bits, there are four potential binary combinations (00, 01, 10, 11), which correspond to the slots in the hash table. These slots are the entry points that lead to various buckets where the data is actually stored.
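In code, the indexing step is tiny. The figure reads the index as a two-bit prefix of the hash value; implementations often use the low-order global-depth bits instead, as in this sketch (function name is ours), and the mechanics are identical.

    #include <cstdint>

    // Directory index from a hash value: with global depth 2 this yields the
    // four slots 00, 01, 10, 11 from the figure.
    uint32_t SlotFor(uint32_t hash, int global_depth) {
      return hash & ((1u << global_depth) - 1);
    }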
The buckets, depicted on the right side of the image, each have a local depth indicator. The local depth specifies how many bits have been used to hash the records in that particular bucket. In this example, we see two buckets with a local depth of 1 and one with a local depth of 2. This local depth is pivotal in understanding how the hash table responds to new data.
A notable feature of the extendible hash table is the sharing of buckets by different slots. For instance, the slots labeled 00 and 01 point to the same bucket, which has a local depth of 1. This sharing implies that for the records in this bucket, only the first bit of their hash value is considered significant. The other bits are redundant due to the bucket’s local depth.
The process of splitting buckets is central to the extendible hashing methodology. When a bucket is full, and a new record is to be inserted, the bucket must be split to accommodate this new data. The outcome of the split is governed by the bucket’s local depth relative to the global depth. If the local depth is less than the global depth, the operation is simple: records are redistributed between two buckets, and the hash table is updated so that only the relevant slots point to the new bucket.
To illustrate with an example from the figure, the bucket pointed to by the 10 slot has a local depth of 2, which already equals the global depth. Should this bucket reach its capacity and necessitate splitting, the slot array would have to double in size and the global depth would increment to 3. By contrast, the bucket shared by slots 00 and 01 has a local depth of 1, so it could split without any table expansion: the two slots would simply stop sharing, each pointing to its own bucket of local depth 2.
Lock-free Extendible Hashing Algorithm
Motivation
To tackle the CAS problem introduced in the challenges section, namely that moving half of a bucket's elements to a new bucket cannot be done atomically with a single CAS, this lock-free design does not move the items among the buckets; rather, it moves the buckets among the items.
More specifically, as shown in the figure above, the algorithm keeps all the items in one lock-free linked list, and gradually assigns the bucket pointers to the places in the list where a sublist of “correct” items can be found. A bucket is initialized upon first access by assigning it to a new “dummy” node (dashed contour) in the list, preceding all items that should be in that bucket. A newly created bucket splits an older bucket’s chain, reducing the access cost to its items.
Hence, unlike moving an item, directing a bucket pointer can be done in a single CAS operation, and since items are never moved, they are never "lost". To make the remaining operations work, we adopted the split-ordering algorithm described below.
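The single-CAS redirection looks roughly like this (a hedged C++ sketch with our own names; node layout and memory reclamation are omitted):

    #include <atomic>

    struct Node;  // linked-list node; only pointers are needed here

    // Redirecting an uninitialized bucket pointer to its freshly created dummy
    // node takes one CAS. If a racing thread initialized the bucket first, the
    // CAS fails harmlessly and the caller adopts the winner's dummy instead.
    bool AssignBucket(std::atomic<Node *> &bucket_head, Node *dummy) {
      Node *expected = nullptr;
      return bucket_head.compare_exchange_strong(expected, dummy);
    }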
Algorithm
The lock-free extendible hash table with split-ordering is an innovative data structure designed for efficient operation in concurrent environments, particularly on multiprocessors. It employs a unique recursive split-ordering of elements, facilitating dynamic resizing without locks. The structure grows by doubling, using dummy nodes as bucket placeholders to maintain list integrity during concurrent modifications. Segments are used for memory-efficient management of bucket pointers. This design offers O(1) complexity for lookups, inserts, and deletes, optimizing performance in high-load, multi-programmed systems.
The detailed breakdown of components of this lock-free EHT is as follows:
Recursive Split-Ordering: This is a novel way of organizing elements in a linked list to facilitate efficient ‘splitting’ during resizing. By reversing the bits of the hash key (bit-reversal), elements are kept in an order that allows for easy splitting and reorganization as the hash table grows. Recursive split-ordering ensures that elements remain adjacent in the list, simplifying access and manipulation.
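Following the split-ordered-lists scheme of Shalev and Shavit, the keys can be computed as below (a sketch; function names are ours). Regular items set the most significant bit before the reversal, so their reversed keys have the lowest bit set and sort after their bucket's dummy node:

    #include <cstdint>

    // Reverse the bits of a 32-bit word (simple loop version).
    uint32_t ReverseBits(uint32_t v) {
      uint32_t r = 0;
      for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1);
        v >>= 1;
      }
      return r;
    }

    // Split-order key of a regular item: setting the MSB before reversal makes
    // the reversed key's LSB 1, placing it after its bucket's dummy node.
    uint32_t SoRegularKey(uint32_t hash) { return ReverseBits(hash | 0x80000000u); }

    // Split-order key of a dummy node: its LSB stays 0 after reversal, so it
    // sorts before every regular key that hashes to its bucket.
    uint32_t SoDummyKey(uint32_t bucket) { return ReverseBits(bucket); }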
Data Structure Growth & Insert Process: The hash table grows by doubling its size, starting from 2. New buckets are created and assigned to different segments of the list as the table grows. For insertion, the algorithm hashes the key to the appropriate bucket using recursive split-ordering and inserts the item in the sorted list at the correct position. The insert process involves managing the linked list and updating the bucket pointers.
Purpose of Dummy Nodes: Dummy nodes are introduced as placeholders for buckets in the linked list. They mark the start of a new bucket, simplifying the insertion process and ensuring the stability of the list structure during concurrent access and modification.
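When bucket b (for b >= 1) is first accessed, its dummy node is inserted into the chain of its parent bucket, obtained by clearing the most significant set bit of b; bucket 0's dummy is simply the head of the list. A sketch of that rule, with our naming:

    #include <cstddef>

    // Parent of bucket `b` (precondition: b >= 1): clear the most significant
    // set bit. E.g. ParentBucket(6) == 2 and ParentBucket(5) == 1.
    size_t ParentBucket(size_t b) {
      size_t msb = 1;
      while (msb <= b / 2) msb <<= 1;  // highest power of two <= b
      return b & ~msb;
    }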
Introduction of Segments: Segments are introduced to manage the growing array of bucket pointers, making the resizing process more efficient. This approach helps in managing memory and improving performance, as only segments containing accessed buckets are allocated.
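A sketch of the segment idea in C++ (sizes and names are our assumptions, not the paper's exact constants): bucket pointers live in fixed-size segments that are allocated on demand, so growing the table never copies existing pointers.

    #include <atomic>
    #include <cstddef>

    struct Node;  // linked-list node; only pointers are stored here

    constexpr size_t kSegmentSize = 1 << 12;
    constexpr size_t kMaxSegments = 1 << 12;  // supports up to 2^24 buckets
    std::atomic<std::atomic<Node *> *> g_segments[kMaxSegments];

    // Return the slot for `bucket`, lazily allocating its segment. A lost
    // allocation race is resolved by adopting the winner's segment.
    std::atomic<Node *> &BucketSlot(size_t bucket) {
      size_t seg_idx = bucket / kSegmentSize;
      std::atomic<Node *> *seg = g_segments[seg_idx].load(std::memory_order_acquire);
      if (seg == nullptr) {
        auto *fresh = new std::atomic<Node *>[kSegmentSize]();  // all nullptr
        std::atomic<Node *> *expected = nullptr;
        if (g_segments[seg_idx].compare_exchange_strong(expected, fresh)) {
          seg = fresh;
        } else {
          delete[] fresh;  // another thread allocated this segment first
          seg = expected;
        }
      }
      return seg[bucket % kSegmentSize];
    }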
Complexity of Lookup and Insert/Delete Operations: The expected complexity for lookup, insert, and delete operations in this lock-free hash table is O(1) under the assumption of a uniform distribution of hash keys. The design ensures that these operations can be performed efficiently even in a highly concurrent environment.
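The O(1) expectation follows because every operation reduces to a short search in one sorted sublist, starting at the bucket's dummy node. Below is a single-threaded stand-in (our simplification) that shows the traversal shape; the real list uses atomic next pointers, CAS-based insertion, and marked pointers for deletion.

    #include <cstdint>

    struct Node {
      uint32_t so_key;  // split-order key (bit-reversed hash)
      Node *next;
    };

    // Walk from the bucket's dummy node to the first node whose split-order
    // key is >= target; with uniform hashing, buckets stay short, so the
    // expected number of visited nodes is constant.
    Node *SearchFrom(Node *dummy, uint32_t target_so_key) {
      Node *cur = dummy->next;
      while (cur != nullptr && cur->so_key < target_so_key) cur = cur->next;
      return cur;  // candidate match, or the insertion point
    }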
Benchmark setup
We ran the benchmark inside the BusTub (https://github.com/cmu-db/bustub) system, which can use the hash table as an index scheme under SQL; indexing is the primary use of the hash table here. We benchmarked on an Apple Mac M1 Max with 64 GB of memory and 8 performance cores.
This benchmarking is designed to evaluate the performance of a hash table in a database system, focusing on measuring the throughput of read and write operations.
The main components of our benchmark include several constants that define parameters such as the number of threads and the total number of keys used in the benchmark.
To simulate database operations, our code spawns multiple reader and writer threads. The reader threads perform lookups in the hash table, handling keys based on predefined conditions. Meanwhile, the writer threads perform insertions and deletions, driven by conditions arising from SQL operations such as INSERT, DELETE, UPDATE, and hash joins. Each thread measures its own throughput and reports it at one-second intervals.
The benchmarking logic of the script is robust. It involves multiple threads performing reads and writes on the hash table concurrently, testing the table’s performance under various conditions and workloads. The read operations include key lookups and condition checks, whereas the write operations involve key insertions or deletions based on certain criteria. The performance is measured in terms of the number of reads and writes per second, providing a clear picture of the hash table’s efficiency under load.
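The harness looks roughly like the following C++ sketch (constants, names, and the stubbed table operations are ours; the real benchmark calls BusTub's hash table):

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    constexpr int kReaders = 7, kWriters = 1, kDurationSec = 9;
    constexpr uint64_t kKeyRange = 2048;

    static std::atomic<uint64_t> g_reads{0}, g_writes{0};
    static std::atomic<bool> g_stop{false};

    static bool Lookup(uint64_t) { return true; }  // stand-in for a table lookup
    static bool Insert(uint64_t) { return true; }  // stand-in for a table insert

    int main() {
      std::vector<std::thread> threads;
      for (int i = 0; i < kReaders; ++i)
        threads.emplace_back([] {
          for (uint64_t k = 0; !g_stop.load(std::memory_order_relaxed); ++k) {
            Lookup(k % kKeyRange);
            g_reads.fetch_add(1, std::memory_order_relaxed);
          }
        });
      for (int i = 0; i < kWriters; ++i)
        threads.emplace_back([] {
          for (uint64_t k = 0; !g_stop.load(std::memory_order_relaxed); ++k) {
            Insert(k % kKeyRange);
            g_writes.fetch_add(1, std::memory_order_relaxed);
          }
        });
      // Report throughput at one-second intervals for the whole run.
      for (int s = 0; s < kDurationSec; ++s) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::printf("reads/s=%llu writes/s=%llu\n",
                    (unsigned long long)g_reads.exchange(0),
                    (unsigned long long)g_writes.exchange(0));
      }
      g_stop = true;
      for (auto &t : threads) t.join();
      return 0;
    }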
Lastly, our benchmark includes comprehensive error handling for both command-line arguments and any runtime errors that might occur during the hash table operations. This makes the benchmarking tool not just efficient but also reliable for testing the performance of hash table implementations in a multi-threaded, high-load environment, which is crucial for database systems.
RESULTS AND ANALYSIS
In assessing the efficiency of various extendible hash table implementations within a DBMS context, our benchmarks focused on performance under concurrent OLAP and OLTP workloads. We implemented three distinct versions: coarse-locked, fine-grained-locked, and lock-free. Utilizing BusTub, we gauged their performance, with particular attention to throughput, scalability, and data integrity amidst concurrent data modifications, a critical aspect for DBMS applications.
We ran the tests on macOS with an M1 Max chip, which has 8 performance cores and 2 efficiency cores. We count how many reads and writes the system can complete within 9 seconds. Our initial key-modification range is 2048 (only keys 0 to 2047 are modified).
The results were compelling, particularly for the lock-free implementation. Under OLAP conditions with seven readers and one writer, the lock-free version delivered a read throughput 25 times that of the fine-grained version and a remarkable 239 times that of the coarse-locked version. Write throughput also improved significantly, with the lock-free method achieving approximately 13 times the performance of the fine-grained version and 28 times that of the coarse-locked one.
For the OLTP workload with four readers and four writers, the lock-free approach reached a read throughput of 6,742,525 operations per second, a speedup of 20 times over the fine-grained and 79 times over the coarse-locked version. Write throughput followed a similar trend, with a 10 times speedup over the fine-grained and 12 times over the coarse-locked implementation.
OLAP Workload (7 Readers, 1 Writer) Throughput Comparison:
The bar chart shows a dramatic difference in throughput between the lock-free implementation and the other two methods. The lock-free method vastly outperforms the coarse and fine-grained methods in both read and write throughput, with the bar lengths indicating the magnitude of this difference. The trend is clear: as the complexity of managing locks is removed, the throughput significantly increases, demonstrating the effectiveness of the lock-free approach in an OLAP setting with more readers.
OLTP Workload (4 Readers, 4 Writers) Throughput Comparison:
This bar chart indicates that the lock-free implementation again has a far superior throughput for both read and write operations compared to the coarse and fine-grained locked versions. The balance between readers and writers in this OLTP workload shows that the lock-free method maintains its performance advantage even when the workload is evenly split between reading and writing operations.
Exploring varying reader and writer counts revealed a linear increase in read throughput with more readers, while write throughput hit a bottleneck as writer numbers grew, pointing to resizing operations as a likely limiting factor.
Read throughput and Write Throughput under 1 reader and different number of writers for lock-free hashtable:
The line graph shows that as the number of writers increases, the read throughput remains relatively stable, while the write throughput increases up to a point before plateauing. This trend suggests that while reads can be processed in parallel without much interference, writes are more prone to contention, especially when resizing is frequent. The “bending” or plateauing of the write throughput curve indicates a bottleneck, which is likely due to the resizing process. When a resize occurs, writers may need to wait for the resize to complete before they can proceed, hence the bend in the trend line.
Read throughput and Write Throughput under 1 writer and different number of readers for lock-free hashtable:
In contrast to the previous graph, this line graph illustrates the trend when there is a single writer and multiple readers. The read throughput scales almost linearly with the number of readers, showing that the lock-free hashtable can handle concurrent reads efficiently. The write throughput, represented by a relatively flat line, indicates that the presence of multiple readers does not significantly affect the ability to write to the hashtable, likely because the single writer is not experiencing contention from other writers.
Speedup Limitation Discussion
We benchmarked read and write throughput for 7 writers and 1 reader across different key-modification ranges, to see whether the problem lies in the resizing itself or in general contention.
Read Throughput (Red Line): The graph shows a slight downward trend in read throughput as the key-modification range increases. As the range of modifiable keys expands, reads must traverse a larger set of keys and lose cache locality, even though the chance of colliding with a concurrent write decreases. This suggests the main issue is data locality rather than write contention.
Write Throughput (Blue Line): The write throughput remains relatively stable across the key-modification range. This stability suggests that writes are not significantly affected by the range of keys being modified, implying that the hash table manages write operations efficiently even as the potential for write conflicts increases.
This indicates that our algorithm and its resizing are not the main problem. The degradation is more plausibly caused by memory evictions and cache-coherence traffic than by the algorithm itself.
The speedup of our lock-free extendible hash table implementation is primarily limited by the synchronization overhead that arises during resizing and concurrent updates, which we break down below.
Lack of Parallelism in the Hashing Algorithm
The read operations show a linear increase in throughput with the number of readers, which indicates that there’s no significant lack of parallelism for read operations. However, write operations do not scale as well, suggesting that write operations have more dependencies, likely due to the need to coordinate during the resize operation.
Communication or Synchronization Overhead
The plateauing of write throughput with an increasing number of writers strongly indicates a synchronization bottleneck. As more writers attempt to write simultaneously, they are likely blocked by the need to coordinate access to shared data structures during resizing. This is evidenced by the fact that performance does not continue to scale with the addition of more writers.
Data Transfer
Data transfer is a major issue: as the number of cores increases, contention for the shared caches grows, slowing the whole process down.
To benchmark the effect of data transfer, we created two workload models: in one, the writers both insert and delete within a bounded key range; in the other, they only insert, without bound. The insert-only workload consistently performs worse, as the sketch below illustrates.
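In C++ sketch form (Insert/Remove are stubs standing in for the hash table's operations; the model names are ours):

    #include <cstdint>
    #include <random>

    static bool Insert(uint64_t) { return true; }  // stand-in for table insert
    static bool Remove(uint64_t) { return true; }  // stand-in for table delete

    // Model A: bounded mixed workload. Each writer alternates inserts and
    // deletes over a fixed key range, so the table size stays roughly stable.
    void WriterMixed(uint64_t key_range, uint64_t ops) {
      std::mt19937_64 rng(42);
      for (uint64_t i = 0; i < ops; ++i) {
        uint64_t key = rng() % key_range;
        if (i % 2 == 0) Insert(key); else Remove(key);
      }
    }

    // Model B: unbounded insert-only workload. Keys never repeat and nothing
    // is deleted, so the table and its memory footprint grow monotonically.
    void WriterInsertOnly(uint64_t start, uint64_t ops) {
      for (uint64_t i = 0; i < ops; ++i) Insert(start + i);
    }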
In the write throughput category, the performance is higher when the writer threads engage in both insertions and deletions. This indicates a more efficient use of resources since deletions may free up space, allowing for more optimal memory utilization. In contrast, the insert-only scenario without bounds shows a reduced write throughput. This could be due to the system reaching a memory saturation point, where managing memory allocation becomes a significant overhead, negatively impacting performance.
The read throughput follows a similar pattern, albeit with a less pronounced difference between the two scenarios. The lower read throughput in the insert-only scenario suggests that the increased memory pressure and potential fragmentation from continuous insertions without deletions adversely affect the read performance as well.
This performance trend suggests that the memory overhead from unbounded insertions is a significant factor. The continuous growth of the hash table without corresponding deletions likely leads to increased memory fragmentation. This fragmentation can hinder the system’s ability to efficiently allocate space for new insertions, which in turn can slow down both read and write operations.
Conversely, the workload involving both insertions and deletions seems to manage memory more effectively. Deletions help by reclaiming memory, which can then be reused for new writes, leading to more consistent performance. Additionally, the fact that this scenario doesn’t lead to performance degradation—even with the overhead of bucket merging—implies that the system is more adept at handling dynamic workloads where objects are both added and removed.
Conclusion
This research demonstrated that lock-free extendible hash tables offer substantial performance improvements in the concurrent environments typical of DBMS workloads. Our data shows that the lock-free extendible hash table markedly enhances concurrency handling, which we attribute to minimized system calls and the elimination of item-movement operations, made possible by keeping all items in a single linked list. The diminishing write throughput with increased writers hints at a bottleneck likely due to resizing, a challenge that remains even in lock-free designs.
Future work should thus focus on optimizing the resizing process and memory usage to further unleash the potential of lock-free hashing.
REFERENCES
Ori Shalev and Nir Shavit. 2006. Split-ordered lists: Lock-free extensible hash tables. J. ACM 53, 3 (May 2006), 379–405. https://doi.org/10.1145/1147954.1147958
Vijay Kumar. 1990. Concurrent operations on extendible hashing and its performance. Commun. ACM 33, 6 (June 1990), 681–694. https://doi.org/10.1145/78973.78979
Carla Schlatter Ellis. 1987. Concurrency in linear hashing. ACM Trans. Database Syst. 12, 2 (June 1987), 195–217. https://doi.org/10.1145/22952.22954
CMU-DB. BusTub: The BusTub Relational Database Management System (Educational). GitHub. https://github.com/cmu-db/bustub. Accessed 15 Dec. 2023.
CMU 15-445/645: Intro to Database Systems (Fall 2023). https://15445.courses.cs.cmu.edu/fall2023/.
LIST OF WORK BY EACH STUDENT, AND DISTRIBUTION OF TOTAL CREDIT
Yicheng Zhang
Implementation of the locked versions, benchmarking, analysis, and integration with BusTub.
Ruijie Zhai
Implementation of the lock-free version.