Milestone Report: Lock-free Extendible Hash Table in Database Management System

December 4, 2023

Milestone Report: Lock-free Extendible Hash Table in Database Management System

Yicheng Zhang, Ruijie Zhai

Our ultimate plan is implementation of a Lock-Free Extendible Hash Table and do benchmarking analysis.

We want to develop a fully functional lock-free extendible hash table in C++, based on the principles outlined in the Shalev and Shavit paper (https://doi.org/10.1145/1147954.1147958).

Project Schedule and Progress:

Dec 4 - 6: Implementation III

Task: Finish debugging the lock-free version. Finalize the setup of the benchmark system and integrate ETH into the BusTub database system.

Responsibilities:

Ruijie: Focuses primarily on BusTub integration, setting up benchmarks.

Yicheng: Handles the hash table components.

Dec 7-10: Benchmark

Task: Execute benchmarks, log statistics, draw graphs, and generalize results. Reference previous assignments for benchmarking methods. Control the number of works and CPU resources to compare lock-free code performance.

Responsibilities:

Ruijie: Develops and manages the benchmark system.

Yicheng: Conducts data analysis.

Dec 11-14: Poster and Paper Writing

Task: Utilize the raw data to create a poster and write the final paper.

Responsibilities:

Both Ruijie and Yicheng: Collaboratively work on the report.

Work Completed Summary:

Accomplishments:

Overall, we’ve completed mostly the coding for the extendible hash table data structure, including a coarse-grained lock version, a fine-grained lock version, and we’ve finished the raw implementation of the lock-free version, but we’re still working on debugging it.

For each version, we’ve established the basic ETH logic with layered table structure involving headers, directories, and buckets.
Implemented locking mechanisms at different structural layers and integrated logic for extending and shrinking.
Developed a raw lock-free version based on algorithms from a specific paper, using linked lists and “split orders” for concurrent editing without locks. This is not fully done yet as we’re experiencing bugs.

The team is now in the process of fixing the lock-free version, and then ensuring that the hash function’s performance is optimized within a microbench. The completion of this step is crucial before moving on to the benchmarking phase.

Progress Towards Goals:

We think that we are on the right track and will depend on the benchmark results to see what are the necessary “nice to haves”.

Based on the current progress, it appears feasible to produce all the initially stated deliverables. The major coding components are mostly completed, though debugging might go sideways but we think we still have enough time for some unexpected things. The project will soon be shifting towards performance testing and final report preparation.

After the bug fixing stage, the primary challenge will lie in the integration of ETH into BusTub and the subsequent performance testing. Ensuring accurate and efficient benchmarking might require additional effort.

Poster Session Preview:

Data structure design showcase:

We’ll present the design of split-ordered lists as the underlying lock-free extendible hash table implementation. There will be graphs, and a few words on explaining the data structure.

Interactive Demo:

Present an interactive demonstration of the hash table integrated within the Bustub environment. We will use a laptop to do this. We also need to show the hash table handling various types of loads and transactions in real-time.

Performance Metrics Display:

Exhibit graphs and charts showcasing the performance benchmarks, comparing our implementation with the baseline. Explain why they are fast by explaining the algorithm.

Preliminary Results:

We haven’t got to run our implementations on benchmarks, we are still wrapping up the developing stage, so we don’t have any substantial preliminary results to share at this point.

Concerns and Unknowns:

We’re still a bit concerned about the lock-free coding, as it has not been completely done yet.

There is also some concern about the benchmark results. We are not sure whether it will finally be fast enough. For other concerns, we’ve discussed with Professor Skarlatos on Friday, which primarily consists of the concerns on the number of benchmarking scenarios.