In this project, we address the question:
Can we design asymptotically and practically efficient data structures for sequences that can support a broad range of operations, including push/pop operations on both ends, concatenation, and split at a specified position?
In our ESA’14 paper [1], we present two new algorithms that adapt the classic chunking technique, which accelerates tree-based representations of sequences by storing them as trees of small, fixed-capacity arrays (i.e., chunks). We prove that our algorithms deliver tight amortized and worst-case bounds with small constant factors. We present C++ implementations of both algorithms, along with a number of experiments comparing our implementations to highly tuned and specialized sequence data structures, such as STL deque and STL rope. This work presents the first data structures for shared memory that simultaneously match the semantic and asymptotic profile of the finger tree of Hinze and Paterson [2], deliver strong guarantees with respect to constant-factor performance (as well as asymptotic performance), and compete well with highly tuned but more specialized data structures.
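To make the chunking idea concrete, here is a minimal, hypothetical sketch (not our actual implementation): the sequence is stored as a doubly-ended list of fixed-capacity arrays, so that push and pop on either end touch only the chunk at that end. The class name, capacity parameter, and flat chunk list are illustrative only; the real data structure arranges chunks in a tree and additionally supports concatenation and split.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy illustration of chunking: the sequence is a deque of fixed-capacity
// arrays ("chunks"). A fresh chunk is allocated only when the end chunk is
// full, keeping per-element overhead small. This sketch omits the tree
// structure, concatenation, and split of the real chunkedseq structure.
template <typename T, std::size_t Capacity = 512>
class toy_chunked_deque {
  std::deque<std::vector<T>> chunks;  // each inner vector holds <= Capacity items
public:
  void push_back(const T& x) {
    if (chunks.empty() || chunks.back().size() == Capacity)
      chunks.emplace_back();  // start a new end chunk
    chunks.back().push_back(x);
  }
  void push_front(const T& x) {
    if (chunks.empty() || chunks.front().size() == Capacity)
      chunks.emplace_front();  // start a new front chunk
    // O(Capacity) insert for simplicity; a real chunk fills from both ends
    chunks.front().insert(chunks.front().begin(), x);
  }
  T pop_back() {
    T x = chunks.back().back();
    chunks.back().pop_back();
    if (chunks.back().empty()) chunks.pop_back();  // discard emptied chunk
    return x;
  }
  T pop_front() {
    T x = chunks.front().front();
    chunks.front().erase(chunks.front().begin());
    if (chunks.front().empty()) chunks.pop_front();
    return x;
  }
  std::size_t size() const {
    std::size_t n = 0;
    for (const auto& c : chunks) n += c.size();
    return n;
  }
};
```

Because elements live in contiguous arrays, traversals are cache-friendly, which is the main source of the constant-factor gains the chunking technique provides.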
We have made available a long version of our ESA’14 article, whose appendix includes additional discussion of the experiments, extra experimental results, and proofs.
Our source code is hosted in a GitHub repository.
Documentation is available in HTML or PDF format.
We encourage interested parties to evaluate the findings of our empirical study.
The source code of our prototype implementation is hosted in a GitHub repository.
To have enough room to run the experiments, your filesystem should have about 30GB of free disk space and your machine at least 128GB of RAM. These requirements are large because some of the input data sets we use are huge.
We use the Nix package manager to handle the details of our experimental setup. The first step is to install Nix on your machine.
First, we need to download the source code.
$ git clone https://github.com/deepsea-inria/chunkedseq.git
$ git clone https://github.com/deepsea-inria/sc15-pdfs.git
$ cd chunkedseq/script
Next, we run the following command, which builds the binaries for the experiments and the benchmark script (the input data is downloaded later, by the benchmark script itself).
$ nix-build benchmark.nix
If it succeeds, then the nix-build command should leave behind a new symlink named result in the script folder. This result folder stores all the files generated by the Nix build script. The benchmarking binaries we are going to use are now reachable from result/bench. To save time typing, let us add this folder to our $PATH.
$ export PATH=`pwd`/result/bench/:$PATH
The next and final step is to run the benchmark script. This process involves downloading all the input data (unless it’s already present on the machine), running the benchmarks, and generating plots. To get started, we can issue the command to run all the experiments that were reported in the paper.
$ bench.pbench paper
It may be the case that the download of one of the input graphs fails. If so, it may be necessary to delete the partially downloaded graph manually and try again. If the process fails completely, please contact the team members.
To get statistical significance, you will need to take multiple samples of each data point, which you can do by executing the following.
$ bench.pbench paper -runs 29 -mode append
If you are interested in running only selected experiments, then you may want to run the following command instead, naming the desired benchmarks explicitly.
$ bench.pbench fifo,lifo,dfs,bfs,pbfs
It will take at least a few hours to complete the thirty runs for each data point. To get results faster, but with lower statistical significance, reduce the number of runs. If, after getting the initial results, you wish to add more runs, just run the script again, but this time pass the argument -mode append.
When the experiment completes, the results table should appear as a PDF file in a newly created _results folder. The LaTeX source for the table is also generated and stored in the same folder.
Get the BibTeX file used to generate these references.