Extendible Hashing
In your intro or intermediate data structures course, you probably discussed hashing, with both chaining and open addressing. With chaining, collisions were resolved using a linked list (or other dynamic bucket). With open addressing, collisions were resolved by probing alternative buckets to find an empty space (remember linear probing, quadratic probing, and double hashing). For a quick review, check out this review.
Chaining is a good solution in main memory, where random access is cheap, but it is inappropriate on disk, where random access is cost-prohibitive. On disk, the open addressing techniques, especially linear probing, are preferred: just by waiting, the disk spins the subsequent buckets past the head.
In each of these two approaches, the address space of the hash table is fixed. In order to create a dynamic database that functions at all scales, the address space needs to grow in response to collisions. We'll discuss two techniques that accomplish this:
- Extendible Hashing
- Linear Hashing
Extendible hashing is based on a radix-2 trie. The idea is to hash the key, yielding a long binary number. Then, use as many of its bits as desired to create a radix-2 trie. But, instead of actually building a tree, just collapse it, interpreting the 0/1 sequence along each branch as a binary number that is used as the index into the bucket array. When a collision occurs, more bits can be used to divide the buckets into a larger address space, which grows by powers of 2.
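To make the idea concrete, here is a minimal in-memory sketch of the directory doubling described above. It is an illustration, not a definitive implementation. The names (`Bucket`, `ExtendibleHash`, `capacity`, `global_depth`) and the per-bucket "local depth" bookkeeping are assumptions introduced for the example. So are the choice of the low-order bits of the hash as the index and the use of Python's built-in `hash`.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of hash bits this bucket distinguishes
        self.items = {}      # key -> value


class ExtendibleHash:
    def __init__(self, capacity=2, hash_fn=hash):
        self.capacity = capacity      # max items per bucket (assumed parameter)
        self.hash_fn = hash_fn
        self.global_depth = 0         # number of hash bits used as the index
        self.directory = [Bucket(0)]  # 2**global_depth entries

    def _index(self, key):
        # Interpret the low-order global_depth bits as the directory index.
        return self.hash_fn(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        # Note: this loops forever if more than `capacity` distinct keys
        # share an identical hash; a real table needs an overflow policy.
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Use one more bit: the directory doubles, and each new entry
            # initially points at the same bucket as its lower half twin.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.depth       # the newly examined bit
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        # Move the items whose new bit is 1 into the sibling bucket.
        for k in list(bucket.items):
            if self.hash_fn(k) & bit:
                sibling.items[k] = bucket.items.pop(k)
        # Repoint the directory entries whose new bit is 1.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling


table = ExtendibleHash(capacity=2)
for k in range(8):
    table.put(k, str(k))
print(table.global_depth, len(table.directory))
```

Notice that only the bucket that overflowed is split. The directory doubles only when that bucket was already using every bit currently in the index.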
Linear hashing works similarly. It begins by hashing to generate the binary number, and using a certain number of its digits as the index into an array of buckets. But, when a collision occurs, it is not resolved. At least, it is not resolved by growth. Instead, the table grows linearly, adding a single additional bucket, to try to prevent future collisions. We read the digits of the hash from right to left, allowing the buckets to "split", separating the items by the newly used digit to the left. Collisions still need to be managed by probing, a collision file, etc.
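The one-bucket-at-a-time growth described above can be sketched as follows. This is a sketch under stated assumptions rather than a definitive implementation. The names (`LinearHash`, `level`, `next`, `capacity`) are made up for the example. The split pointer that walks through the buckets in order is the usual bookkeeping for linear hashing but is not spelled out in the text above. An unbounded Python list stands in for whatever overflow mechanism (probing, collision file) a real table would use.

```python
class LinearHash:
    def __init__(self, initial_buckets=2, capacity=2, hash_fn=hash):
        self.n0 = initial_buckets  # starting bucket count (assumed power of 2)
        self.level = 0             # how many times the table has fully doubled
        self.next = 0              # index of the next bucket to split
        self.capacity = capacity   # items per bucket before we trigger growth
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        h = self.hash_fn(key)
        size = self.n0 << self.level
        i = h % size               # low-order digits of the hash
        if i < self.next:
            # This bucket has already split: use one more digit to the left.
            i = h % (size * 2)
        return i

    def get(self, key):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return None

    def put(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        # The item stays in its bucket even if that bucket is over capacity;
        # the list stands in for an overflow area.
        bucket.append((key, value))
        if len(bucket) > self.capacity:
            self._split_next()

    def _split_next(self):
        # Grow by exactly one bucket. The bucket that splits is the one at
        # self.next, which is not necessarily the one that overflowed.
        size = self.n0 << self.level
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])
        self.next += 1
        if self.next == size:      # every bucket at this level has split
            self.level += 1
            self.next = 0
        for k, v in old:           # separate the items by the new digit
            self.buckets[self._address(k)].append((k, v))


table = LinearHash()
for k in range(8):
    table.put(k, str(k))
print(len(table.buckets), table.level, table.next)
```

Because the split pointer advances in a fixed order, an overflowing bucket may have to wait its turn. That is why some separate collision handling is still required.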
The following illustrations should be helpful:
Extendible Hashing: