According to various sources, such as Wikipedia and various .edu websites found by Google, the most common ways for a hash table to resolve collisions are linear or quadratic probing and chaining. Randomized probing is briefly mentioned but not given much attention. I ve implemented a hash table that uses randomized probing to resolve collisions. Assuming there is a collision, resolution works as follows:
- The full (32-bit) hash of an object is used to seed a linear congruential random number generator.
- The generator generates 32-bit numbers and the modulus is taken to determine where in the hash table to probe next.
This has the very nice property that, regardless of how many hash collisions there are in modulus space, lookup and insertion times are expected to be O(1) as long as there are few collisions in full 32-bit hash space. Because the probe sequence is pseudo-random, no clustering behavior results from modulus space collisions, unlike with linear probing. Because the entire system is open-addressed and doesn t use linked lists anywhere, you don t need to perform a memory allocation on each insertion, unlike chaining.
Furthermore, because the size of the hash is usually the size of the address space (32 bits on 32-bit machines), it is simply impossible to fit enough items in address space to cause large numbers of hash collisions in full 32-bit hash space under a good hashing scheme.
Why, then, is randomized probing such an unpopular collision resolution strategy?