Hash Table Collisions and Resolution Strategies
The article explains how limited hash spaces cause collisions, then compares open hashing (separate chaining) and closed hashing techniques—linear probing, double hashing, and random hashing—detailing their insertion, search, and deletion algorithms along with the performance trade‑offs and load‑factor analysis for each method.
A hash function maps data from a large domain into a small, fixed range of values. Because the hash space is limited while the amount of data keeps growing, hash collisions are inevitable, and handling them well is a central challenge in designing hash-based systems. This article introduces hash collisions, the common resolution methods, and the advantages and disadvantages of each strategy.
In a hash table, the hash function maps a key to an index in the table. Although the set of possible keys can be huge, the set of hash values is limited to the table size. Hash functions are also used in password systems, message‑digest systems, and digital signatures, where a very low collision probability is required.
Common hash algorithms include:
MD5: 2^128 possible values (approximately 2^64 trials to find a collision, by the birthday bound).
SHA‑1: 2^160 possible values (approximately 2^80 trials to find a collision).
Hash Collisions
Consider a simple example with the hash function H(K) = K mod M (M = 7) and the keys 217, 701, 19, 30, 145.
H(217) = 217 % 7 = 0
H(701) = 701 % 7 = 1
H(19) = 19 % 7 = 5
H(30) = 30 % 7 = 2
H(145) = 145 % 7 = 5
The keys 19 and 145 collide at index 5.
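The mapping above can be reproduced with a few lines of Python (the dictionary-of-lists layout is just for illustration):

```python
# H(K) = K mod M with M = 7, applied to the example keys.
M = 7
keys = [217, 701, 19, 30, 145]

slots = {}
for k in keys:
    indx = k % M
    slots.setdefault(indx, []).append(k)   # group keys by hash value

print(slots)   # any slot holding two keys is a collision
```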
Collision‑Resolution Strategies
Two broad families exist:
Open hashing (separate chaining): collided keys are stored in linked lists outside the main table.
Closed hashing (open addressing): collided keys are stored in alternative slots within the table.
Popular closed‑addressing techniques:
Linear probing
Double hashing
Random hashing
Popular open‑hashing technique:
Separate chaining
1. Linear Probing
Insertion algorithm:
Set indx = H(K) .
If slot indx already contains the key, do nothing.
If slot indx is empty, insert the key.
Otherwise, set indx = (indx + 1) mod M and repeat.
If indx == H(K) , the table is full and must be resized.
Advantages: easy to implement, always finds a slot if one exists, good average performance when the table is not heavily loaded.
Disadvantages: creates primary clustering, which degrades performance as the table becomes dense.
Example with M = 7 and keys 701, 145, 217, 19, 13, 749:
H(701) = 701 % 7 = 1 (inserted at 1)
H(145) = 145 % 7 = 5 (inserted at 5)
H(217) = 217 % 7 = 0 (inserted at 0)
H(19) = 19 % 7 = 5 (collision) → 6 (inserted)
H(13) = 13 % 7 = 6 (collision) → 0 (occupied) → 1 (occupied) → 2 (inserted)
H(749) = 749 % 7 = 0 (collision) → 1 (occupied) → 2 (occupied) → 3 (inserted)
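The insertion procedure can be sketched as follows (a minimal illustration; the EMPTY sentinel and the `insert` helper name are assumptions of this sketch, not part of the original text):

```python
M = 7
EMPTY = None
table = [EMPTY] * M

def insert(table, key):
    """Insert key using linear probing; return the slot used, or -1 if full."""
    indx = start = key % M
    while True:
        if table[indx] == key:        # key already present: do nothing
            return indx
        if table[indx] is EMPTY:      # free slot found
            table[indx] = key
            return indx
        indx = (indx + 1) % M         # probe the next slot
        if indx == start:             # wrapped around: table is full
            return -1

for k in (701, 145, 217, 19, 13, 749):
    insert(table, k)
print(table)   # [217, 701, 13, 749, None, 145, 19]
```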
Retrieval algorithm (linear probing):
Set indx = H(K) .
If slot indx contains the key, return FOUND.
If slot indx is empty, return NOT FOUND.
Otherwise, set indx = (indx + 1) mod M and repeat.
If indx wraps back around to H(K), every slot has been probed and the key is not present.
Deletion is problematic because simply marking a slot empty can break the probe sequence for keys that collided later. The usual remedy is lazy deletion: a removed slot is marked with a "tombstone" that search probes skip over but insertion may reuse.
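The tombstone idea can be illustrated with a short sketch (the DELETED marker and the helper names are assumptions of this sketch):

```python
M = 7
EMPTY, DELETED = None, "<deleted>"    # tombstone marker: an assumed convention
table = [EMPTY] * M

def insert(table, key):
    indx = key % M
    for _ in range(M):
        if table[indx] == key:                 # already present
            return indx
        if table[indx] is EMPTY or table[indx] is DELETED:
            table[indx] = key                  # insertion may reuse a tombstone
            return indx
        indx = (indx + 1) % M
    return -1                                  # table full

def search(table, key):
    indx = key % M
    for _ in range(M):
        if table[indx] == key:
            return indx
        if table[indx] is EMPTY:               # a tombstone does NOT stop the probe
            return -1
        indx = (indx + 1) % M
    return -1

def delete(table, key):
    indx = search(table, key)
    if indx >= 0:
        table[indx] = DELETED                  # keep later probe chains intact
    return indx

for k in (701, 145, 19):
    insert(table, k)
delete(table, 145)
print(search(table, 19))   # 6 -- still found: slot 5 holds a tombstone, not EMPTY
```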
2. Double Hashing
Uses a second hash function H₂(K) to compute the probe offset, avoiding the clustering of linear probing.
For a table of size M (preferably prime), a common choice is H₂(K) = 1 + (⌊K / M⌋ mod (M − 1)), where ⌊K / M⌋ denotes integer division; this yields an offset in the range 1 … M − 1.
Insertion algorithm:
Set indx = H(K) ; offset = H₂(K) .
If slot indx already contains the key, do nothing.
If slot indx is empty, insert the key.
Otherwise, set indx = (indx + offset) mod M and repeat.
If indx == H(K) , the table is full.
Double hashing works best when the table size M is prime: every offset 1 … M − 1 is then coprime to M, so the probe sequence visits all M slots before returning to its starting point.
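A minimal double-hashing sketch using the H₂ defined above (the helper names are assumptions):

```python
M = 7                                   # prime table size, as recommended above
table = [None] * M

def h1(k):
    return k % M

def h2(k):
    return 1 + ((k // M) % (M - 1))     # integer division; offset in 1..M-1

def insert(table, key):
    indx = start = h1(key)
    offset = h2(key)
    while True:
        if table[indx] == key:          # already present
            return indx
        if table[indx] is None:         # free slot found
            table[indx] = key
            return indx
        indx = (indx + offset) % M      # jump by the second hash, not by 1
        if indx == start:               # cycled through all M slots: full
            return -1

for k in (701, 145, 19):
    insert(table, k)
print(table)   # 19 collides with 145 at slot 5, then jumps by h2(19) = 3
```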
Reference: https://blog.csdn.net/chenxuegui1234/article/details/103454285
3. Random Hashing
Similar to double hashing, but the probe sequence is generated by a pseudo‑random number generator seeded with the key.
Insertion algorithm:
Create an RNG seeded with K; set indx = RNG.next() mod M .
If slot indx already contains the key, do nothing.
If slot indx is empty, insert the key.
Otherwise, set indx = RNG.next() mod M and repeat.
If all M positions are probed without success, the table must be resized.
Random hashing is easy to analyze but incurs the overhead of RNG calls, so double hashing is more commonly used in practice.
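A sketch of random-hashing insertion using Python's `random.Random` seeded with the key. The probe bound of 10·M is an assumption of this sketch: a heuristic cutoff rather than an exact fullness test, since random probes are not guaranteed to visit every slot.

```python
import random

M = 7
table = [None] * M

def insert(table, key):
    rng = random.Random(key)      # seeded with the key: reproducible probe sequence
    for _ in range(10 * M):       # heuristic bound on the number of probes
        indx = rng.randrange(M)
        if table[indx] == key:    # key already present
            return indx
        if table[indx] is None:   # free slot found
            table[indx] = key
            return indx
    return -1                     # most likely full: resize the table

for k in (701, 145, 19):
    insert(table, k)
print(sorted(x for x in table if x is not None))
```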
4. Separate Chaining
When inserting a key K:
Set indx = H(K) .
Insert the key into the linked list stored at bucket indx (first search the list to avoid duplicates).
Search and delete operations traverse the same linked list.
Advantages: average performance remains good as the number of entries grows (load factor α can exceed 1); deletion is straightforward.
Disadvantages: extra memory for pointers and poorer cache locality.
Java 7’s HashMap is a classic implementation of separate chaining (since Java 8, long chains are additionally converted to red‑black trees).
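A minimal separate-chaining sketch with one Python list per bucket (the helper names are assumptions):

```python
M = 7
buckets = [[] for _ in range(M)]        # one chain (list) per slot

def insert(buckets, key):
    chain = buckets[key % M]
    if key not in chain:                # search the list first to avoid duplicates
        chain.append(key)

def search(buckets, key):
    return key in buckets[key % M]      # traverse the same chain

def delete(buckets, key):
    chain = buckets[key % M]
    if key in chain:
        chain.remove(key)               # deletion is a plain list removal

for k in (217, 701, 19, 30, 145):
    insert(buckets, k)
print(buckets)   # colliding keys simply share a chain
```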
Separate‑Chaining Analysis
Load factor α = N / M (N = number of stored keys, M = table size). With separate chaining α can exceed 1; the expected chain length is α, so the average search cost is O(1 + α), independent of N as long as α is kept bounded.
Successful search cost (average number of comparisons): 1 + α/2
Unsuccessful search cost (average number of comparisons): α
When α < 1, these costs are less than 1.5 and 1 respectively; even for α > 1 they depend only on α, not on N.
Open Addressing (Closed Hashing) Analysis
Using the same load factor α, the average number of probes for an unsuccessful insertion/search with random hashing is 1/(1 − α). For linear probing, clustering increases the average probe count to approximately
(1/2) · (1 + 1/(1 − α)²)
For successful searches, random hashing yields an average probe count of
(1/α) · ln(1/(1 − α))
Linear probing’s average successful probe count is approximately
(1/2) · (1 + 1/(1 − α))
When α approaches 1, both strategies degrade sharply, but for α ≤ 0.75 the average costs remain small and essentially independent of M.
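These are the standard textbook approximations; evaluating them at a few load factors shows how quickly linear probing degrades relative to random hashing (the function names are assumptions of this sketch):

```python
import math

# Average comparisons / probes as a function of load factor a (the classical
# approximations quoted above).
def chaining_hit(a):  return 1 + a / 2                     # chaining, successful
def chaining_miss(a): return a                             # chaining, unsuccessful
def random_hit(a):    return math.log(1 / (1 - a)) / a     # (1/a) * ln(1/(1-a))
def random_miss(a):   return 1 / (1 - a)
def linear_hit(a):    return 0.5 * (1 + 1 / (1 - a))
def linear_miss(a):   return 0.5 * (1 + 1 / (1 - a) ** 2)

for a in (0.5, 0.75, 0.9):
    print(f"a={a}: linear miss={linear_miss(a):.1f}  random miss={random_miss(a):.1f}")
```

At α = 0.75 an unsuccessful linear-probing search already averages 8.5 probes versus 4 for random hashing, which is why load factors are usually kept at or below roughly 0.75.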
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.