
Why HashMap Uses a Load Factor of 0.75 and How It Handles Collisions

This article explains the purpose of HashMap's load factor, why the default value is 0.75, and describes various collision‑resolution techniques such as open addressing, linear and quadratic probing, rehashing, overflow areas, and chaining, supported by code examples and a Poisson‑distribution analysis.

Top Architect

Why does HashMap need a load factor?

HashMap is built on a hash table that stores key‑value pairs; the hash function determines the slot where each entry is placed. A load factor measures how full the table is, balancing space utilization against the probability of hash collisions.

Load factor = number of elements stored / length of the hash table

A larger load factor (e.g., 0.75) increases space efficiency but also raises the chance of collisions; a smaller factor reduces collisions but wastes space and triggers more frequent rehashing.
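This trade-off is visible directly in code: HashMap's two-argument constructor exposes both knobs, and the load factor itself is a one-line calculation. A minimal sketch (the class name `LoadFactorDemo` is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    // Load factor = entries stored / table length.
    static double loadFactor(int size, int capacity) {
        return (double) size / capacity;
    }

    public static void main(String[] args) {
        // HashMap lets you tune both knobs at construction time.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 0; i < 12; i++) {
            map.put("key" + i, i);
        }
        // 12 entries in a 16-slot table is a load factor of exactly 0.75.
        System.out.println(loadFactor(map.size(), 16)); // prints 0.75
    }
}
```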

How are collisions resolved?

1. Open Addressing

H_i = (H(key) + d_i) MOD m, where i = 1, 2, …, k (k ≤ m − 1)

When the home slot H(key) is occupied, open addressing probes alternative slots using an increment sequence d_i. It has three common variants:

1.1 Linear Probing

Increment sequence d_i = 1, 2, 3, …, m − 1; the algorithm scans forward from the home slot, one cell at a time, until an empty slot is found.
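A minimal linear-probing insert might look like the following sketch, assuming integer keys, a fixed table length, and key mod m as the hash function:

```java
// Minimal linear-probing insert; returns the slot used, or -1 if the table is full.
public class LinearProbing {
    static final int M = 8;                 // table length m
    static Integer[] table = new Integer[M];

    // Probe slots H(key), H(key)+1, H(key)+2, ... mod m until one is empty.
    static int insert(int key) {
        int h = Math.floorMod(key, M);      // home slot H(key)
        for (int d = 0; d < M; d++) {       // increments 0, 1, 2, ..., m-1
            int slot = (h + d) % M;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;                          // every slot occupied
    }

    public static void main(String[] args) {
        System.out.println(insert(1));  // home slot 1 is free: slot 1
        System.out.println(insert(9));  // collides with 1, lands in slot 2
    }
}
```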

1.2 Quadratic Probing

Increment sequence d_i = 1², −1², 2², −2², …; the probe distance grows quadratically and alternates direction, which reduces the primary clustering that linear probing suffers from.
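The alternating ±k² offsets can be generated with a small helper. In this sketch (the names `offset` and `slot` are illustrative), `slot` computes the cell visited on the i-th probe:

```java
public class QuadraticProbing {
    // i-th probe offset d_i: +1^2, -1^2, +2^2, -2^2, ...
    static int offset(int i) {
        int k = (i + 1) / 2;               // 1, 1, 2, 2, 3, 3, ...
        int sq = k * k;
        return (i % 2 == 1) ? sq : -sq;    // odd probes go forward, even go back
    }

    // Slot visited on the i-th probe for a key with home slot h in a table of length m.
    static int slot(int h, int i, int m) {
        return Math.floorMod(h + offset(i), m);  // floorMod keeps negatives in range
    }

    public static void main(String[] args) {
        // Probes for h = 3, m = 11: offsets +1, -1, +4, -4, +9 give 4 2 7 10 1
        for (int i = 1; i <= 5; i++) {
            System.out.print(slot(3, i, 11) + " ");
        }
    }
}
```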

1.3 Pseudo‑Random Probing

Step size is taken from a pseudo‑random sequence, making the probe pattern unpredictable.

Open addressing suffers from clustering, makes deletion complex (deleted slots must be marked with tombstones so probe chains stay intact), and may need an overflow area once the table fills up.

2. Rehashing

H_i = RH_i(key), where i = 1, 2, …, k

When a collision occurs, a series of alternative hash functions RH_i is applied until a free slot is found. This scatters colliding keys and reduces clustering, at the cost of extra hash computations.
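One common realization of this scheme is double hashing, where a second hash function determines the step between successive probes. A sketch, assuming a prime table length m and simple modular hash functions:

```java
public class DoubleHashing {
    // Primary hash: home slot.
    static int h1(int key, int m) { return Math.floorMod(key, m); }

    // Secondary hash: probe step. Must never be 0; a prime m keeps steps co-prime with m.
    static int h2(int key, int m) { return 1 + Math.floorMod(key, m - 1); }

    // Slot tried on the i-th re-probe after a collision at the home slot.
    static int probe(int key, int i, int m) {
        return Math.floorMod(h1(key, m) + i * h2(key, m), m);
    }

    public static void main(String[] args) {
        int m = 7;
        // Keys 3 and 10 share home slot h1 = 3, but diverge on the first re-probe
        // because their step sizes h2 differ (4 vs. 5).
        System.out.println(probe(3, 1, m));   // (3 + 1*4) % 7 = 0
        System.out.println(probe(10, 1, m));  // (3 + 1*5) % 7 = 1
    }
}
```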

3. Separate Overflow Area

All colliding entries are placed in an auxiliary overflow array, but look‑ups must scan this overflow area, which can be slow.
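A sketch of this layout, assuming integer keys, a primary table keyed by key mod m, and an unsorted overflow list (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class OverflowArea {
    static final int M = 8;
    static Integer[] base = new Integer[M];        // primary table
    static List<Integer> overflow = new ArrayList<>(); // shared overflow area

    static void insert(int key) {
        int slot = Math.floorMod(key, M);
        if (base[slot] == null) base[slot] = key;  // no collision: primary table
        else overflow.add(key);                    // collision: overflow area
    }

    static boolean contains(int key) {
        int slot = Math.floorMod(key, M);
        if (base[slot] != null && base[slot] == key) return true;
        return overflow.contains(key);             // linear scan: the slow path
    }

    public static void main(String[] args) {
        insert(1);
        insert(9);                                 // 9 % 8 == 1: goes to overflow
        System.out.println(contains(9));           // prints true
    }
}
```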

4. Chaining (Linked‑List Method)

Each bucket holds a linked list of entries that hash to the same slot. Insertion is simple, deletion is straightforward, and the structure adapts to unknown table size, though it requires extra memory for the links.
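A stripped-down chained table might look like the following sketch, with each entry stored as a {key, value} pair in its bucket's linked list (class and method names are illustrative; real HashMap buckets hold Node objects):

```java
import java.util.LinkedList;

public class ChainedHashTable {
    static final int M = 16;
    // Each bucket is a linked list of {key, value} entries sharing the same slot.
    @SuppressWarnings("unchecked")
    static LinkedList<int[]>[] buckets = new LinkedList[M];

    static void put(int key, int value) {
        int slot = Math.floorMod(key, M);
        if (buckets[slot] == null) buckets[slot] = new LinkedList<>();
        for (int[] e : buckets[slot]) {
            if (e[0] == key) { e[1] = value; return; }  // key exists: overwrite
        }
        buckets[slot].add(new int[]{key, value});       // collision: append to chain
    }

    static Integer get(int key) {
        int slot = Math.floorMod(key, M);
        if (buckets[slot] == null) return null;
        for (int[] e : buckets[slot]) {                 // walk the chain
            if (e[0] == key) return e[1];
        }
        return null;
    }

    public static void main(String[] args) {
        put(1, 100);
        put(17, 200);                                   // 17 % 16 == 1: same bucket as key 1
        System.out.println(get(1) + " " + get(17));     // prints 100 200
    }
}
```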

Why is the default load factor 0.75?

HashMap’s default capacity is 16. When the number of entries reaches 16 × 0.75 = 12, the table is doubled in size and every entry is rehashed into the new table, which is an expensive operation.

The choice of 0.75 is grounded in probability theory: assuming random hash codes, the number of entries per bucket follows a Poisson distribution with parameter λ ≈ 0.5 at the default resize threshold of 0.75. The expected frequencies of bucket sizes are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006

Thus the probability of a single bucket holding 8 or more entries is less than one in ten million, keeping lookup cost low while preserving reasonable space usage. (This is also why Java 8’s HashMap only converts a bucket to a red-black tree once it reaches 8 entries.)
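The table above is just the Poisson probability mass function P(k) = e^(−λ) · λ^k / k! evaluated at λ = 0.5, and a few lines of Java reproduce it:

```java
public class PoissonBuckets {
    // P(a bucket holds exactly k entries) under Poisson(lambda = 0.5).
    static double p(int k) {
        double fact = 1;
        for (int i = 2; i <= k; i++) fact *= i;   // k!
        return Math.exp(-0.5) * Math.pow(0.5, k) / fact;
    }

    public static void main(String[] args) {
        // Reproduces the frequency table: 0.60653066, 0.30326533, ...
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p(k));
        }
    }
}
```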

Choosing a higher factor (e.g., 0.8) would lengthen bucket chains and raise average lookup cost, while a lower factor (e.g., 0.6) would waste memory and trigger more frequent resizing. Therefore 0.75 represents a practical trade-off between time and space costs, and it conveniently makes the resize threshold (capacity × load factor) an integer for the power-of-two capacities HashMap uses.


Tags: Java, HashMap, data structures, collision resolution, load factor, Poisson distribution
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.