Understanding Why HashMap Uses a Load Factor of 0.75 and How It Resolves Collisions
This article explains the purpose of HashMap's load factor, the trade‑offs between space utilization and collision probability, describes common collision‑resolution techniques such as open addressing, rehashing and chaining, and shows why the default load factor of 0.75 is chosen based on Poisson‑distribution analysis.
HashMap is a hash‑table based key‑value store; its load factor indicates how full the table may become before a resize is triggered.
Why does HashMap need a load factor?
The underlying structure stores entries in an array; if the array is too full, many hash positions are already occupied, causing hash collisions, while a very low load factor wastes space and increases the number of costly rehash operations.
Load factor = number of stored elements / length of the hash table
A larger load factor improves space utilization but raises collision probability; a smaller factor reduces collisions but leads to more frequent resizing and wasted memory.
How to resolve collisions?
1. Open addressing
Hi = (H(key) + d_i) MOD m, where i = 1,2,…,k (k ≤ m‑1)Open addressing probes the table using a step sequence d_i . Three common probing strategies are:
1.1 Linear probing (d_i = 1,2,3,…)
Searches sequentially from the collided slot with a step of 1 until an empty slot is found.
1.2 Quadratic probing (d_i = ±1², ±2², …)
Uses a quadratic step size, reducing clustering compared with linear probing.
1.3 Pseudo‑random probing (d_i = pseudo‑random sequence)
Chooses step sizes randomly, further dispersing probes.
Open addressing suffers from clustering, complex deletion, and may require an overflow area when the table is full.
2. Rehashing
H_i(key) = RHi(key), where i = 1,2,…,kWhen a collision occurs, a secondary hash function RHi computes a new address until an empty slot is found. This reduces clustering but adds extra computation.
3. Separate overflow area
All colliding entries are placed in a dedicated overflow table, but look‑ups must scan this area, which can be inefficient.
4. Chaining (linked‑list method)
Each bucket holds a linked list of entries that hash to the same index. Deletion is simple, and the structure adapts to unknown table size, but it requires extra pointer storage.
Java's HashMap combines an array with linked lists (and, for large bins, red‑black trees) to implement a hybrid of open addressing and chaining.
Why is the default load factor 0.75?
HashMap’s default initial capacity is 16. When the number of entries reaches DEFAULT_INITIAL_CAPACITY × DEFAULT_LOAD_FACTOR (16 × 0.75 = 12), the table is resized, which is an expensive operation.
The choice of 0.75 stems from statistical analysis using the Poisson distribution. The JDK source contains a comment explaining that, under random hash codes, the number of nodes per bucket follows a Poisson distribution with a mean parameter of about 0.5, and a threshold of 0.75 yields an acceptable probability of long chains.
/* Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million */These probabilities show that a chain longer than eight elements is extremely unlikely ( 6 × 10⁻⁸ ), making 0.75 a practical compromise between time spent resizing and memory overhead.
In summary, the default load factor of 0.75 balances space efficiency and lookup performance, guided by probabilistic analysis of hash‑bucket occupancy.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.