Why Does Java’s HashMap Use a 0.75 Load Factor? The Math Behind It
This article explains why HashMap relies on a 0.75 load factor, how load factor balances space utilization and collision probability, the various collision‑resolution strategies, and the statistical reasoning—particularly Poisson distribution—that led to choosing 0.75 over other values.
Many details are often ignored when learning HashMap; revisiting the load factor reveals deep mathematical reasoning.
The article covers:
Why HashMap needs a load factor
Methods for resolving collisions
Why the default load factor is 0.75 instead of 0.8 or 0.6
Why does HashMap need a load factor?
HashMap is built on a hash table that maps keys to positions using a hash function:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// AbstractMap
public int hashCode() {
int h = 0;
Iterator<Entry<K,V>> i = entrySet().iterator();
while (i.hasNext())
h += i.next().hashCode();
return h;
}HashMap favors fast look‑up over fast insertion, which leads to two opposing problems: high space utilization causes many collisions, while low utilization wastes memory. The load factor measures how full the table is.
Load factor = number of stored elements / table length
A larger load factor improves space efficiency but raises collision chances; a smaller factor reduces collisions but increases wasted space and rehashing frequency. An optimal balance is required.
How to resolve collisions?
1. Open addressing
Hi = (H(key) + di) MOD m, where i = 1,2,…,k (k ≤ m‑1)Open addressing includes:
1.1 Linear probing (di = 1,2,3,…,m‑1)
Search sequentially from the conflicted slot with step 1 until an empty slot is found.
1.2 Quadratic probing (di = ±i²)
Search using a step that grows quadratically, reducing clustering.
1.3 Pseudo‑random probing (di = pseudo‑random sequence)
Use a random step size for each probe.
Drawbacks of open addressing include clustering, difficulty deleting entries, and the need for an overflow area when the table is full.
2. Rehashing
Hi = RHi(key), where i = 1,2,…,kA secondary hash function computes a new position when a collision occurs, increasing computation time.
3. Separate overflow area
An auxiliary overflow table stores all colliding entries, but lookup requires scanning this overflow.
4. Chaining (linked‑list buckets)
Each bucket holds a linked list of entries that hash to the same index.
Simple to implement and avoids clustering
Dynamic node allocation fits unknown table sizes
Easy deletion by removing list nodes
The downside is extra memory for the linked lists.
HashMap’s actual implementation combines an array with linked lists (and red‑black trees for large buckets), effectively using both open addressing and chaining techniques.
Why is the default load factor 0.75?
HashMap’s default initial capacity is 16. When the number of entries reaches DEFAULT_INITIAL_CAPACITY × DEFAULT_LOAD_FACTOR (16 × 0.75 = 12), the table is resized, a costly operation.
The choice of 0.75 stems from the Poisson distribution, which models the expected number of entries per bucket under random hash codes.
Threshold = DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR
In the JDK source, a comment explains that with a load factor of 0.75 the expected bucket size follows a Poisson distribution with λ≈0.5, yielding very low probabilities for long chains:
0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006Thus, a bucket length of 8 occurs with probability 6 × 10⁻⁸, effectively negligible.
Why not 0.8 or 0.6?
Higher load factors (>0.8) dramatically increase cache misses and lookup cost, while lower factors (<0.6) waste memory and trigger more frequent rehashes. Empirical studies and the Poisson analysis show that 0.75 offers a balanced trade‑off between time and space.
References
Poisson distribution tutorial: http://www.ruanyifeng.com/blog/2015/06/poisson-distribution.html
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
