Fundamentals 12 min read

Why Does Java’s HashMap Use a 0.75 Load Factor? The Math Behind It

This article explains why HashMap relies on a 0.75 load factor, how load factor balances space utilization and collision probability, the various collision‑resolution strategies, and the statistical reasoning—particularly Poisson distribution—that led to choosing 0.75 over other values.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Why Does Java’s HashMap Use a 0.75 Load Factor? The Math Behind It

Many details are often ignored when learning HashMap; revisiting the load factor reveals deep mathematical reasoning.

The article covers:

Why HashMap needs a load factor

Methods for resolving collisions

Why the default load factor is 0.75 instead of 0.8 or 0.6

Why does HashMap need a load factor?

HashMap is built on a hash table that maps keys to positions using a hash function:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// AbstractMap
public int hashCode() {
    int h = 0;
    Iterator<Entry<K,V>> i = entrySet().iterator();
    while (i.hasNext())
        h += i.next().hashCode();
    return h;
}

HashMap favors fast look‑up over fast insertion, which leads to two opposing problems: high space utilization causes many collisions, while low utilization wastes memory. The load factor measures how full the table is.

Load factor = number of stored elements / table length

A larger load factor improves space efficiency but raises collision chances; a smaller factor reduces collisions but increases wasted space and rehashing frequency. An optimal balance is required.

How to resolve collisions?

1. Open addressing

Hi = (H(key) + di) MOD m, where i = 1,2,…,k (k ≤ m‑1)

Open addressing includes:

1.1 Linear probing (di = 1,2,3,…,m‑1)

Search sequentially from the conflicted slot with step 1 until an empty slot is found.

1.2 Quadratic probing (di = ±i²)

Search using a step that grows quadratically, reducing clustering.

1.3 Pseudo‑random probing (di = pseudo‑random sequence)

Use a random step size for each probe.

Drawbacks of open addressing include clustering, difficulty deleting entries, and the need for an overflow area when the table is full.

2. Rehashing

Hi = RHi(key), where i = 1,2,…,k

A secondary hash function computes a new position when a collision occurs, increasing computation time.

3. Separate overflow area

An auxiliary overflow table stores all colliding entries, but lookup requires scanning this overflow.

4. Chaining (linked‑list buckets)

Each bucket holds a linked list of entries that hash to the same index.

Simple to implement and avoids clustering

Dynamic node allocation fits unknown table sizes

Easy deletion by removing list nodes

The downside is extra memory for the linked lists.

HashMap’s actual implementation combines an array with linked lists (and red‑black trees for large buckets), effectively using both open addressing and chaining techniques.

Why is the default load factor 0.75?

HashMap’s default initial capacity is 16. When the number of entries reaches DEFAULT_INITIAL_CAPACITY × DEFAULT_LOAD_FACTOR (16 × 0.75 = 12), the table is resized, a costly operation.

The choice of 0.75 stems from the Poisson distribution, which models the expected number of entries per bucket under random hash codes.

Threshold = DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR

In the JDK source, a comment explains that with a load factor of 0.75 the expected bucket size follows a Poisson distribution with λ≈0.5, yielding very low probabilities for long chains:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006

Thus, a bucket length of 8 occurs with probability 6 × 10⁻⁸, effectively negligible.

Why not 0.8 or 0.6?

Higher load factors (>0.8) dramatically increase cache misses and lookup cost, while lower factors (<0.6) waste memory and trigger more frequent rehashes. Empirical studies and the Poisson analysis show that 0.75 offers a balanced trade‑off between time and space.

References

Poisson distribution tutorial: http://www.ruanyifeng.com/blog/2015/06/poisson-distribution.html

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaHashMapcollision-resolutionload factor
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.