Fundamentals 20 min read

Understanding the Underlying Principles of Java HashMap

This article explains how Java's HashMap stores key‑value pairs using an array of nodes, how hash functions map keys to array indices, how collisions are handled with linked lists and red‑black trees, and why proper hashCode implementations are crucial for performance.

Sohu Tech Products
Sohu Tech Products
Sohu Tech Products
Understanding the Underlying Principles of Java HashMap

Preface

HashMap is a data structure used to store key‑value pairs and is extremely common in everyday development; this article explains its design principles.

Problem Description

HashMap is efficient for storing and retrieving key‑value pairs, but misuse can cause severe performance degradation. The following example inserts 10,000 entries and queries them, taking about 1400 ms:

public static void main(String[] args) {
    long start = System.currentTimeMillis();
    int n = 10000;
    Map
map = new HashMap<>();
    for (int i = 0; i < n; i++) {
        map.put(new Key(i), i);
    }
    for (int i = 0; i < n; i++) {
        Integer v = map.get(i);
    }
    System.out.println(System.currentTimeMillis() - start); // average ~1400ms
}

@AllArgsConstructor
public static class Key {
    private Integer id; // id
    @Override
    public int hashCode() {
        return 1;
    }
    // equals omitted
}

A second example with a proper hashCode implementation runs in about 10 ms:

public static void main(String[] args) {
    long start = System.currentTimeMillis();
    int n = 10000;
    Map
map = new HashMap<>();
    for (int i = 0; i < n; i++) {
        map.put(new Key(i), i);
    }
    for (int i = 0; i < n; i++) {
        Integer v = map.get(i);
    }
    System.out.println(System.currentTimeMillis() - start); // average ~10ms
}

@AllArgsConstructor
public static class Key {
    private Integer id; // id
    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
    // equals omitted
}

The huge performance gap is caused solely by the different hashCode() implementations, which affect how keys are placed in the internal table.

HashMap Underlying Principle

HashMap uses an array combined with linked lists (or red‑black trees) to store entries. The internal structure includes a Node[] table array where each element is a Node containing hash , key , value , and a next pointer.

public class HashMap
{
    // ...
    Node
[] table; // array declaration
    // ...
}
class Node
{
    final int hash; // key's hash value
    final K key;   // key
    V value;       // value
    Node
next; // next node (linked list)
}

Each entry is stored in a bucket of the array; collisions are resolved by chaining nodes into a singly linked list.

Why Use an Array?

Arrays provide O(1) random access by index. By mapping a key to an array index via a hash function, HashMap can quickly locate the bucket where the entry resides.

HashMap's Hash Design Function

The hash function derives an int hash from the key’s hashCode() and mixes its high and low 16 bits:

int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This mixing improves distribution and reduces collisions.

Key Mapping to Array Index

After obtaining the hash, the index is computed with a bitwise AND:

int hash = hash(key);
int n = tab.length;
int i = (n - 1) & hash; // array index
Node node = table[i];

Because the table size is always a power of two, n‑1 has low bits set to 1, ensuring the index falls within the array bounds and distributes keys uniformly.

Hash Collisions

When multiple keys map to the same bucket, a collision occurs. HashMap resolves this by chaining the colliding entries in a linked list (or converting to a red‑black tree when the list becomes long).

Put Operation

The put process:

Compute the key’s hash.

Find the bucket index with (n‑1) & hash .

If the bucket is empty, insert a new node.

If not, traverse the list (or tree) to handle collisions or replace an existing value.

Collision handling uses hash equality, then == or equals() to compare keys.

// if hash matches and keys are equal
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
    // replace value
}

Resizing (Expansion)

When the number of entries exceeds loadFactor * table.length (default load factor 0.75), the table is resized to the next power of two, reducing collision probability. Existing entries are rehashed to new indices using the same (n‑1) & hash formula.

When to Resize?

The load factor is calculated as size / table.length . When it reaches 0.75, resizing occurs to keep performance balanced with memory usage.

Why Introduce Red‑Black Trees?

If a bucket’s linked list exceeds 7 nodes and the table size is at least 64, the list is transformed into a red‑black tree, providing O(log n) lookup instead of O(n). When the tree becomes small again, it may revert to a list.

Answer to the Initial Question

A poorly implemented hashCode() that returns a constant value forces all entries into the same bucket, causing massive collisions and severe performance degradation. A good hashCode implementation distributes entries uniformly across the table.

Conclusion

This article summarized the internal workings of Java's HashMap, covering its array‑based storage, hash function design, index calculation, collision handling, resizing strategy, and the conditions under which red‑black trees are used.

JavaperformanceData StructureHashMaphash functioncollision
Sohu Tech Products
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.