Understanding the Underlying Principles of Java HashMap
This article explains how Java's HashMap stores key‑value pairs using an array of nodes, how hash functions map keys to array indices, how collisions are handled with linked lists and red‑black trees, and why proper hashCode implementations are crucial for performance.
Preface
HashMap is a data structure used to store key‑value pairs and is extremely common in everyday development; this article explains its design principles.
Problem Description
HashMap is efficient for storing and retrieving key‑value pairs, but misuse can cause severe performance degradation. The following example inserts 10,000 entries and queries them, taking about 1400 ms:
public static void main(String[] args) {
long start = System.currentTimeMillis();
int n = 10000;
Map
map = new HashMap<>();
for (int i = 0; i < n; i++) {
map.put(new Key(i), i);
}
for (int i = 0; i < n; i++) {
Integer v = map.get(i);
}
System.out.println(System.currentTimeMillis() - start); // average ~1400ms
}
@AllArgsConstructor
public static class Key {
private Integer id; // id
@Override
public int hashCode() {
return 1;
}
// equals omitted
}A second example with a proper hashCode implementation runs in about 10 ms:
public static void main(String[] args) {
long start = System.currentTimeMillis();
int n = 10000;
Map
map = new HashMap<>();
for (int i = 0; i < n; i++) {
map.put(new Key(i), i);
}
for (int i = 0; i < n; i++) {
Integer v = map.get(i);
}
System.out.println(System.currentTimeMillis() - start); // average ~10ms
}
@AllArgsConstructor
public static class Key {
private Integer id; // id
@Override
public int hashCode() {
return Objects.hash(id);
}
// equals omitted
}The huge performance gap is caused solely by the different hashCode() implementations, which affect how keys are placed in the internal table.
HashMap Underlying Principle
HashMap uses an array combined with linked lists (or red‑black trees) to store entries. The internal structure includes a Node[] table array where each element is a Node containing hash , key , value , and a next pointer.
public class HashMap
{
// ...
Node
[] table; // array declaration
// ...
}
class Node
{
final int hash; // key's hash value
final K key; // key
V value; // value
Node
next; // next node (linked list)
}Each entry is stored in a bucket of the array; collisions are resolved by chaining nodes into a singly linked list.
Why Use an Array?
Arrays provide O(1) random access by index. By mapping a key to an array index via a hash function, HashMap can quickly locate the bucket where the entry resides.
HashMap's Hash Design Function
The hash function derives an int hash from the key’s hashCode() and mixes its high and low 16 bits:
int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}This mixing improves distribution and reduces collisions.
Key Mapping to Array Index
After obtaining the hash, the index is computed with a bitwise AND:
int hash = hash(key);
int n = tab.length;
int i = (n - 1) & hash; // array index
Node node = table[i];Because the table size is always a power of two, n‑1 has low bits set to 1, ensuring the index falls within the array bounds and distributes keys uniformly.
Hash Collisions
When multiple keys map to the same bucket, a collision occurs. HashMap resolves this by chaining the colliding entries in a linked list (or converting to a red‑black tree when the list becomes long).
Put Operation
The put process:
Compute the key’s hash.
Find the bucket index with (n‑1) & hash .
If the bucket is empty, insert a new node.
If not, traverse the list (or tree) to handle collisions or replace an existing value.
Collision handling uses hash equality, then == or equals() to compare keys.
// if hash matches and keys are equal
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
// replace value
}Resizing (Expansion)
When the number of entries exceeds loadFactor * table.length (default load factor 0.75), the table is resized to the next power of two, reducing collision probability. Existing entries are rehashed to new indices using the same (n‑1) & hash formula.
When to Resize?
The load factor is calculated as size / table.length . When it reaches 0.75, resizing occurs to keep performance balanced with memory usage.
Why Introduce Red‑Black Trees?
If a bucket’s linked list exceeds 7 nodes and the table size is at least 64, the list is transformed into a red‑black tree, providing O(log n) lookup instead of O(n). When the tree becomes small again, it may revert to a list.
Answer to the Initial Question
A poorly implemented hashCode() that returns a constant value forces all entries into the same bucket, causing massive collisions and severe performance degradation. A good hashCode implementation distributes entries uniformly across the table.
Conclusion
This article summarized the internal workings of Java's HashMap, covering its array‑based storage, hash function design, index calculation, collision handling, resizing strategy, and the conditions under which red‑black trees are used.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.