Understanding Java HashMap hashCode and Hash Algorithm
This article explains the fundamentals of binary operations, why hashCode is used, how Java's String hashCode is implemented, the rationale behind using the multiplier 31, and the detailed workings of HashMap's hash function, index calculation, capacity choices, and custom sizing recommendations.
Abstract
Basic knowledge of binary computation.
Why hashCode is used.
The hashCode method of the String type.
Why most hashCode implementations use 31.
The implementation principle of HashMap's hash algorithm (why right‑shift 16 bits, why XOR).
Why HashMap uses bitwise AND instead of modulo.
Why HashMap's capacity is recommended to be a power of two.
What is the best custom HashMap capacity?
Preface
As aspiring Java programmers who have spent long stretches on CRUD work and filling in HTML templates, we eventually grow curious about how the ubiquitous HashMap actually works. Many experienced developers treat it as a must‑know interview topic, but do we truly understand its inner workings?
Specifically, we will explore hashCode and the hash algorithm that HashMap relies on so heavily, even though some books treat this as a subject for mathematicians.
1. Basic Binary Computation
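Because this article relies on bitwise operations, here is a quick reminder of Java's common bitwise operators, to avoid confusion when reading the code: & (AND), | (OR), ^ (XOR), ~ (NOT), << (left shift), >> (signed right shift, which copies the sign bit into the vacated high bits) and >>> (unsigned right shift, which fills the vacated high bits with zeros).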
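A minimal sketch of these operators applied to small int values (the class name BitwiseDemo is just an illustrative choice). Pay attention to the difference between >> and >>> on a negative number; HashMap's hash function uses the unsigned form.

```java
// Quick tour of Java's bitwise operators on small int values.
class BitwiseDemo {
    // Formats an int as a 32-bit binary string with leading zeros.
    static String bits(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int a = 0b1100; // 12
        int b = 0b1010; // 10
        System.out.println("a & b  = " + (a & b));  // 8: 1 only where both bits are 1
        System.out.println("a | b  = " + (a | b));  // 14: 1 where either bit is 1
        System.out.println("a ^ b  = " + (a ^ b));  // 6: 1 where the bits differ
        System.out.println("~a     = " + (~a));     // -13: every bit flipped
        System.out.println("a << 2 = " + (a << 2)); // 48: zeros shifted in on the right

        int neg = -16;
        System.out.println("neg >> 2   = " + (neg >> 2));   // -4: sign bit copied into the high bits
        System.out.println("neg >>> 28 = " + (neg >>> 28)); // 15: zeros shifted in on the left
        System.out.println(bits(neg));        // 11111111111111111111111111110000
        System.out.println(bits(neg >>> 28)); // 00000000000000000000000000001111
    }
}
```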
In ordinary business projects, readability outweighs the performance gains of bitwise tricks, but in frameworks handling massive computations, such tricks become essential for performance.
2. Why Use hashCode
hashCode is used mainly by hash-based collections such as HashMap and HashSet (which is backed by a HashMap): the key's hash value is turned directly into an array index, so lookups take O(1) time on average, at the cost of extra memory for the table.
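To make the idea concrete, here is a deliberately simplified, hypothetical ToyHashSet; it is not how HashMap is actually written. The key's hashCode selects a single bucket, and equals only has to scan that bucket. Math.floorMod is used so that negative hash codes still map to a valid index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy hash set: an array of buckets, indexed directly by hashCode().
class ToyHashSet<T> {
    private final List<List<T>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Maps any int hash code (including negatives) to a bucket index.
    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    boolean add(T key) {
        List<T> bucket = buckets.get(indexFor(key));
        if (bucket.contains(key)) { // equals() resolves collisions inside one bucket
            return false;
        }
        bucket.add(key);
        return true;
    }

    boolean contains(Object key) {
        // Only one bucket is scanned, which is why lookups are O(1) on average.
        return buckets.get(indexFor(key)).contains(key);
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(8);
        set.add("apple");
        set.add("banana");
        System.out.println(set.contains("apple"));  // true
        System.out.println(set.contains("cherry")); // false
    }
}
```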
3. String Type hashCode Method
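In the JDK, Object.hashCode() is a native method that returns an identity hash code. It is often described as the object's memory address, but the specification does not require that, and modern HotSpot JVMs by default generate it from a thread-local pseudo-random generator instead. If a class overrides equals but not hashCode, two logically equal instances will almost always have different hash codes, so HashMap never matches them: a get with an equal but distinct key returns null even though an equal key was stored.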
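The sketch below uses two hypothetical nested classes, PointNoHash and Point, to show the consequence. The first lookup almost always prints null (it could only succeed if the two identity hash codes happened to collide); once hashCode is derived from the same fields as equals, here via java.util.Objects.hash, the lookup succeeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashCodeContractDemo {
    // Hypothetical value class that overrides equals but NOT hashCode.
    static final class PointNoHash {
        final int x;
        final int y;

        PointNoHash(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PointNoHash)) return false;
            PointNoHash p = (PointNoHash) o;
            return x == p.x && y == p.y;
        }
        // hashCode() is inherited from Object: one identity hash per instance.
    }

    // Same class, but hashCode is computed from the same fields as equals.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points now have equal hash codes
        }
    }

    static String lookupWithoutHashCode() {
        Map<PointNoHash, String> map = new HashMap<>();
        map.put(new PointNoHash(1, 2), "found");
        return map.get(new PointNoHash(1, 2)); // different instance, equals() to the stored key
    }

    static String lookupWithHashCode() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithoutHashCode()); // almost always null
        System.out.println(lookupWithHashCode());    // found
    }
}
```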
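String overrides hashCode by iterating over its characters, multiplying the accumulated result by 31 and adding the current character value. This yields s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow simply wrapping around, and the result is cached in the String after the first call. (Since JDK 9 the characters are stored in a byte[] because of compact strings, but the computed value is unchanged.)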
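A minimal re-implementation of that loop, relying only on String.charAt, reproduces the JDK's result. For "abc" the computation is ((97 * 31) + 98) * 31 + 99 = 96354. The class and method names are illustrative.

```java
// Recomputes String.hashCode() by hand:
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow wrapping around.
class StringHashDemo {
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // multiply the running total by 31, add the next char
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("abc")); // 96354
        System.out.println("abc".hashCode());  // 96354
        String longer = "a considerably longer string that overflows an int";
        System.out.println(manualHash(longer) == longer.hashCode()); // true
    }
}
```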
4. Why Most hashCode Implementations Use 31
Effective Java explains that 31 was chosen because it is an odd prime: with an even multiplier, information would be lost on overflow, because multiplying by 2 is equivalent to a left shift that pushes the high bits out. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction, 31 * i == (i << 5) - i, an optimization that modern VMs perform automatically.
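Both claims can be checked with a small sketch (class and method names are hypothetical; String.repeat requires Java 11+). polyHash runs the same loop as String.hashCode with a configurable multiplier. In a 40-character string, multiplier 2 scales the first character by 2^39, which is 0 modulo 2^32, so two strings differing only in that character collide; with the odd multiplier 31 they do not.

```java
// Checks that 31 * i == (i << 5) - i for any int, and that an even
// multiplier discards the earliest characters of long strings.
class MultiplierDemo {
    // Same loop as String.hashCode(), but with a configurable multiplier.
    static int polyHash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // Holds for every int, because both sides wrap around modulo 2^32.
    static boolean shiftSubtractMatches(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            System.out.println(i + ": " + shiftSubtractMatches(i)); // true for each
        }

        // Two 40-char strings that differ only in their first character.
        String tail = "x".repeat(39);
        String s1 = "a" + tail;
        String s2 = "b" + tail;
        System.out.println(polyHash(s1, 2) == polyHash(s2, 2));   // true: collision
        System.out.println(polyHash(s1, 31) == polyHash(s2, 31)); // false
    }
}
```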
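Other multipliers of the form 2^k - 1, such as 15 or 63, would allow the same shift-and-subtract trick, but neither is prime. 31 is commonly cited as a good compromise: large enough to spread short strings over a wide range of values, yet small enough that the running value does not overflow as quickly as with much larger multipliers (overflow itself is harmless, since it simply wraps around).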
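As a rough, synthetic illustration only (not a general benchmark), the sketch below counts distinct hash values over all 17,576 three-letter lowercase strings. Multipliers 31 and 63 give no collisions here, because the 26 possible letter offsets are fewer than the multiplier, so each string behaves like a unique base-m number; 15 and 2 are smaller than that range and do collide. Real key sets are longer and less regular, so treat this as intuition rather than proof. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Counts distinct polynomial hash values over all 26^3 three-letter lowercase strings.
class MultiplierCollisionDemo {
    static int polyHash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    static int distinctHashes(int multiplier) {
        Set<Integer> seen = new HashSet<>();
        char[] buf = new char[3];
        for (char a = 'a'; a <= 'z'; a++) {
            for (char b = 'a'; b <= 'z'; b++) {
                for (char c = 'a'; c <= 'z'; c++) {
                    buf[0] = a;
                    buf[1] = b;
                    buf[2] = c;
                    seen.add(polyHash(new String(buf), multiplier));
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // 31 and 63 print 17576 (no collisions); 15 and 2 print fewer.
        for (int m : new int[] {2, 15, 31, 63}) {
            System.out.println("multiplier " + m + ": " + distinctHashes(m) + " distinct of 17576");
        }
    }
}
```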
5. HashMap's hash Algorithm (Why Right‑Shift 16 Bits and XOR)
HashMap applies a secondary hash function to spread the high bits of the original hashCode into the low bits, improving distribution across buckets.
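In the JDK 8 source this is static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }. A null key hashes to 0; otherwise the high 16 bits of the hashCode are XORed into the low 16 bits, while the high 16 bits pass through unchanged. The shift is 16 because an int has 32 bits, so this folds the upper half onto the lower half. XOR is used because, unlike AND (which biases bits toward 0) or OR (which biases them toward 1), it keeps each resulting bit equally likely to be 0 or 1, so both halves contribute to the result.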
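When locating an entry, HashMap uses first = tab[(n - 1) & hash], where n is the table length. For a table of length n, only the low log2(n) bits of the hash select the bucket (4 bits for the default length 16), so without the shift and XOR, hash codes that differ only in their high bits would always collide. Spreading the high bits downward makes the final index more uniform even when the original hashCode has poor low-bit distribution.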
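The sketch below mirrors the h ^ (h >>> 16) step described above, using two Integer keys (Integer.hashCode returns the value itself) whose hash codes differ only in their high 16 bits. In a 16-bucket table the raw hash codes both land in bucket 0; after spreading they land in buckets 1 and 2. The class and helper names are illustrative.

```java
// Compares bucket indices with and without the h ^ (h >>> 16) spreading step.
class HashSpreadDemo {
    // Same logic as JDK 8's HashMap.hash(): null keys hash to 0.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table length.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        Integer k1 = 0x10000; // hash codes differ only above bit 15
        Integer k2 = 0x20000;
        System.out.println(indexFor(k1.hashCode(), n) + " " + indexFor(k2.hashCode(), n)); // 0 0
        System.out.println(indexFor(spread(k1), n) + " " + indexFor(spread(k2), n));       // 1 2
    }
}
```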
6. Why HashMap Uses Bitwise AND Instead of Modulo
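The expression (n - 1) & hash can replace a modulo only because n is always a power of two: n - 1 is then a mask whose low bits are all 1s (binary 1111 for n = 16), so the AND keeps exactly the low bits of the hash, which is the hash modulo n. It also always yields a non-negative index, whereas hash % n is negative for negative hashes in Java. On top of that, a single AND is typically much cheaper than an integer division on modern CPUs.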
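A quick sketch to check the equivalence on random inputs, assuming a power-of-two n: the mask agrees with Math.floorMod (Java's non-negative modulo), while the plain % operator gives -1 for -17 % 16, which is not a valid array index. The class and method names are hypothetical.

```java
import java.util.Random;

// For power-of-two n, (n - 1) & hash == Math.floorMod(hash, n) for every int.
class MaskVsModuloDemo {
    static boolean maskMatchesFloorMod(int hash, int n) {
        return ((n - 1) & hash) == Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int n = 16;
        Random random = new Random(42); // fixed seed for repeatable samples
        boolean allMatch = true;
        for (int i = 0; i < 1_000_000; i++) {
            allMatch &= maskMatchesFloorMod(random.nextInt(), n);
        }
        System.out.println("mask == floorMod for all samples: " + allMatch); // true

        int negative = -17;
        System.out.println(negative % n);               // -1
        System.out.println((n - 1) & negative);         // 15
        System.out.println(Math.floorMod(negative, n)); // 15
    }
}
```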
7. Why HashMap Capacity Should Be a Power of Two
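If the table length is not a power of two, for example 10, the mask n - 1 = 9 (binary 1001) contains zero bits, so (n - 1) & hash can only ever produce the indices 0, 1, 8 and 9. Buckets 2 through 7 are never used, the remaining four buckets absorb every entry, and the resulting long chains degrade performance. (HashMap itself never lets this happen: it rounds any requested capacity up to the next power of two.)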
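The sketch below, with a hypothetical usedBuckets helper, collects every index that (n - 1) & hash produces for the hash values 0 to 999. With n = 10 only four buckets are ever reachable; with n = 16 all sixteen are.

```java
import java.util.Set;
import java.util.TreeSet;

// Lists the bucket indices reachable via (n - 1) & hash for hash values 0..maxHash-1.
class BucketUsageDemo {
    static Set<Integer> usedBuckets(int n, int maxHash) {
        Set<Integer> used = new TreeSet<>(); // sorted for readable output
        for (int hash = 0; hash < maxHash; hash++) {
            used.add((n - 1) & hash);
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(usedBuckets(10, 1000));        // [0, 1, 8, 9]
        System.out.println(usedBuckets(16, 1000).size()); // 16: every bucket is used
    }
}
```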
8. Recommended Custom Initial Capacity
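To avoid repeated resizing (each resize redistributes every entry into a table twice as large), choose an initial capacity whose threshold, capacity * load factor (0.75 by default), is at least the expected number of entries; equivalently, capacity >= expectedSize / 0.75. Since HashMap rounds any requested capacity up to a power of two, the effective choice is the smallest power of two that is at least expectedSize / 0.75. For example, 2 expected entries need 2 / 0.75 = 2.67, so capacity 4 (threshold 3); 12 entries need exactly 16 (threshold 12); 13 entries already need 32.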
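A sketch of that rule, assuming the default load factor 0.75. capacityFor and nextPowerOfTwo are hypothetical helpers; the latter follows the same idea as HashMap's internal tableSizeFor. Passing the result to the HashMap(int) constructor is sufficient, since HashMap rounds its argument up to a power of two anyway.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helpers: pick an initial capacity so that expectedSize entries
// fit under the default 0.75 load factor without triggering a resize.
class CapacityDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap, clamped to [1, MAXIMUM_CAPACITY].
    static int nextPowerOfTwo(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Smallest power-of-two capacity whose threshold (capacity * 0.75) >= expectedSize.
    static int capacityFor(int expectedSize) {
        return nextPowerOfTwo((int) Math.ceil(expectedSize / (double) DEFAULT_LOAD_FACTOR));
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(2));  // 4
        System.out.println(capacityFor(12)); // 16
        System.out.println(capacityFor(13)); // 32

        // Sized for 12 entries: inserting them does not trigger a resize.
        Map<String, Integer> map = new HashMap<>(capacityFor(12));
        for (int i = 0; i < 12; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 12
    }
}
```

On JDK 19 and later, the static factory HashMap.newHashMap(int numMappings) performs this calculation for you.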
Conclusion
Understanding the design of hashCode and HashMap's hash algorithm helps developers choose appropriate initial capacities, diagnose performance issues caused by poor distribution, and write more efficient code.
Good luck!
Java Architect Essentials