Why Does Java’s String.hashCode Use 31? Uncovering the Math and Performance Secrets
This article explains why Java’s String.hashCode method multiplies by 31, covering the mathematical derivation, prime‑number benefits, JVM optimization tricks, and experimental results that compare 31 with other multipliers on a large word list.
While exploring the source of String.hashCode(), I noticed the mysterious constant 31 used as a multiplier and set out to understand its purpose.
How String.hashCode() Works
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}The core computation is the loop h = 31 * h + val[i], which can be expressed as the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1], where s is the internal char array.
Why 31?
Two main reasons are commonly cited:
31 is an odd prime of moderate size, making it a good hash multiplier that reduces collisions without causing overflow as easily as larger primes.
The JVM can optimize the multiplication: 31 * i == (i << 5) - i, allowing the operation to be performed with a shift and a subtraction.
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i . Modern VMs do this automatically.
Experiments were conducted on over 230,000 English words from /usr/share/dict/words, testing multipliers such as 2, 3, 17, 31, 37, 41, 101, and others. The conflict rate was measured by counting duplicate hash values.
Experimental Results
Using small primes like 2 produced a conflict rate of 55.14% and most hash values fell into a single partition of the integer range. Multipliers 31, 37, and 41 yielded very low conflict counts (under 7). Larger primes like 101 gave even better distribution but risk integer overflow.
Visualization of hash‑value distribution (partitioned into 64 equal ranges) showed that 2 and 3 concentrate almost entirely in one partition, while 31 spreads more evenly across many partitions, confirming its suitability.
In summary, 31 is a well‑chosen multiplier for String.hashCode() because it balances prime‑based collision reduction with JVM‑friendly arithmetic, making it an effective default for Java’s hash algorithm.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
