Why Java's String.hashCode Uses 31 as Multiplier: Theory and Experiments
This article explores the rationale behind Java's String.hashCode method using the multiplier 31, detailing its implementation, mathematical justification, performance optimizations, and experimental analysis of hash collision rates with various multipliers, concluding why 31 is an optimal choice.
When reading the source of String.hashCode() a curious constant 31 appears as the multiplication factor. The article investigates why this particular number was chosen and what impact it has on hash distribution and performance.
Implementation
The method is implemented as follows:
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char[] val = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}The core of the algorithm is the loop that repeatedly multiplies the current hash by 31 and adds the next character code.
Reason 1 – Prime Multiplier
31 is a small odd prime. Using a prime as a multiplier helps to spread hash values more uniformly, reducing the likelihood of collisions compared with even numbers or very small primes such as 2. Larger primes (e.g., 101) can cause integer overflow, which may discard useful information.
Reason 2 – JVM Optimization
Multiplication by 31 can be replaced by a shift‑and‑subtract operation: 31 * i == (i << 5) - i . Modern JVMs perform this optimization automatically, making the operation cheap while preserving the benefits of a prime multiplier.
Experimental Verification
The author wrote a benchmark that hashes over 230,000 English words from the Unix /usr/share/dict/words file using different multipliers (2, 3, 17, 31, 37, 41, 101, etc.). For each multiplier the program computes:
Minimum and maximum hash values
Total number of collisions
Collision rate (percentage)
Distribution of hash values across 64 equal partitions of the 32‑bit integer space
Key code fragments used in the experiment:
public static Integer hashCode(String str, Integer multiplier) {
int hash = 0;
for (int i = 0; i < str.length(); i++) {
hash = multiplier * hash + str.charAt(i);
}
return hash;
}
public static void calculateConflictRate(Integer multiplier, List
hashs) {
// compute min, max, distinct count, conflict number, conflict rate
// output formatted result
System.out.println(String.format("multiplier=%4d, minHash=%11d, maxHash=%10d, conflictNum=%6d, conflictRate=%.4f%%",
multiplier, minHash, maxHash, conflictNum, conflictRate * 100));
}Results
Multipliers 2 and 3 produce extremely high collision rates (>55%) and concentrate hash values in a single partition. Multipliers 17 and 31 achieve low collision rates (well below 1%) and a more even distribution across partitions. Larger primes such as 101 also give good distribution but risk overflow, which the author considers undesirable for a general‑purpose hash function.
Conclusion
The combination of being a small odd prime and allowing a cheap shift‑and‑subtract implementation makes 31 an excellent default multiplier for String.hashCode() . It balances good distribution, low collision probability, and efficient execution, which explains its long‑standing use in the Java standard library.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.