How to Crush the One Billion Row Challenge: Java Performance Secrets Revealed
This article walks through the One Billion Row Challenge—parsing a 13 GB file of 1 billion temperature records—by examining the baseline Java solution, analyzing top contestants' results, and detailing a step‑by‑step series of low‑level optimizations (JVM choice, parallel I/O, custom parsing, bespoke hash tables, Unsafe and SWAR techniques) that shrink execution time from minutes to under two seconds.
Challenge Overview
The One Billion Row Challenge requires reading a 13 GB text file where each line contains a weather‑station name and a temperature with one decimal place, separated by a semicolon (for example, Hamburg;12.0). For each station the program must compute the minimum, maximum and average temperature and output the results in lexical order of station name.
Reference Baseline
The baseline solution (https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_baseline.java) defines a MeasurementAggregator that stores min, max, sum and count per station. It reads the file line‑by‑line with a BufferedReader, splits each line on the semicolon, updates the aggregator and finally writes the results from a TreeMap. On the official benchmark hardware the program finishes in under two minutes; on a modest PC it takes about fourteen minutes.
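The baseline's approach can be sketched as follows. This is a simplified, self-contained version: the class and method names are illustrative, not the ones in the official repository, and the demo runs on an in-memory sample rather than the real 13 GB file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class BaselineSketch {
    // Per-station aggregate, in the spirit of the baseline's MeasurementAggregator.
    static final class Agg {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }

        @Override
        public String toString() {
            // min/mean/max with one decimal place, as the challenge requires
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    static Map<String, Agg> aggregate(BufferedReader reader) throws IOException {
        // TreeMap keeps stations in lexical order for the final output
        Map<String, Agg> stats = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int semi = line.indexOf(';');
            String station = line.substring(0, semi);
            double temp = Double.parseDouble(line.substring(semi + 1));
            stats.computeIfAbsent(station, k -> new Agg()).add(temp);
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        String sample = "Hamburg;12.0\nBulawayo;8.9\nHamburg;-3.2\n";
        System.out.println(aggregate(new BufferedReader(new StringReader(sample))));
        // prints {Bulawayo=8.9/8.9/8.9, Hamburg=-3.2/4.4/12.0}
    }
}
```

Every step of the optimization path below attacks a cost visible in this sketch: line-by-line reading, String allocation per line, Double.parseDouble, and the generic map lookup.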
Optimization Path
0 – Switch JVM to GraalVM
Running the same baseline code on GraalVM reduces the runtime from 71 s to 66 s by eliminating JVM start‑up overhead.
1 – Parallel I/O
Using parallel Java streams and chunking the file so that each thread processes a distinct region (implemented with MemorySegment in JDK 21) brings the runtime to 71 s on a 32‑core server.
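The chunking itself is mostly boundary bookkeeping: each split point must land just after a newline so that no thread starts mid-line. A minimal sketch of that alignment, using a byte[] in place of the mapped MemorySegment (the class and method names are illustrative):

```java
public class ChunkSplitter {
    // Returns the start offset of each chunk. Naive split points are moved
    // forward to the byte just past the next '\n', so every chunk begins
    // at the start of a fresh line and each thread sees only whole lines.
    static int[] chunkStarts(byte[] data, int chunkCount) {
        int[] starts = new int[chunkCount]; // starts[0] stays 0
        for (int i = 1; i < chunkCount; i++) {
            int pos = (int) ((long) data.length * i / chunkCount);
            while (pos < data.length && data[pos++] != '\n') {
                // skip the rest of the partial line
            }
            starts[i] = pos;
        }
        // chunk i spans [starts[i], starts[i + 1]), the last one ends at data.length
        return starts;
    }
}
```

In the real solution the same scan runs over the file's MemorySegment, and each thread then slices out its own region to process independently.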
2 – Hand‑crafted Temperature Parser
Replacing Double.parseDouble with a custom integer parser removes string allocation and floating‑point work. The core method is:
private int parseTemperature(long semicolonPos) {
    long off = semicolonPos + 1;
    int sign = 1;
    byte b = chunk.get(JAVA_BYTE, off++);
    if (b == '-') {
        sign = -1;
        b = chunk.get(JAVA_BYTE, off++);
    }
    int temp = b - '0';
    b = chunk.get(JAVA_BYTE, off++);
    if (b != '.') {
        temp = 10 * temp + b - '0';
        off++; // two integer digits, so skip the dot that follows
    }
    b = chunk.get(JAVA_BYTE, off);
    temp = 10 * temp + b - '0'; // result is the temperature in tenths of a degree
    return sign * temp;
}

This change drops the runtime to about 11 s.
3 – Custom Open‑Addressing Hash Table
Because the dataset contains only 413 distinct stations, the generic HashMap is replaced with a bespoke open‑addressing hash table (https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog3.java). The new StatsAcc structure eliminates the per-lookup overhead of computeIfAbsent and reduces the runtime to 6.6 s.
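The idea can be sketched as follows. The slot count, field layout, and String keys here are simplifications of my own (the real table hashes and compares raw name bytes straight out of the mapped file):

```java
public class StationTable {
    static final int TABLE_SIZE = 1 << 14; // power of two: the modulo becomes a mask,
                                           // and 16,384 slots keep 413 keys sparse
    static final class StatsAcc {
        String name;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long sum;
        long count;
    }

    final StatsAcc[] slots = new StatsAcc[TABLE_SIZE];

    // Find the accumulator for a name, inserting it on first sight.
    // Linear probing: on a collision, try the next slot.
    StatsAcc acquire(String name) {
        int slot = name.hashCode() & (TABLE_SIZE - 1);
        while (true) {
            StatsAcc acc = slots[slot];
            if (acc == null) {
                acc = new StatsAcc();
                acc.name = name;
                slots[slot] = acc;
                return acc;
            }
            if (acc.name.equals(name)) {
                return acc;
            }
            slot = (slot + 1) & (TABLE_SIZE - 1);
        }
    }

    void observe(String name, int tenths) {
        StatsAcc acc = acquire(name);
        acc.min = Math.min(acc.min, tenths);
        acc.max = Math.max(acc.max, tenths);
        acc.sum += tenths;
        acc.count++;
    }
}
```

With each thread typically owning its own table and the tables merged only at the end, no synchronization is needed on the hot path.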
4 – Unsafe and SWAR Techniques
Further gains are achieved by using sun.misc.Unsafe to read memory without bounds checks and applying SWAR (SIMD‑within‑a‑register) to locate semicolons and parse temperatures in 8‑byte chunks. The implementation (https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog4.java) brings the time down to 2.4 s.
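The SWAR part rests on a classic bit trick: load 8 bytes as one long and locate a target byte without any per-byte branching. A self-contained sketch of the semicolon search follows; the bit-mask constants are the standard zero-byte-detection masks, and the packing helper exists only for the demo (real code reads the long straight from memory):

```java
public class SwarSketch {
    private static final long BROADCAST_SEMICOLON = 0x3B3B3B3B3B3B3B3BL; // ';' in every byte
    private static final long LOW_BITS = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Returns the index (0-7) of the first ';' in the little-endian word,
    // or 8 if the word contains none.
    static int firstSemicolon(long word) {
        long diff = word ^ BROADCAST_SEMICOLON;              // zero byte wherever ';' was
        long match = (diff - LOW_BITS) & ~diff & HIGH_BITS;  // high bit set per zero byte
        return Long.numberOfTrailingZeros(match) >>> 3;      // byte index of first match
    }

    // Demo helper: pack 8 ASCII characters into a little-endian long,
    // the byte order a mapped-file read would produce on x86.
    static long pack(String eightChars) {
        long word = 0;
        for (int i = 7; i >= 0; i--) {
            word = (word << 8) | eightChars.charAt(i);
        }
        return word;
    }
}
```

One such probe replaces up to eight byte comparisons and, more importantly, the data-dependent branch per byte.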
5 – Branch‑Prediction Tuning
Statistical analysis shows that roughly half of the station names are ≤ 8 bytes, so a branch on nameLen > 8 goes both ways about equally often, which is the worst case for the branch predictor. Raising the cutoff to nameLen > 16 puts almost all names on the same side of the branch, dramatically reducing mispredictions. The final version (https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog5.java) achieves 1.7 s.
Key Takeaways
Collecting detailed runtime metrics guides optimization decisions. The biggest speedups come from custom integer parsing and a specialized hash table; low‑level tricks such as Unsafe and SWAR shave the last seconds but heavily reduce readability and maintainability.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
