Cracking the One Billion Row Challenge: Java Performance Secrets Unveiled
This article walks through the One Billion Row Challenge, explaining the problem, baseline solution, and a series of progressive Java optimizations—from parallel streams and better JVMs to custom hash tables, Unsafe, SWAR, and statistical tuning—that shrink execution time from minutes to under two seconds.
Challenge Overview
On January 1, 2024 Gunnar Morling announced the One Billion Row Challenge (1BRC). The task is to read a massive file where each line contains a weather‑station name and a temperature (semicolon‑separated, one decimal place), then compute the minimum, maximum and average temperature for each station and output the results in dictionary order.
https://www.morling.dev/blog/one-billion-row-challenge/
Example input:
chengdu;12.0 guangzhou;7.2; chengdu;6.3 beijing;-3.6; chengdu;23.0 shanghai;9.8; chengdu;24.3 beijing;17.8;
For the station chengdu the minimum is 6.3, the maximum is 24.3, and the average is (12.0+6.3+23.0+24.3)/4 = 16.4.
Baseline Solution
The official baseline code aggregates measurements with a MeasurementAggregator object that stores min, max, sum and count. It uses Java streams to read the file line‑by‑line, split each line on the semicolon, parse the temperature, and group by station using a TreeMap. On a high‑end server the baseline finishes in under two minutes; on a modest machine it takes about 14 minutes.
First Place Code
The winning solution (Thomas Wü) is highly optimized and difficult to read. It relies on custom low‑level parsing, aggressive inlining, and extensive use of Unsafe and SIMD techniques.
Optimization Journey
Version 0 – Better JVM
Switching from the default JVM to GraalVM reduces runtime from 71 s to 66 s, mainly by eliminating JVM start‑up overhead.
Version 1 – Parallel I/O
Reading the file in parallel chunks (using memory‑mapped I/O) replaces the line‑by‑line BufferedReader. This cuts the runtime to 17 s.
Version 2 – Faster Temperature Parsing
Instead of converting the temperature string to a double and then to an integer, a custom parser reads the digits directly into an int, dropping the runtime to 11 s.
private int parseTemperature(long semicolonPos) {
long off = semicolonPos + 1;
int sign = 1;
byte b = chunk.get(JAVA_BYTE, off++);
if (b == '-') {
sign = -1;
b = chunk.get(JAVA_BYTE, off++);
}
int temp = b - '0';
b = chunk.get(JAVA_BYTE, off++);
if (b != '.') {
temp = 10 * temp + b - '0';
off++; // skip '.'
}
b = chunk.get(JAVA_BYTE, off);
temp = 10 * temp + b - '0';
return sign * temp;
}Version 3 – Custom Hash Table
Because only about 400 stations exist, a handcrafted open‑addressing hash table replaces HashMap, reducing runtime to 6.6 s.
Version 4 – Unsafe & SWAR
Using sun.misc.Unsafe for unchecked memory access and SIMD‑within‑a‑register (SWAR) techniques to locate semicolons and parse temperatures pushes the runtime down to 2.4 s.
Version 5 – Statistical Tuning
Analyzing the distribution of station‑name lengths shows roughly half are ≤ 8 bytes. Adjusting branch predictions and further micro‑optimizations brings the final runtime to 1.8 s.
Further Tweaks
Eliminating start‑up/tear‑down costs and using smaller file chunks with work‑stealing shave another 0.1 s, yielding a best‑in‑class time of 1.7 s.
Takeaways
The challenge demonstrates how low‑level I/O, custom data structures, branch‑prediction awareness, and JVM tuning can turn a minute‑scale Java program into a sub‑two‑second solution. While the code becomes harder to read, the techniques are valuable for high‑performance Java back‑end development.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
