How to Crush the One Billion Row Challenge: Java Performance Secrets Revealed
This article walks through the One Billion Row Challenge, explaining the problem, baseline Java solution, and a series of deep performance optimizations—from parallel streams and custom hash tables to unsafe memory access and SIMD techniques—that shrink execution time from minutes to under two seconds.
Hello, I'm Su San.
Recently I discovered a Java‑centric competition called the One Billion Row Challenge (1BRC) and decided to explore the submissions, which sparked the question: "Is the Java they write the same as the Java I write?"
Challenge Overview
On January 1, 2024, Gunnar Morling announced the challenge: given a 13 GB file with 1 billion lines, each line containing a weather station name and a temperature (e.g., chengdu;12.0), compute the minimum, maximum, and average temperature per station and output the results in lexical order.
Sample input:
chengdu;12.0 guangzhou;7.2; chengdu;6.3 beijing;-3.6; chengdu;23.0 shanghai;9.8; chengdu;24.3 beijing;17.8;
For the sample, Chengdu's min is 6.3, max is 24.3, and average is 16.4.
The difficulty lies in the file size: 1 billion rows (~13 GB), which makes naïve line‑by‑line processing impractically slow.
The full problem description and data are hosted on GitHub: https://github.com/gunnarmorling/1brc .
Baseline Implementation
The official baseline code parses each line with a BufferedReader, splits on the semicolon, updates a MeasurementAggregator (holding min, max, sum, count), and finally prints a TreeMap for lexical ordering. On a high‑end server it finishes in under 2 minutes; on a modest laptop it takes about 14 minutes.
private int parseTemperature(long semicolonPos) {
long off = semicolonPos + 1;
int sign = 1;
byte b = chunk.get(JAVA_BYTE, off++);
if (b == '-') {
sign = -1;
b = chunk.get(JAVA_BYTE, off++);
}
int temp = b - '0';
b = chunk.get(JAVA_BYTE, off++);
if (b != '.') {
temp = 10 * temp + b - '0';
off++; // skip '.'
}
b = chunk.get(JAVA_BYTE, off);
temp = 10 * temp + b - '0';
return sign * temp;
}Optimization Journey
Version 0 – Switch JVM
Running the same code on GraalVM instead of OpenJDK shaved ~5 seconds (71 s → 66 s) by eliminating JVM start‑up overhead.
Version 1 – Parallel I/O
Using parallel Java streams distributes work across all CPU cores, reducing runtime to ~71 seconds on a 32‑core server.
Version 2 – Faster Temperature Parsing
Replacing Double.parseDouble with a handcrafted integer parser cut the temperature‑parsing cost from 21 % to 6 % of total time, bringing runtime down to ~11 seconds.
Version 3 – Custom Hash Table
Because only ~413 distinct stations exist, a hand‑rolled open‑addressing hash table replaces HashMap, eliminating allocation overhead and reducing runtime to ~6.6 seconds.
Version 4 – Unsafe & SWAR
Switching to sun.misc.Unsafe for raw memory access, processing eight bytes at a time (SWAR), and using a highly tuned SIMD temperature parser pushed execution to ~2.4 seconds.
Version 5 – Statistical Tweaks
Analyzing station‑name length distribution showed ~50 % of names are ≤8 bytes, prompting a branch‑prediction‑friendly rewrite that avoided costly if checks, further dropping runtime to ~1.8 seconds.
Final Tweaks
Eliminating start‑up/cleanup costs and fine‑tuning chunk sizes with work‑stealing pushed the best result to ~1.7 seconds.
All source code referenced in the article is available on GitHub:
Baseline
Version 1
Version 2
Version 3
Version 4
Version 5
The step‑by‑step breakdown demonstrates how low‑level Java tricks—memory‑mapped I/O, custom data structures, branch‑prediction awareness, and SIMD—can turn a multi‑minute benchmark into a sub‑two‑second marvel.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
