Cracking the One Billion Row Java Challenge: Step‑by‑Step Optimizations
This article walks through the One Billion Row Java challenge, explaining the problem, baseline solution, and a series of performance optimizations—from parallel streams and custom parsers to unsafe memory access and SIMD techniques—that shrink execution time from minutes to under two seconds.
Introduction
The author discovered the One Billion Row Challenge (1BRC) during the Chinese New Year and decided to explore the Java solutions submitted by top contestants.
Challenge Overview
The task is to read a 13 GB file containing 1 billion lines of city;temperature records, compute the minimum, maximum, and average temperature for each city, and output the results in dictionary order.
Sample input and expected output are shown in the original article.
Baseline Implementation
The official baseline code uses a MeasurementAggregator object and Java streams to parse each line, split on the semicolon, and aggregate statistics. It runs in about 2 minutes on a high‑end server but takes around 14 minutes on a modest machine.
Top‑Ranked Solutions
The first‑place solution is highly optimized and difficult to read; the author provides a brief overview and a code snippet.
Optimization Steps
0️⃣ Switch JVM
Running the baseline on GraalVM instead of OpenJDK reduces runtime from 71 s to 66 s.
1️⃣ Parallel I/O
File is split into chunks equal to the number of threads, each processed in parallel using MemorySegment (JDK 21) to avoid the 2 GB limit of ByteBuffer. This brings runtime down to 17 s.
2️⃣ Faster Temperature Parsing
private int parseTemperature(long semicolonPos) {
long off = semicolonPos + 1;
int sign = 1;
byte b = chunk.get(JAVA_BYTE, off++);
if (b == '-') {
sign = -1;
b = chunk.get(JAVA_BYTE, off++);
}
int temp = b - '0';
b = chunk.get(JAVA_BYTE, off++);
if (b != '.') {
temp = 10 * temp + b - '0';
off++; // skip '.'
}
b = chunk.get(JAVA_BYTE, off);
temp = 10 * temp + b - '0';
return sign * temp;
}This reduces runtime to 11 s.
3️⃣ Custom Hash Table
Because only 413 distinct cities exist, a hand‑rolled open‑addressing hash table replaces HashMap, cutting runtime to 6.6 s.
4️⃣ Unsafe & SWAR
Using sun.misc.Unsafe and SIMD‑within‑a‑register techniques eliminates bounds checks and branches, dropping execution time to 2.4 s.
5️⃣ Statistical Tuning
Analyzing city‑name length distribution shows many names are ≤ 8 bytes; adjusting branch predictions and further micro‑optimizations bring the final runtime to 1.7 s.
Results
The progressive optimizations shrink the processing time from over 14 minutes to under two seconds, demonstrating the impact of low‑level Java performance engineering.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
