Backend Development 20 min read

Cracking the One Billion Row Challenge: Java Performance Secrets Unveiled

This article walks through the One Billion Row Challenge, explaining the problem, baseline solution, and a series of progressive Java optimizations—from parallel streams and better JVMs to custom hash tables, Unsafe, SWAR, and statistical tuning—that shrink execution time from minutes to under two seconds.

Su San Talks Tech

Mar 5, 2024

Cracking the One Billion Row Challenge: Java Performance Secrets Unveiled

Challenge Overview

On January 1, 2024 Gunnar Morling announced the One Billion Row Challenge (1BRC). The task is to read a massive file where each line contains a weather‑station name and a temperature (semicolon‑separated, one decimal place), then compute the minimum, maximum and average temperature for each station and output the results in dictionary order.

https://www.morling.dev/blog/one-billion-row-challenge/

Example input:

chengdu;12.0 guangzhou;7.2; chengdu;6.3 beijing;-3.6; chengdu;23.0 shanghai;9.8; chengdu;24.3 beijing;17.8;

For the station chengdu the minimum is 6.3, the maximum is 24.3, and the average is (12.0+6.3+23.0+24.3)/4 = 16.4.

Baseline Solution

The official baseline code aggregates measurements with a MeasurementAggregator object that stores min, max, sum and count. It uses Java streams to read the file line‑by‑line, split each line on the semicolon, parse the temperature, and group by station using a TreeMap. On a high‑end server the baseline finishes in under two minutes; on a modest machine it takes about 14 minutes.

First Place Code

The winning solution (Thomas Wü) is highly optimized and difficult to read. It relies on custom low‑level parsing, aggressive inlining, and extensive use of Unsafe and SIMD techniques.

Optimization Journey

Version 0 – Better JVM

Switching from the default JVM to GraalVM reduces runtime from 71 s to 66 s, mainly by eliminating JVM start‑up overhead.

Version 1 – Parallel I/O

Reading the file in parallel chunks (using memory‑mapped I/O) replaces the line‑by‑line BufferedReader. This cuts the runtime to 17 s.

Version 2 – Faster Temperature Parsing

Instead of converting the temperature string to a double and then to an integer, a custom parser reads the digits directly into an int, dropping the runtime to 11 s.

private int parseTemperature(long semicolonPos) {
    long off = semicolonPos + 1;
    int sign = 1;
    byte b = chunk.get(JAVA_BYTE, off++);
    if (b == '-') {
        sign = -1;
        b = chunk.get(JAVA_BYTE, off++);
    }
    int temp = b - '0';
    b = chunk.get(JAVA_BYTE, off++);
    if (b != '.') {
        temp = 10 * temp + b - '0';
        off++; // skip '.'
    }
    b = chunk.get(JAVA_BYTE, off);
    temp = 10 * temp + b - '0';
    return sign * temp;
}

Version 3 – Custom Hash Table

Because only about 400 stations exist, a handcrafted open‑addressing hash table replaces HashMap, reducing runtime to 6.6 s.

Version 4 – Unsafe & SWAR

Using sun.misc.Unsafe for unchecked memory access and SIMD‑within‑a‑register (SWAR) techniques to locate semicolons and parse temperatures pushes the runtime down to 2.4 s.

Version 5 – Statistical Tuning

Analyzing the distribution of station‑name lengths shows roughly half are ≤ 8 bytes. Adjusting branch predictions and further micro‑optimizations brings the final runtime to 1.8 s.

Further Tweaks

Eliminating start‑up/tear‑down costs and using smaller file chunks with work‑stealing shave another 0.1 s, yielding a best‑in‑class time of 1.7 s.

Takeaways

The challenge demonstrates how low‑level I/O, custom data structures, branch‑prediction awareness, and JVM tuning can turn a minute‑scale Java program into a sub‑two‑second solution. While the code becomes harder to read, the techniques are valuable for high‑performance Java back‑end development.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java performance optimization GraalVM Parallel I/O One Billion Row Challenge Custom Hash Table MemorySegment

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Challenge Overview

Baseline Solution

First Place Code

Optimization Journey

Version 0 – Better JVM

Version 1 – Parallel I/O

Version 2 – Faster Temperature Parsing

Version 3 – Custom Hash Table

Version 4 – Unsafe & SWAR

Version 5 – Statistical Tuning

Further Tweaks

Takeaways

Su San Talks Tech

How this landed with the community

Was this worth your time?

0 Comments

Version 0 – Better JVM

Version 1 – Parallel I/O

Version 2 – Faster Temperature Parsing

Version 3 – Custom Hash Table

Version 4 – Unsafe & SWAR

Version 5 – Statistical Tuning