Can You Beat the One Billion Row Challenge? Inside Java Performance Secrets

This article explores the One Billion Row Challenge, a Java benchmark that requires parsing a 13 GB file of one billion temperature records, and walks through baseline code, top‑ranked solutions, and a step‑by‑step performance tuning journey that reduces execution time from minutes to under two seconds.

dbaplus Community

What is the One Billion Row Challenge?

The challenge, published by Gunnar Morling on January 1, 2024, asks participants to read a 13 GB text file containing one billion lines, each holding a weather‑station name and a temperature value with one decimal place. For every station, the minimum, maximum, and mean temperature must be computed, and the results must be printed in dictionary order.

Baseline Java solution

The reference implementation creates a MeasurementAggregator that stores the minimum, maximum, sum and count for each station. It reads the file line‑by‑line, splits each line on the semicolon, parses the temperature as a double, updates the aggregator and finally writes the results using a TreeMap to keep dictionary order.
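The baseline approach can be sketched as follows; class and method names here are illustrative, not the reference implementation's exact code:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the baseline: one aggregator per station, with a
// TreeMap so iteration yields dictionary order for the final output.
public class BaselineSketch {
    static final class Aggregator {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }

        double mean() { return sum / count; }
    }

    static Map<String, Aggregator> aggregate(Iterable<String> lines) {
        Map<String, Aggregator> stats = new TreeMap<>();
        for (String line : lines) {
            int semi = line.indexOf(';');                 // "Station;12.3"
            String station = line.substring(0, semi);
            double temp = Double.parseDouble(line.substring(semi + 1));
            stats.computeIfAbsent(station, k -> new Aggregator()).add(temp);
        }
        return stats;
    }
}
```

Reading the file line by line and feeding each line to `aggregate` is exactly the pattern the later versions replace piece by piece.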

The baseline runs in under two minutes on a high‑end benchmark server (32‑core AMD EPYC 7502P, 128 GB RAM), but on a typical laptop it takes about 14 minutes.

Top‑ranked solutions and their evolution

Version 0 – Switching JVM

Simply running the same code on GraalVM instead of OpenJDK reduces the runtime from 71 s to 66 s, a modest 5‑second gain.

Version 1 – Parallel I/O

The first real speed‑up uses Java parallel streams to read and process the file concurrently, fully utilizing all CPU cores. On a Hetzner AX161 server (32 cores) the execution time drops to 71 seconds.
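A minimal sketch of the parallel‑stream idea, assuming the simple `station;value` line format (the class and record names are illustrative): worker threads aggregate slices of the stream, and their partial results are merged via the collector's merge function.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch: a parallel stream lets every core aggregate a slice
// of the input; partial results are merged with Stats::merge.
public class ParallelSketch {
    record Stats(double min, double max, double sum, long count) {
        Stats merge(Stats o) {
            return new Stats(Math.min(min, o.min), Math.max(max, o.max),
                             sum + o.sum, count + o.count);
        }
    }

    static Map<String, Stats> process(Path file) throws Exception {
        try (var lines = Files.lines(file)) {
            return lines.parallel()
                .map(l -> l.split(";"))
                .collect(Collectors.toMap(
                    p -> p[0],
                    p -> {
                        double t = Double.parseDouble(p[1]);
                        return new Stats(t, t, t, 1);
                    },
                    Stats::merge,
                    TreeMap::new));      // dictionary order for the output
        }
    }
}
```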

Version 2 – Faster temperature parsing

Parsing the temperature as an int directly from the byte buffer avoids the overhead of Double.parseDouble. The custom method extracts the sign, integer and fractional digits in a few integer operations.

// `chunk` is a MemorySegment mapped over the input file; JAVA_BYTE is
// java.lang.foreign.ValueLayout.JAVA_BYTE. Returns the temperature scaled
// by 10 (e.g. "-3.4" -> -34), so all aggregation stays in integer math.
private int parseTemperature(long semicolonPos) {
    long off = semicolonPos + 1;
    int sign = 1;
    byte b = chunk.get(JAVA_BYTE, off++);
    if (b == '-') {
        sign = -1;
        b = chunk.get(JAVA_BYTE, off++);
    }
    int temp = b - '0';                  // first integer digit
    b = chunk.get(JAVA_BYTE, off++);
    if (b != '.') {                      // second integer digit present
        temp = 10 * temp + b - '0';
        off++;                           // skip the decimal point
    }
    b = chunk.get(JAVA_BYTE, off);       // single fractional digit
    temp = 10 * temp + b - '0';
    return sign * temp;
}

This change cuts the runtime by another 6 seconds, bringing it down to 11 seconds.

Version 3 – Custom hash table

Because the number of distinct stations (≈ 413) is known, the author replaces the generic HashMap with an open‑addressing hash table tailored to the fixed key size. The new findAcc routine directly computes a hash, probes for collisions, and stores StationStats objects without the overhead of Java’s map abstractions.
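The open‑addressing idea with linear probing can be sketched as below; the table size, hash function, and the body of `findAcc` are assumptions rather than the author's exact code:

```java
// Illustrative open-addressing table. Since station count (~413) is small
// and fixed, a flat array sized well above it keeps probe chains short.
public class OpenAddressingSketch {
    static final int TABLE_SIZE = 1 << 10;        // power of two, >> 413 stations
    static final int MASK = TABLE_SIZE - 1;

    static final class StationStats {
        final String name;
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        long sum, count;
        StationStats(String name) { this.name = name; }
    }

    final StationStats[] table = new StationStats[TABLE_SIZE];

    // Find the slot for `name`, creating it on first sight. Linear probing:
    // on a collision, step to the next slot until the name matches or an
    // empty slot appears.
    StationStats findAcc(String name) {
        int slot = name.hashCode() & MASK;
        while (true) {
            StationStats s = table[slot];
            if (s == null) {
                return table[slot] = new StationStats(name);
            }
            if (s.name.equals(name)) {
                return s;
            }
            slot = (slot + 1) & MASK;             // probe the next slot
        }
    }
}
```

The win over `HashMap` comes from skipping boxing, `Entry` allocation, and virtual dispatch on every lookup.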

After this optimisation the execution time falls to 6.6 seconds.

Version 4 – Unsafe and SWAR

The fourth iteration drops the safe Java APIs in favour of sun.misc.Unsafe to read memory without bounds checks and applies SWAR (SIMD‑within‑a‑register) techniques to locate semicolons and parse temperatures eight bytes at a time. The code also reuses loaded bytes for hashing, eliminating redundant reads.
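The semicolon search can be illustrated with the classic SWAR zero‑byte trick on a plain `long`; the real code loads words through `Unsafe`, while this sketch (with an illustrative `pack` helper for building test words) shows only the bit manipulation:

```java
public class SwarSketch {
    // ';' is 0x3B; this constant has it replicated into all 8 byte lanes.
    private static final long SEMI_PATTERN = 0x3B3B3B3B3B3B3B3BL;

    // Index (0-7) of the first ';' in a little-endian 8-byte word, or 8 if
    // none. XOR turns matching bytes into 0x00, then the zero-byte trick
    // ((x - 0x01..) & ~x & 0x80..) raises the high bit of the first zero lane.
    static int firstSemicolon(long word) {
        long x = word ^ SEMI_PATTERN;
        long hit = (x - 0x0101010101010101L) & ~x & 0x8080808080808080L;
        return Long.numberOfTrailingZeros(hit) >>> 3;  // 64 >>> 3 == 8 if absent
    }

    // Pack up to 8 ASCII chars into a little-endian long (demo helper,
    // padding short strings with spaces).
    static long pack(String s) {
        long w = 0;
        for (int i = 7; i >= 0; i--) {
            w = (w << 8) | (i < s.length() ? s.charAt(i) : ' ');
        }
        return w;
    }
}
```

One such check inspects eight input bytes per iteration with no data‑dependent branch, which is where the bulk of the Version 4 gain comes from.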

These low‑level tricks push the runtime down to 2.4 seconds.

Version 5 – Statistics‑driven tweaks

Profiling revealed that roughly half of the station names are ≤ 8 bytes, which makes the length branch in the nameEquals method essentially a coin flip and causes frequent branch mispredictions. A small helper program analyses the name‑length distribution and shows that moving the length check to > 16 bytes cuts the misprediction rate from 50 % to 2.5 %. The author then rewrites the comparison routine to avoid the branch where possible, shaving off another 0.1 seconds.
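A helper of the kind described might look like this; the bucket boundaries mirror the 8‑ and 16‑byte comparison fast paths, and everything else is illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Illustrative helper: bucket station names by encoded length to see which
// comparison fast path a length branch would take, and how often.
public class NameLengthHistogram {
    static Map<String, Integer> histogram(Iterable<String> names) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (String n : names) {
            int len = n.getBytes(StandardCharsets.UTF_8).length;
            String bucket = len <= 8 ? "1: <=8" : len <= 16 ? "2: 9-16" : "3: >16";
            buckets.merge(bucket, 1, Integer::sum);
        }
        return buckets;
    }
}
```

If the first bucket holds about half the names, a branch on "length ≤ 8" mispredicts about half the time, while a branch on "length > 16" is almost always false and therefore predicted well.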

Final results

Combining all the above optimisations, the author’s implementation processes the 13 GB, one‑billion‑line dataset in just 1.7 seconds on the same benchmark server, a 45 % improvement over the previous best OpenJDK result and a dramatic speed‑up compared with the initial 14‑minute run on a regular laptop.

All source code referenced in the article is publicly available on GitHub:

Baseline: https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_baseline.java

Top‑ranked solutions: https://github.com/gunnarmorling/1brc, https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog1.java, https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog2.java, https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog3.java, https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog4.java, https://github.com/mtopolnik/billion-row-challenge/blob/main/src/Blog5.java

The article also includes profiling links, flame‑graph images and a brief discussion on the trade‑off between readability and raw performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java · Performance Optimization · benchmark · profiling · One Billion Row Challenge
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
