Backend Development 19 min read

Cracking the One Billion Row Java Challenge: Step‑by‑Step Optimizations

This article walks through the One Billion Row Java challenge, explaining the problem, baseline solution, and a series of performance optimizations—from parallel streams and custom parsers to unsafe memory access and SIMD techniques—that shrink execution time from minutes to under two seconds.

Su San Talks Tech

May 22, 2024

Cracking the One Billion Row Java Challenge: Step‑by‑Step Optimizations

Introduction

The author discovered the One Billion Row Challenge (1BRC) during the Chinese New Year and decided to explore the Java solutions submitted by top contestants.

Challenge Overview

The task is to read a 13 GB file containing 1 billion lines of city;temperature records, compute the minimum, maximum, and average temperature for each city, and output the results in dictionary order.

Sample input and expected output are shown in the original article.

Baseline Implementation

The official baseline code uses a MeasurementAggregator object and Java streams to parse each line, split on the semicolon, and aggregate statistics. It runs in about 2 minutes on a high‑end server but takes around 14 minutes on a modest machine.

Top‑Ranked Solutions

The first‑place solution is highly optimized and difficult to read; the author provides a brief overview and a code snippet.

Optimization Steps

0️⃣ Switch JVM

Running the baseline on GraalVM instead of OpenJDK reduces runtime from 71 s to 66 s.

1️⃣ Parallel I/O

File is split into chunks equal to the number of threads, each processed in parallel using MemorySegment (JDK 21) to avoid the 2 GB limit of ByteBuffer. This brings runtime down to 17 s.

2️⃣ Faster Temperature Parsing

private int parseTemperature(long semicolonPos) {
    long off = semicolonPos + 1;
    int sign = 1;
    byte b = chunk.get(JAVA_BYTE, off++);
    if (b == '-') {
        sign = -1;
        b = chunk.get(JAVA_BYTE, off++);
    }
    int temp = b - '0';
    b = chunk.get(JAVA_BYTE, off++);
    if (b != '.') {
        temp = 10 * temp + b - '0';
        off++; // skip '.'
    }
    b = chunk.get(JAVA_BYTE, off);
    temp = 10 * temp + b - '0';
    return sign * temp;
}

This reduces runtime to 11 s.

3️⃣ Custom Hash Table

Because only 413 distinct cities exist, a hand‑rolled open‑addressing hash table replaces HashMap, cutting runtime to 6.6 s.

4️⃣ Unsafe & SWAR

Using sun.misc.Unsafe and SIMD‑within‑a‑register techniques eliminates bounds checks and branches, dropping execution time to 2.4 s.

5️⃣ Statistical Tuning

Analyzing city‑name length distribution shows many names are ≤ 8 bytes; adjusting branch predictions and further micro‑optimizations bring the final runtime to 1.7 s.

Results

The progressive optimizations shrink the processing time from over 14 minutes to under two seconds, demonstrating the impact of low‑level Java performance engineering.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

benchmark one-billion-row-challenge

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.