Mastering the 1 Billion Row Java Challenge: Tips, Rules, and Evaluation

This article explains the 1 Billion Row Challenge (1BRC) for Java 21, detailing the data format, required output, how to build and run the benchmark, optimization options, submission rules, and the evaluation environment used to rank participants.

Java Architecture Diary

1. Introduction

The 1 Billion Row Challenge (1BRC) opened on 1 January 2024 and accepts submissions until 31 January 2024, 23:59 UTC; pull requests created after that deadline will not be considered.

The challenge explores how quickly Java can aggregate one billion rows of temperature data from a text file. Participants are encouraged to use (virtual) threads, SIMD, garbage-collector tuning, or any other technique to produce the fastest solution.

2. Challenge Details

The input file contains temperature measurements from weather stations, one per line, formatted as <string: station name>;<double: measurement> with exactly one decimal place. Example (10 lines):

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0
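
To make the line format concrete, here is a minimal sketch of splitting one line into its station name and measurement. The class MeasurementLine and the record Measurement are hypothetical names chosen for illustration, not types from the 1brc repository, and Double.parseDouble is used for clarity rather than speed.

```java
import java.util.List;

final class MeasurementLine {
    // Hypothetical holder for one parsed line; not part of the 1brc repository.
    record Measurement(String station, double value) {}

    static Measurement parse(String line) {
        // The measurement itself never contains ';', so the last ';' is the separator.
        int sep = line.lastIndexOf(';');
        if (sep <= 0 || sep == line.length() - 1) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String station = line.substring(0, sep);
        double value = Double.parseDouble(line.substring(sep + 1));
        return new Measurement(station, value);
    }

    public static void main(String[] args) {
        for (String line : List.of("Hamburg;12.0", "St. John's;15.2", "Istanbul;6.2")) {
            Measurement m = parse(line);
            System.out.println(m.station() + " -> " + m.value());
        }
    }
}
```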

The task is to write a Java program that reads the file, computes the minimum, average, and maximum temperature for each station, sorts the stations alphabetically, and prints results in the form <min>/<avg>/<max> with one decimal place, e.g.

{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, …}

Java 21 must be used.
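
The aggregation and output format described above can be sketched as follows. This is a simple single-threaded illustration, not the repository's implementation: StationSummary and Stats are hypothetical names, rounding via Math.round(v * 10.0) / 10.0 is an assumption about how values are rounded to one decimal place, and "alphabetical" is taken here to mean the natural String ordering of a TreeMap.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StationSummary {
    // Hypothetical per-station accumulator for min, max, sum, and count.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        @Override
        public String toString() {
            // Renders <min>/<avg>/<max>, each with one decimal place.
            return round(min) + "/" + round(sum / count) + "/" + round(max);
        }
    }

    // Assumed rounding rule: round to the nearest tenth.
    static double round(double v) {
        return Math.round(v * 10.0) / 10.0;
    }

    static String summarize(Iterable<String> lines) {
        // TreeMap keeps stations sorted by name; its toString() yields {k=v, k=v}.
        Map<String, Stats> byStation = new TreeMap<>();
        for (String line : lines) {
            int sep = line.lastIndexOf(';');
            byStation.computeIfAbsent(line.substring(0, sep), k -> new Stats())
                     .add(Double.parseDouble(line.substring(sep + 1)));
        }
        return byStation.toString();
    }

    public static void main(String[] args) {
        System.out.println(summarize(List.of("Istanbul;6.2", "Hamburg;12.0", "Istanbul;23.0")));
        // prints {Hamburg=12.0/12.0/12.0, Istanbul=6.2/14.6/23.0}
    }
}
```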

3. Running the Challenge

The repository (named 1brc) contains two programs:

dev.morling.onebrc.CreateMeasurements (invoked via create_measurements.sh) creates the file measurements.txt with a configurable number of random measurement values.

dev.morling.onebrc.CalculateAverage (invoked via calculate_average.sh) computes the averages from measurements.txt.

Steps to run:

Build the project with Apache Maven: ./mvnw clean verify

Create the 1-billion-row measurement file (this only needs to be done once): ./create_measurements.sh 1000000000 (this produces a file of roughly 12 GB, so ensure sufficient disk space).

Calculate the averages: ./calculate_average.sh

Optimize the CalculateAverage program using any technique you deem appropriate: parallelism, the incubating Vector API, memory-mapped file sections, AppCDS, GraalVM, CRaC, GC tuning, etc.

The provided baseline implementation uses the Java Streams API and completes in about two minutes on the evaluation hardware; it serves as the reference point for optimized solutions.
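
As an illustration of two of the techniques named above, parallelism and memory-mapped file sections, the following sketch splits a file into byte ranges that end on line boundaries, maps each range, and processes the ranges in parallel. It only counts newline bytes per range; a real solution would aggregate measurements there instead. ChunkedReader, Chunk, and all method names are hypothetical, and the sketch assumes each range is smaller than 2 GB, the limit for a single FileChannel.map call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class ChunkedReader {
    // Hypothetical helper type: the byte range [start, end) of the file.
    record Chunk(long start, long end) {}

    // Returns the offset just past the first '\n' at or after pos, or size if none.
    static long nextLineStart(FileChannel channel, long pos, long size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(256);
        while (pos < size) {
            buf.clear();
            int n = channel.read(buf, pos); // positional read, leaves channel position alone
            if (n <= 0) break;
            for (int i = 0; i < n; i++) {
                if (buf.get(i) == '\n') return pos + i + 1;
            }
            pos += n;
        }
        return size;
    }

    // Splits the file into roughly `parts` ranges so that no line straddles two ranges.
    static List<Chunk> split(FileChannel channel, int parts) throws IOException {
        long size = channel.size();
        long target = Math.max(1, size / parts);
        List<Chunk> chunks = new ArrayList<>();
        long start = 0;
        while (start < size) {
            long tentative = Math.min(size, start + target);
            long end = tentative == size ? size : nextLineStart(channel, tentative, size);
            chunks.add(new Chunk(start, end));
            start = end;
        }
        return chunks;
    }

    // Maps one range into memory and counts its newline bytes.
    static long countLines(FileChannel channel, Chunk chunk) {
        try {
            MappedByteBuffer buf = channel.map(
                    FileChannel.MapMode.READ_ONLY, chunk.start(), chunk.end() - chunk.start());
            long lines = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long countLinesParallel(Path file, int parts) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return split(channel, parts).parallelStream()
                    .mapToLong(c -> countLines(channel, c))
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "measurements.txt");
        int parts = Runtime.getRuntime().availableProcessors();
        System.out.println(countLinesParallel(file, parts) + " lines");
    }
}
```

Aligning ranges on newlines means each worker can parse its range independently, so the per-range results only need to be merged once at the end.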

4. Rules and Limitations

Any Java distribution may be used (SDKMan builds, early‑access builds from openjdk.net, builds from builds.shipilev.net, etc.).

No external dependencies are allowed.

The implementation must consist of a single Java source file.

All computation must happen at runtime; pre‑computing results during build time (e.g., embedding them in a native image) is prohibited.

Input constraints:

Station name: non‑empty UTF‑8 string, 1–100 characters.

Temperature: double between -99.9 and 99.9 inclusive, always with one decimal place.

The solution must work for any valid station name and any data distribution; it cannot rely on special properties of the provided dataset.
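
Because every temperature has exactly one decimal place and lies between -99.9 and 99.9, one common approach is to parse it directly from bytes into an integer number of tenths, avoiding floating-point parsing altogether. The sketch below relies only on the stated input format, not on properties of a particular dataset. FixedPointTemperature and parseTenths are hypothetical names, and the method performs no validation of malformed input.

```java
import java.nio.charset.StandardCharsets;

final class FixedPointTemperature {
    // Parses text such as "-12.3" or "4.5" in bytes[from, to) into tenths of a degree.
    // Assumes the documented format: optional '-', digits, '.', exactly one digit.
    static int parseTenths(byte[] bytes, int from, int to) {
        int i = from;
        boolean negative = bytes[i] == '-';
        if (negative) i++;
        int value = 0;
        for (; i < to; i++) {
            byte b = bytes[i];
            if (b != '.') {
                value = value * 10 + (b - '0'); // skip the dot, accumulate digits
            }
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        byte[] line = "Istanbul;23.0".getBytes(StandardCharsets.UTF_8);
        int sep = 8; // index of ';' in this example line
        System.out.println(parseTenths(line, sep + 1, line.length)); // prints 230
    }
}
```

Sums and extremes can then be kept as integers and converted to a decimal value only when the final result is printed.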

5. Participating in the Challenge

To submit your implementation:

Fork the 1brc GitHub repository.

Copy CalculateAverage.java to a new file named CalculateAverage_<your_GH_user>.java (e.g., CalculateAverage_doloreswilson.java).

Make your implementation as fast as possible.

Copy calculate_average.sh to calculate_average_<your_GH_user>.sh and adjust it to invoke your class, adding any JVM options via JAVA_OPTS if needed.

OpenJDK 21 is the default; if you use a custom JDK, include the appropriate sdk use java [version] command in the startup script.

(Optional) To build a native binary with GraalVM, modify pom.xml accordingly.

Create a pull request against the upstream repository, clearly stating the name of your implementation class, the execution time measured on your own system, and that system's specifications (CPU, number of cores, RAM).

Community discussion is encouraged via the repository’s GitHub Discussions.

6. Evaluation

Results are measured on a Hetzner Cloud CCX33 instance (8 dedicated vCPUs, 32 GB RAM). Each submission is run five times in a row; the fastest and slowest runs are discarded, and the mean of the remaining three is the submission's score. All submissions are run against the same measurements.txt file. Scripts based on Terraform and Ansible are provided for anyone who wishes to reproduce the environment (note that running them incurs cloud costs).
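
The scoring rule amounts to a trimmed mean, which can be sketched as below. ScoreCalculator and trimmedMean are hypothetical names, and the timings in main are made-up numbers for illustration only, not actual results.

```java
import java.util.Arrays;

final class ScoreCalculator {
    // Mean of the runs that remain after dropping the single fastest and slowest run.
    static double trimmedMean(double[] runSeconds) {
        if (runSeconds.length < 3) {
            throw new IllegalArgumentException("need at least three runs");
        }
        double[] sorted = runSeconds.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Made-up timings in seconds for five runs.
        double[] runs = {10.0, 12.0, 11.0, 30.0, 9.0};
        System.out.println(trimmedMean(runs)); // prints 11.0
    }
}
```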

Join the challenge at the project repository: https://github.com/gunnarmorling/1brc
