Mastering the 1 Billion Row Java Challenge: Tips, Rules, and Evaluation
This article explains the 1 Billion Row Challenge (1BRC) for Java 21, detailing the data format, required output, how to build and run the benchmark, optimization options, submission rules, and the evaluation environment used to rank participants.
1. Introduction
Starting from New Year's Day 2024, the 1 Billion Row Challenge (1BRC) is open for submissions until 31 January 2024 23:59 UTC; any pull request created after that will not be considered.
The challenge aims to test Java's ability to aggregate one billion lines of temperature data from a text file, encouraging the use of all (virtual) threads, SIMD, garbage‑collector tuning, or any other technique to produce the fastest solution.
2. Challenge Details
The input file contains temperature measurements from weather stations, one per line, formatted as
<string: station name>;<double: measurement>, with exactly one decimal place. Example (10 lines):

```
Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0
```

The task is to write a Java program that reads the file, computes the minimum, average, and maximum temperature for each station, sorts the stations alphabetically, and prints the results in the form
<min>/<avg>/<max>, with one decimal place, e.g.

```
{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, …}
```

Java 21 must be used.
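To make the input and output formats concrete, here is a minimal, hypothetical Java sketch (class and method names are my own, not from the challenge repository) that parses one input line and formats one result entry with exactly one decimal place per value:

```java
import java.util.Locale;

public class FormatDemo {

    // Formats one result entry as "station=min/avg/max" with one decimal
    // place per value; Locale.ROOT keeps '.' as the decimal separator
    // regardless of the JVM's default locale.
    static String entry(String station, double min, double avg, double max) {
        return String.format(Locale.ROOT, "%s=%.1f/%.1f/%.1f", station, min, avg, max);
    }

    public static void main(String[] args) {
        String line = "St. John's;15.2";
        int sep = line.indexOf(';');                      // split on the single ';'
        String station = line.substring(0, sep);
        double value = Double.parseDouble(line.substring(sep + 1));
        // With a single measurement, min == avg == max.
        System.out.println(entry(station, value, value, value));
    }
}
```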
3. Running the Challenge
The repository (named 1brc) contains two programs:

- dev.morling.onebrc.CreateMeasurements (invoked via create_measurements.sh) creates a configurable measurements.txt file with random data.
- dev.morling.onebrc.CalculateAverage (invoked via calculate_average.sh) computes the averages from measurements.txt.
Steps to run:

1. Build the project with Apache Maven: ./mvnw clean verify
2. Create a 1-billion-row measurement file (run once): ./create_measurements.sh 1000000000 (this produces a roughly 12 GB file; ensure sufficient disk space).
3. Calculate the averages: ./calculate_average.sh

Optimize the CalculateAverage program using any technique you deem appropriate: parallelism, the incubating Vector API, memory-mapped file sections, AppCDS, GraalVM, CRaC, GC tuning, and so on. The provided simple implementation uses the Java Stream API and finishes in about two minutes on the reference hardware, serving as a baseline.
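A sequential Stream-based baseline in the spirit described above might look like the following sketch. This is not the repository's actual reference code, just an illustration; the file name and class layout are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.DoubleSummaryStatistics;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BaselineSketch {

    // Aggregates "station;value" lines into a name-sorted summary string.
    static String summarize(Stream<String> lines) {
        Map<String, DoubleSummaryStatistics> stats = lines.collect(
                Collectors.groupingBy(
                        l -> l.substring(0, l.indexOf(';')),
                        TreeMap::new,  // alphabetical order of stations
                        Collectors.summarizingDouble(
                                l -> Double.parseDouble(l.substring(l.indexOf(';') + 1)))));
        return stats.entrySet().stream()
                .map(e -> String.format(Locale.ROOT, "%s=%.1f/%.1f/%.1f",
                        e.getKey(), e.getValue().getMin(),
                        e.getValue().getAverage(), e.getValue().getMax()))
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) throws IOException {
        try (var lines = Files.lines(Path.of("measurements.txt"))) {
            // Swapping in lines.parallel() here is one of the simplest
            // of the optimizations the article lists.
            System.out.println(summarize(lines));
        }
    }
}
```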
4. Rules and Limitations
Any Java distribution may be used (SDKMan builds, early‑access builds from openjdk.net, builds from builds.shipilev.net, etc.).
No external dependencies are allowed.
The implementation must consist of a single Java source file.
All computation must happen at runtime; pre‑computing results during build time (e.g., embedding them in a native image) is prohibited.
Input constraints:

- Station name: a non-empty UTF-8 string of 1–100 characters.
- Temperature: a double between -99.9 and 99.9 inclusive, always with exactly one decimal place.
The solution must work for any valid station name and any data distribution; it cannot rely on special properties of the provided dataset.
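Note that the format guarantees above (unlike dataset-specific distributions) are fair game. For instance, because every temperature has exactly one decimal digit in [-99.9, 99.9], it can be parsed as an integer number of tenths without floating point. A hypothetical sketch (method name is mine):

```java
public class TenthsParser {

    // Parses a temperature such as "-99.9" or "12.0" as an integer number of
    // tenths (-999..999), relying only on the guaranteed one-decimal format.
    static int tenths(CharSequence s) {
        int i = 0, sign = 1;
        if (s.charAt(0) == '-') { sign = -1; i = 1; }
        int v = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.') v = v * 10 + (c - '0');  // skip the decimal point
        }
        return sign * v;
    }
}
```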
5. Participating in the Challenge
To submit your implementation:

1. Fork the 1brc GitHub repository.
2. Copy CalculateAverage.java to a new file named CalculateAverage_<your_GH_user>.java (e.g., CalculateAverage_doloreswilson.java).
3. Make your implementation as fast as possible.
4. Copy calculate_average.sh to calculate_average_<your_GH_user>.sh and adjust it to invoke your class, adding any JVM options via JAVA_OPTS if needed. OpenJDK 21 is the default; if you use a custom JDK, include the appropriate sdk use java [version] command in the startup script.
5. (Optional) To build a native binary with GraalVM, modify pom.xml accordingly.
6. Create a pull request against the upstream repository, clearly stating the class name, the hardware you tested on (CPU, cores, RAM), and the measured execution time.
Community discussion is encouraged via the repository’s GitHub Discussions.
6. Evaluation
Results are measured on a Hetzner Cloud CCX33 instance (8 CPUs, 32 GB RAM). Execution time is recorded for five consecutive runs; the fastest and slowest runs are discarded, and the average of the remaining three determines the competitor's score. All competitors use the identical measurements.txt file. Scripts based on Terraform and Ansible are provided for anyone who wishes to reproduce the environment (note that running them incurs cloud costs).
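The scoring arithmetic can be sketched in a few lines (class and method names are illustrative, not from the official tooling):

```java
import java.util.Arrays;

public class TrimmedMean {

    // Of five timed runs, discard the fastest and slowest,
    // then average the middle three.
    static double score(double[] fiveRuns) {
        double[] sorted = fiveRuns.clone();
        Arrays.sort(sorted);
        return (sorted[1] + sorted[2] + sorted[3]) / 3.0;
    }
}
```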
Project repository: https://github.com/gunnarmorling/1brc. Join the challenge!
Java Architecture Diary