How to Process One Billion Rows in Go: 9 Optimized Solutions Under 4 Seconds
This article walks through nine Go‑based implementations for the 1‑Billion‑Row Challenge, starting from a straightforward scanner approach and progressively applying map pointer values, custom parsing, integer arithmetic, buffer tweaks, custom hash tables, and parallelism to shrink processing time from 1 minute 45 seconds to under 4 seconds.
Challenge Overview
The 1‑Billion‑Row Challenge (1BRC) requires reading a 13 GB text file containing one billion temperature measurements and computing, for each weather station, the minimum, average, and maximum temperature as quickly as possible. Ben Hoyt solved the problem using only the Go standard library, presenting nine progressively faster implementations.
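Input lines pair a station name with one temperature reading, and the expected output is an alphabetically sorted min/mean/max summary per station. A tiny illustrative example (values invented, not from the challenge data):

Hamburg;12.0
Bulawayo;8.9
Hamburg;-3.4

expected output: {Bulawayo=8.9/8.9/8.9, Hamburg=-3.4/4.3/12.0}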
Baseline I/O
Reading the raw file with cat takes about 1.05 s, which establishes the lower bound for pure I/O; counting lines with wc takes roughly 55.7 s.
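For comparison, here is a minimal Go probe for the same raw-read floor. This is a sketch, not part of the original solutions; the file name measurements.txt is an assumption.

// rawread.go: measure sequential read throughput, the floor every
// solution must respect. Assumes the input is named measurements.txt.
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

func main() {
	f, err := os.Open("measurements.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1024*1024) // 1 MiB chunks, as in solution 6
	var total int64
	start := time.Now()
	for {
		n, err := f.Read(buf)
		total += int64(n)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("read %d bytes in %s (%.2f GB/s)\n",
		total, elapsed, float64(total)/1e9/elapsed.Seconds())
}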
Solution 1 – Simple Go code
Uses bufio.Scanner to read lines, strings.Cut to split on the semicolon, strconv.ParseFloat for temperature parsing, and a plain map[string]stats to aggregate results.
func r1(inputPath string, output io.Writer) error {
	type stats struct {
		min, max, sum float64
		count         int64
	}
	f, err := os.Open(inputPath)
	if err != nil {
		return err
	}
	defer f.Close()

	stationStats := make(map[string]stats)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		station, tempStr, hasSemi := strings.Cut(line, ";")
		if !hasSemi {
			continue
		}
		temp, err := strconv.ParseFloat(tempStr, 64)
		if err != nil {
			return err
		}
		s, ok := stationStats[station]
		if !ok {
			s = stats{min: temp, max: temp, sum: temp, count: 1}
		} else {
			s.min = min(s.min, temp)
			s.max = max(s.max, temp)
			s.sum += temp
			s.count++
		}
		stationStats[station] = s
	}

	stations := make([]string, 0, len(stationStats))
	for station := range stationStats {
		stations = append(stations, station)
	}
	sort.Strings(stations)

	fmt.Fprint(output, "{")
	for i, station := range stations {
		if i > 0 {
			fmt.Fprint(output, ", ")
		}
		s := stationStats[station]
		mean := s.sum / float64(s.count)
		fmt.Fprintf(output, "%s=%.1f/%.1f/%.1f", station, s.min, mean, s.max)
	}
	fmt.Fprint(output, "}\n")
	return nil
}
Runtime: 1 min 45 s.
Solution 2 – Map with pointer values
Changing the map to hold *stats lets each line update the struct in place, avoiding the second hash operation needed to write the value back and the per-line struct copying.
stationStats := make(map[string]*stats)
// inside the loop:
s := stationStats[station]
if s == nil {
	stationStats[station] = &stats{min: temp, max: temp, sum: temp, count: 1}
} else {
	s.min = min(s.min, temp)
	s.max = max(s.max, temp)
	s.sum += temp
	s.count++
}
Runtime: 1 min 31 s.
Solution 3 – Hand‑written float parsing
Replacing strconv.ParseFloat with a tiny custom parser that works directly on the byte slice eliminates string allocations.
// temperatures always have exactly one decimal digit,
// so the format is [-]d.d or [-]dd.d
negative := false
index := 0
if tempBytes[index] == '-' {
	index++
	negative = true
}
temp := float64(tempBytes[index] - '0') // first integer digit
index++
if tempBytes[index] != '.' { // optional second integer digit
	temp = temp*10 + float64(tempBytes[index]-'0')
	index++
}
index++ // skip '.'
temp += float64(tempBytes[index]-'0') / 10 // single decimal digit
if negative {
	temp = -temp
}
Runtime: 55.8 s.
Solution 4 – Fixed‑point integers
Store temperatures as tenths of a degree using 32‑bit integers, keeping int32 for min/max/count and int64 for sum.
type stats struct {
	min, max, count int32
	sum             int64
}
// output conversion:
mean := float64(s.sum) / float64(s.count) / 10
fmt.Fprintf(output, "%s=%.1f/%.1f/%.1f", station, float64(s.min)/10, mean, float64(s.max)/10)
Runtime: 51.0 s.
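The matching fixed-point parser is not shown above; the sketch below adapts the solution 3 code to produce tenths of a degree directly. Variable names follow the earlier snippet, and this is an assumption rather than the author's exact code.

negative := false
index := 0
if tempBytes[index] == '-' {
	index++
	negative = true
}
temp := int32(tempBytes[index] - '0')
index++
if tempBytes[index] != '.' { // optional second integer digit
	temp = temp*10 + int32(tempBytes[index]-'0')
	index++
}
index++                                      // skip '.'
temp = temp*10 + int32(tempBytes[index]-'0') // value in tenths of a degree
if negative {
	temp = -temp
}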
Solution 5 – Remove bytes.Cut
Parse the line from the end to locate the semicolon, avoiding the generic bytes.Cut call.
end := len(line)
// extract temperature digits from the tail of the line (code omitted for brevity)
station := line[:semicolon]
Runtime: 46.0 s.
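A sketch of the elided tail-first parse, under the same [-]d.d / [-]dd.d assumption (one plausible version, not the author's exact code): since the line ends with the temperature, the decimal digit sits at the last byte, the '.' just before it, and walking backwards eventually lands on the semicolon.

end := len(line)
tenths := int32(line[end-1]-'0') + 10*int32(line[end-3]-'0') // line[end-2] is the '.'
semicolon := end - 4
if c := line[semicolon]; c >= '0' && c <= '9' { // optional tens digit
	tenths += 100 * int32(c-'0')
	semicolon--
}
if line[semicolon] == '-' { // optional sign
	tenths = -tenths
	semicolon--
}
// line[semicolon] is now the ';'
station := line[:semicolon]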
Solution 6 – Drop bufio.Scanner
Read the file in 1 MiB chunks and manually scan for newlines, eliminating the overhead of Scanner.
buf := make([]byte, 1024*1024)
readStart := 0
for {
	n, err := f.Read(buf[readStart:])
	if err != nil && err != io.EOF {
		return err
	}
	if readStart+n == 0 {
		break
	}
	chunk := buf[:readStart+n]
	newline := bytes.LastIndexByte(chunk, '\n')
	if newline < 0 {
		break
	}
	// process chunk up to newline, keep remainder for next iteration
	// ...
	readStart = copy(buf, chunk[newline+1:])
}
Runtime: 41.3 s.
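The elided per-line scan can look like the sketch below. updateStats is a hypothetical helper standing in for the solution 3/4 parsing and aggregation; it is not code from the original article.

remaining := chunk[:newline+1]
for len(remaining) > 0 {
	semi := bytes.IndexByte(remaining, ';')
	nl := bytes.IndexByte(remaining, '\n')
	station := remaining[:semi]
	tempBytes := remaining[semi+1 : nl]
	updateStats(station, tempBytes) // hypothetical helper: parse and aggregate as before
	remaining = remaining[nl+1:]
}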
Solution 7 – Custom hash table
Implement a linear‑probing FNV‑1a hash table with a pre‑allocated, power‑of‑two bucket count (2^17 = 131 072) so the hash can be reduced to an index with a bitmask, storing keys as byte slices to avoid string allocations.
type item struct {
	key  []byte
	stat *stats
}
items := make([]item, 1<<17) // power of two, so hash & (len(items)-1) is a valid index
size := 0
// hashing loop (simplified; station, after, temp are declared earlier):
const (
	offset64 = 14695981039346656037
	prime64  = 1099511628211
)
hash := uint64(offset64)
for i := 0; i < len(chunk); i++ {
	c := chunk[i]
	if c == ';' {
		station = chunk[:i]
		after = chunk[i+1:]
		break
	}
	hash ^= uint64(c)
	hash *= prime64
}
hashIndex := int(hash & uint64(len(items)-1))
for {
	if items[hashIndex].key == nil {
		// empty slot: insert a copy of the station name as the key
		key := make([]byte, len(station))
		copy(key, station)
		items[hashIndex] = item{key: key, stat: &stats{min: temp, max: temp, sum: int64(temp), count: 1}}
		size++
		if size > len(items)/2 {
			panic("too many items in hash table")
		}
		break
	}
	if bytes.Equal(items[hashIndex].key, station) {
		// found the station: update in place
		s := items[hashIndex].stat
		s.min = min(s.min, temp)
		s.max = max(s.max, temp)
		s.sum += int64(temp)
		s.count++
		break
	}
	// collision: linear probing, wrapping at the end of the table
	hashIndex++
	if hashIndex >= len(items) {
		hashIndex = 0
	}
}
Runtime: 25.8 s.
Solution 8 – Parallel chunk processing
Split the file into equal‑size parts, launch a goroutine per part, and merge the per‑part maps.
parts, err := splitFile(inputPath, maxGoroutines)
if err != nil {
	return err
}
resultsCh := make(chan map[string]r8Stats)
for _, part := range parts {
	go r8ProcessPart(inputPath, part.offset, part.size, resultsCh)
}

// aggregate results
totals := make(map[string]r8Stats)
for i := 0; i < len(parts); i++ {
	partResult := <-resultsCh
	for station, s := range partResult {
		if ts, ok := totals[station]; !ok {
			totals[station] = s
		} else {
			ts.min = min(ts.min, s.min)
			ts.max = max(ts.max, s.max)
			ts.sum += s.sum
			ts.count += s.count
			totals[station] = ts
		}
	}
}
Runtime: 24.3 s.
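splitFile is not shown in the excerpt above. The sketch below is one plausible implementation, seeking near each approximate boundary and snapping it forward to the next newline so no line is split across parts; the part type and its fields are assumptions consistent with the snippet above.

type part struct {
	offset, size int64
}

func splitFile(inputPath string, numParts int) ([]part, error) {
	f, err := os.Open(inputPath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	st, err := f.Stat()
	if err != nil {
		return nil, err
	}
	fileSize := st.Size()
	splitSize := fileSize / int64(numParts)

	parts := make([]part, 0, numParts)
	buf := make([]byte, 1024) // lines are short, so 1 KiB suffices to find a newline
	offset := int64(0)
	for offset < fileSize {
		end := offset + splitSize
		if end >= fileSize {
			end = fileSize
		} else {
			// snap the boundary forward to just past the next newline;
			// assumes a newline occurs within 1 KiB, which holds for this input
			n, err := f.ReadAt(buf, end)
			if err != nil && err != io.EOF {
				return nil, err
			}
			nl := bytes.IndexByte(buf[:n], '\n')
			end += int64(nl) + 1
		}
		parts = append(parts, part{offset: offset, size: end - offset})
		offset = end
	}
	return parts, nil
}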
Solution 9 – Combined optimizations + parallelism
Merge all previous optimizations (pointer map, custom parsing, integer arithmetic, custom hash table) with the parallel framework.
Runtime: 3.99 s on an SSD‑equipped Linux laptop with 32 GB RAM. Profiling shows about 82 % of the time spent in the core processing function, 13 % in bytes.Equal, and the remaining 5 % in file I/O.
Takeaways
Even a seemingly simple aggregation over a billion rows benefits dramatically from careful I/O handling, allocation avoidance, integer arithmetic, hash‑table design, and parallelism.
Performance gains of more than 25× are achievable using only the Go standard library.
These techniques carry over to high‑throughput data pipelines and to any workload where raw processing speed matters.
Reference URLs:
https://benhoyt.com/writings/go-1brc/
https://www.morling.dev/blog/one-billion-row-challenge/