How to Process One Billion Rows in Go: 9 Optimized Solutions Under 4 Seconds

This article walks through nine Go‑based implementations for the 1‑Billion‑Row Challenge, starting from a straightforward scanner approach and progressively applying map pointer values, custom parsing, integer arithmetic, buffer tweaks, custom hash tables, and parallelism to shrink processing time from 1 minute 45 seconds to under 4 seconds.

Go Development Architecture Practice

Challenge Overview

The 1‑Billion‑Row Challenge (1BRC), originally posed for Java by Gunnar Morling, requires reading a roughly 13 GB text file containing one billion temperature measurements and computing, for each weather station, the minimum, mean, and maximum temperature as quickly as possible. Ben Hoyt solved it using only the Go standard library, presenting nine progressively faster implementations.

Baseline I/O

Reading the raw file with cat takes about 1.05 s, which establishes a rough lower bound set by I/O alone. Counting lines with wc -l takes roughly 55.7 s, a baseline for any program that must at least scan every line.

Solution 1 – Simple Go code

Uses bufio.Scanner to read lines, strings.Cut to split on the semicolon, strconv.ParseFloat for temperature parsing, and a plain map[string]stats to aggregate results.

func r1(inputPath string, output io.Writer) error {
    type stats struct {
        min, max, sum float64
        count         int64
    }
    f, err := os.Open(inputPath)
    if err != nil { return err }
    defer f.Close()
    stationStats := make(map[string]stats)
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        line := scanner.Text()
        station, tempStr, hasSemi := strings.Cut(line, ";")
        if !hasSemi { continue }
        temp, err := strconv.ParseFloat(tempStr, 64)
        if err != nil { return err }
        s, ok := stationStats[station]
        if !ok {
            s = stats{min: temp, max: temp, sum: temp, count: 1}
        } else {
            s.min = min(s.min, temp)
            s.max = max(s.max, temp)
            s.sum += temp
            s.count++
        }
        stationStats[station] = s
    }
    if err := scanner.Err(); err != nil { return err }
    stations := make([]string, 0, len(stationStats))
    for station := range stationStats { stations = append(stations, station) }
    sort.Strings(stations)
    fmt.Fprint(output, "{")
    for i, station := range stations {
        if i > 0 { fmt.Fprint(output, ", ") }
        s := stationStats[station]
        mean := s.sum / float64(s.count)
        fmt.Fprintf(output, "%s=%.1f/%.1f/%.1f", station, s.min, mean, s.max)
    }
    fmt.Fprint(output, "}\n")
    return nil
}

Runtime: 1 min 45 s.

Solution 2 – Map with pointer values

Changing the map to hold *stats means the hot loop does a single map lookup per line and updates the struct through the pointer, instead of looking the value up, copying it, modifying it, and storing it back.

stationStats := make(map[string]*stats)
// inside the loop:
    s := stationStats[station]
    if s == nil {
        stationStats[station] = &stats{min: temp, max: temp, sum: temp, count: 1}
    } else {
        s.min = min(s.min, temp)
        s.max = max(s.max, temp)
        s.sum += temp
        s.count++
    }

Runtime: 1 min 31 s.

Solution 3 – Hand‑written float parsing

Replacing strconv.ParseFloat with a tiny custom parser that exploits the input's fixed format (one or two integer digits, exactly one decimal digit) and works directly on the byte slice avoids both the generic parsing overhead and string allocations.

negative := false
index := 0
if tempBytes[index] == '-' { index++; negative = true }
temp := float64(tempBytes[index]-'0')
index++
if tempBytes[index] != '.' {
    temp = temp*10 + float64(tempBytes[index]-'0')
    index++
}
index++ // skip '.'
temp += float64(tempBytes[index]-'0') / 10
if negative { temp = -temp }

Runtime: 55.8 s.
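Wrapped up as a self-contained function, the snippet above looks like this (the parseTemp name and the small harness are ours, not from the article):

```go
package main

import "fmt"

// parseTemp converts a reading such as "-12.3" (exactly one digit after
// the decimal point, one or two before it) into a float64, working
// directly on the byte slice with no string allocation.
func parseTemp(tempBytes []byte) float64 {
	negative := false
	index := 0
	if tempBytes[index] == '-' {
		index++
		negative = true
	}
	temp := float64(tempBytes[index] - '0')
	index++
	if tempBytes[index] != '.' { // two integer digits
		temp = temp*10 + float64(tempBytes[index]-'0')
		index++
	}
	index++ // skip '.'
	temp += float64(tempBytes[index]-'0') / 10
	if negative {
		temp = -temp
	}
	return temp
}

func main() {
	fmt.Printf("%.1f\n", parseTemp([]byte("-12.3"))) // -12.3
}
```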

Solution 4 – Fixed‑point integers

Store temperatures as tenths of a degree using 32‑bit integers, keeping int32 for min/max/count and int64 for sum.

type stats struct {
    min, max, count int32
    sum             int64
}
// output conversion:
mean := float64(s.sum) / float64(s.count) / 10
fmt.Fprintf(output, "%s=%.1f/%.1f/%.1f", station, float64(s.min)/10, mean, float64(s.max)/10)

Runtime: 51.0 s.
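Solutions 3 and 4 combine naturally: the same hand-rolled parser can emit tenths of a degree as an int32 directly, skipping floating point entirely. A minimal sketch (the function name is illustrative, not from the article):

```go
package main

import "fmt"

// parseTempTenths parses a reading such as "-12.3" directly into an
// int32 number of tenths of a degree (-123), with no float math and
// no allocations.
func parseTempTenths(b []byte) int32 {
	negative := false
	i := 0
	if b[i] == '-' {
		i++
		negative = true
	}
	temp := int32(b[i] - '0')
	i++
	if b[i] != '.' { // two integer digits
		temp = temp*10 + int32(b[i]-'0')
		i++
	}
	i++ // skip '.'
	temp = temp*10 + int32(b[i]-'0')
	if negative {
		temp = -temp
	}
	return temp
}

func main() {
	fmt.Println(parseTempTenths([]byte("-12.3"))) // -123
}
```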

Solution 5 – Remove bytes.Cut

Parse the line from the end to locate the semicolon, avoiding the generic bytes.Cut call.

end := len(line)
// Work backwards from the end of the line: the fixed temperature format
// (one decimal digit, '.', one or two integer digits, optional '-')
// means both the digits and the semicolon's index fall out directly
// (full code omitted for brevity).
station := line[:semicolon]

Runtime: 46.0 s.
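A hedged reconstruction of the omitted tail-parsing code, exploiting the fixed format of exactly one decimal digit, one or two integer digits, and an optional minus sign (names are ours; the article's code may differ in detail):

```go
package main

import "fmt"

// parseLineTail scans a line such as "Hamburg;-12.3" from the end and
// returns the station name plus the temperature in tenths of a degree.
// Because the temperature format is fixed, the byte at end-2 is always
// '.' and the ';' is at most a few bytes further back.
func parseLineTail(line []byte) (station []byte, temp int32) {
	end := len(line)
	tenths := int32(line[end-1] - '0')
	ones := int32(line[end-3] - '0') // line[end-2] is '.'
	var semicolon int
	switch {
	case line[end-4] == ';': // e.g. "x;1.2"
		temp = ones*10 + tenths
		semicolon = end - 4
	case line[end-4] == '-': // e.g. "x;-1.2"
		temp = -(ones*10 + tenths)
		semicolon = end - 5
	default:
		tens := int32(line[end-4] - '0')
		if line[end-5] == ';' { // e.g. "x;12.3"
			temp = tens*100 + ones*10 + tenths
			semicolon = end - 5
		} else { // e.g. "x;-12.3"
			temp = -(tens*100 + ones*10 + tenths)
			semicolon = end - 6
		}
	}
	return line[:semicolon], temp
}

func main() {
	s, t := parseLineTail([]byte("Hamburg;-12.3"))
	fmt.Printf("%s %d\n", s, t) // Hamburg -123
}
```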

Solution 6 – Drop bufio.Scanner

Read the file in 1 MiB chunks and manually scan for newlines, eliminating the overhead of Scanner.

buf := make([]byte, 1024*1024)
readStart := 0
for {
    n, err := f.Read(buf[readStart:])
    if err != nil && err != io.EOF { return err }
    if readStart+n == 0 { break }
    chunk := buf[:readStart+n]
    newline := bytes.LastIndexByte(chunk, '\n')
    if newline < 0 { break }
    // process chunk up to newline, keep remainder for next iteration
    // ...
    readStart = copy(buf, chunk[newline+1:])
}

Runtime: 41.3 s.

Solution 7 – Custom hash table

Implement a linear‑probing hash table keyed by FNV‑1a hashes of the station name, storing keys as byte slices to avoid string allocations. The bucket count is pre‑allocated and must be a power of two, so the hash can be reduced with a cheap mask instead of a modulo.

const numBuckets = 1 << 17 // must be a power of two for the mask below

type item struct { key []byte; stat *stats }
items := make([]item, numBuckets)
size := 0
// hashing loop (simplified):
const (
    offset64 = 14695981039346656037
    prime64  = 1099511628211
)
hash := uint64(offset64)
for i := 0; i < len(chunk); i++ {
    c := chunk[i]
    if c == ';' { station = chunk[:i]; after = chunk[i+1:]; break }
    hash ^= uint64(c)
    hash *= prime64
}
hashIndex := int(hash & uint64(len(items)-1))
for {
    if items[hashIndex].key == nil {
        key := make([]byte, len(station))
        copy(key, station)
        items[hashIndex] = item{key: key, stat: &stats{min: temp, max: temp, sum: int64(temp), count: 1}}
        size++
        if size > len(items)/2 { panic("too many items in hash table") }
        break
    }
    if bytes.Equal(items[hashIndex].key, station) {
        s := items[hashIndex].stat
        s.min = min(s.min, temp)
        s.max = max(s.max, temp)
        s.sum += int64(temp)
        s.count++
        break
    }
    hashIndex++
    if hashIndex >= len(items) { hashIndex = 0 }
}

Runtime: 25.8 s.

Solution 8 – Parallel chunk processing

Split the file into equal‑size parts, launch a goroutine per part, and merge the per‑part maps.

parts, err := splitFile(inputPath, maxGoroutines)
if err != nil { return err }
resultsCh := make(chan map[string]r8Stats)
for _, part := range parts {
    go r8ProcessPart(inputPath, part.offset, part.size, resultsCh)
}
// aggregate results
totals := make(map[string]r8Stats)
for i := 0; i < len(parts); i++ {
    partResult := <-resultsCh
    for station, s := range partResult {
        if ts, ok := totals[station]; !ok {
            totals[station] = s
        } else {
            ts.min = min(ts.min, s.min)
            ts.max = max(ts.max, s.max)
            ts.sum += s.sum
            ts.count += s.count
            totals[station] = ts
        }
    }
}

Runtime: 24.3 s.

Solution 9 – Combined optimizations + parallelism

Merge all previous optimizations (pointer map, custom parsing, integer arithmetic, custom hash table) with the parallel framework.

Runtime: 3.99 s on an SSD‑equipped Linux laptop with 32 GB of RAM. Profiling shows 82 % of the time spent in the core processing function, with bytes.Equal accounting for 13 % and file I/O the remaining 5 %.

Takeaways

Even a seemingly simple aggregation over a billion rows benefits dramatically from careful I/O handling, allocation avoidance, integer arithmetic, hash‑table design, and parallelism.

Performance gains of more than 25× are achievable using only the Go standard library.

These techniques are relevant for high‑throughput data pipelines, runtime developers, and any workload where raw processing speed matters.

Performance diagram

Reference URLs:

https://benhoyt.com/writings/go-1brc/

https://www.morling.dev/blog/one-billion-row-challenge/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: data processing, Go, benchmark, parallelism, 1BRC
Written by

Go Development Architecture Practice

Daily sharing of Golang-related technical articles, practical resources, language news, tutorials, real-world projects, and more. Looking forward to growing together. Let's go!
