Why Go’s Memory Usage Explodes in Million‑Thread Benchmarks – A Deep Dive
The article analyses a large‑scale benchmark comparing Go, C, Rust, C# and other languages under single, 100 k and 1 M concurrent tasks, revealing how Go’s 8‑byte int array elements slow its iteration benchmark and how per‑goroutine stack overhead drives dramatically higher memory consumption, despite otherwise comparable CPU performance.
Cache‑line impact on the 10 billion‑iteration benchmark
The benchmark iterates over an array ten billion times. In C the array element type is int32_t (4 bytes) and Rust uses u32 (4 bytes), while Go uses int, which is 8 bytes on a 64‑bit platform (the size of int64). A 64‑byte cache line therefore holds 16 elements for C/Rust but only 8 for Go, so Go must fetch twice as many cache lines to traverse the same number of elements.
A community pull request changed the Go array element type to int32. After rebuilding, Go matched the execution time of C, Rust and Zig, confirming that element size was the dominant factor.
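The effect is easy to reproduce. The sketch below is my own illustration, not the benchmark’s actual code: it times the same traversal over []int and []int32. On typical x86‑64 hardware, with arrays far larger than the cache, the 4‑byte version runs measurably faster because each cache‑line fetch delivers twice as many elements.

package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 100_000_000 // large enough that neither array fits in cache

	a64 := make([]int, n)   // 8 bytes per element: 8 elements per 64-byte cache line
	a32 := make([]int32, n) // 4 bytes per element: 16 elements per cache line

	start := time.Now()
	var s64 int
	for _, v := range a64 {
		s64 += v
	}
	fmt.Println("int:  ", time.Since(start), s64)

	start = time.Now()
	var s32 int32
	for _, v := range a32 {
		s32 += v
	}
	fmt.Println("int32:", time.Since(start), s32)
}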
Massive‑concurrency memory‑usage benchmark
Three concurrency scenarios were measured:
Single task.
100 000 concurrent tasks.
1 000 000 concurrent tasks.
Each task simply sleeps for 10 seconds. Every language uses its native concurrency primitive: the go statement for Go, async/await for Rust and C#, and the equivalent pattern for Zig (not shown). CPU time and memory usage were recorded.
Observed results
Single task – Rust, C# and Go show comparable CPU time and low memory.
100 k tasks – Rust and C# keep memory modest; Go’s memory usage begins to increase noticeably.
1 M tasks – Go’s RSS grows to several gigabytes, while Rust and C# stay under 500 MB.
Instrumentation to expose Go’s memory spike
A helper printUsage function was added. It sleeps for 5 seconds so that all goroutines are running, then reads runtime.MemStats and prints Alloc, StackSys and Sys, and finally shells out to ps -o rss -p $PID to capture the OS‑level RSS. The complete program:
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strconv"
	"strings"
	"sync"
	"time"
)

func main() {
	fmt.Printf("pid: %d\n", os.Getpid())

	numRoutines := 100000
	if len(os.Args) > 1 {
		if n, err := strconv.Atoi(os.Args[1]); err == nil {
			numRoutines = n
		}
	}

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < numRoutines; i++ {
		wg.Add(1)
		go func() {
			time.Sleep(10 * time.Second)
			wg.Done()
		}()
	}

	go printUsage()
	wg.Wait()
	fmt.Printf("Time taken = %v\n", time.Since(start))
}

// printUsage waits until the goroutines are all running, then reports the
// Go runtime's view of memory alongside the OS-level RSS.
func printUsage() {
	time.Sleep(5 * time.Second)

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc = %v MiB\n", bToMb(m.Alloc))
	fmt.Printf("Stack = %v MiB\n", bToMb(m.StackSys))
	fmt.Printf("Sys = %v MiB\n", bToMb(m.Sys))

	// ps reports RSS in KiB; convert to bytes before formatting as MiB.
	output, err := exec.Command("ps", "-o", "rss", "-p", fmt.Sprintf("%d", os.Getpid())).Output()
	if err == nil {
		lines := strings.Split(string(output), "\n")
		if len(lines) >= 2 {
			rss, err := strconv.ParseInt(strings.TrimSpace(lines[1]), 10, 64)
			if err == nil {
				fmt.Printf("RSS: %v MiB\n", bToMb(uint64(rss)*1024))
			}
		}
	}
}

func bToMb(b uint64) uint64 { return b / 1024 / 1024 }
Analysis of the memory profile
On an older Linux host the RSS reached several gigabytes. The stack contribution alone is at least 2 KB per goroutine: the default starting stack was raised from 4 KB to 8 KB in Go 1.2, cut back to 2 KB in Go 1.4, and since Go 1.19 the initial size adapts to the historic average stack usage, with stacks growing dynamically as needed. At 1 M goroutines that is roughly 2 GB of stack memory before any work happens. The heap added hundreds of megabytes on top, because every goroutine also gets a runtime descriptor (the g struct) and related bookkeeping, which in this run amounted to roughly 435 MB in total.
By contrast, a Rust future occupies roughly 64‑128 bytes and a C# async task roughly 100‑200 bytes, so 1 M concurrent units cost on the order of 100‑200 MB. That is why Rust and C# stay under 500 MB at 1 M tasks while Go runs to several gigabytes.
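The per‑goroutine figure can be sanity‑checked empirically. The following is a rough sketch of my own (assuming idle goroutines that only sleep): read runtime.MemStats before and after spawning a batch, then divide the growth in Sys by the goroutine count.

package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	const n = 1_000_000

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(10 * time.Second)
		}()
	}

	time.Sleep(2 * time.Second) // give every goroutine time to start
	runtime.ReadMemStats(&after)

	// Sys includes stack memory and the runtime's per-goroutine g
	// structures, so the delta over n approximates the cost of one
	// idle goroutine (expect a few KB).
	fmt.Printf("approx. %d bytes per goroutine\n", (after.Sys-before.Sys)/n)

	wg.Wait()
}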
Attempts to mitigate the blow‑up
Trying to shrink the default goroutine stack size via runtime settings – no measurable reduction; the 2 KB starting stack is fixed at compile time in the runtime, so no GODEBUG setting lowers it.
Introducing a goroutine pool to reuse goroutines and their stacks (see the sketch after this list) – memory remained high.
Using a time‑wheel scheduler for the sleeps – it did not affect the per‑goroutine overhead.
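For reference, a goroutine pool along those lines might look like the sketch below (my own minimal version, not the article’s code). It reuses a fixed set of workers, but each sleeping task pins its worker for the full 10 seconds, so achieving 1 M concurrent sleeps still requires 1 M live goroutines – which is why pooling cannot help in this benchmark.

package main

import (
	"sync"
	"time"
)

// runPooled executes tasks on a fixed set of worker goroutines.
// Stacks are reused across tasks, but while a task sleeps it still
// occupies one worker, so 1 M concurrent sleeps need 1 M workers.
func runPooled(workers, tasks int) {
	jobs := make(chan func(), workers)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		go func() {
			for job := range jobs {
				job()
				wg.Done()
			}
		}()
	}

	for i := 0; i < tasks; i++ {
		wg.Add(1)
		jobs <- func() { time.Sleep(10 * time.Second) }
	}
	wg.Wait()
	close(jobs)
}

func main() {
	// With sleeping tasks, the worker count must equal the desired
	// concurrency, so the pool's memory footprint matches plain goroutines.
	runPooled(1_000_000, 1_000_000)
}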
References
First edition of the benchmark: https://hez2010.github.io/async-runtimes-benchmarks-2024
Second edition (updated code and results): https://hez2010.github.io/async-runtimes-benchmarks-2024/take2.html
BirdNest Tech Talk
Author of the rpcx microservice framework, author of several original books, and chair of Baidu's Go CMC committee.