How to Slash Go GC Overhead on Large Heaps: Techniques and Code

This article examines why Go's garbage collector can become a CPU bottleneck with large heaps, demonstrates the performance impact with benchmark programs, and presents practical strategies (pointer‑free allocations, mmap‑backed memory, and string interning) that dramatically reduce GC time.

Radish, Keep Going!

What’s the problem?

The Go garbage collector (GC) works well when the heap is small, but as the heap grows it has to scan many more memory blocks on every cycle. That scanning can consume significant CPU time and, in extreme cases, a cycle may not finish.

Is this a big problem?

To illustrate, we allocate 1 × 10⁹ 8‑byte pointers (≈8 GB), force a GC with runtime.GC, and measure how long it takes. The benchmark runs ten iterations to obtain stable values and calls runtime.KeepAlive so the slice stays reachable (and is not optimized away or collected) until the measurements are done.

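package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    a := make([]*int, 1e9) // 1e9 pointers, roughly 8 GB for the GC to scan

    for i := 0; i < 10; i++ {
        start := time.Now()
        runtime.GC() // force a full collection and time it
        fmt.Printf("GC took %s\n", time.Since(start))
    }

    runtime.KeepAlive(a) // keep the slice reachable during the loop
}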

On a 2015 MacBook Pro each forced GC takes between 0.55 s and 4.27 s, i.e., every collection costs more than half a second.
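The benchmark times the whole call to runtime.GC, which is wall-clock time for a full cycle rather than the stop-the-world pause alone. As a minimal sketch of how the two can be separated, the snippet below reads runtime.MemStats before and after a forced collection. The gcStats type and the measureGC helper are hypothetical names introduced here for illustration, and the slice is kept small (1e6 elements) so the sketch runs on any machine.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcStats is a hypothetical holder for the numbers gathered around one forced GC.
type gcStats struct {
	Wall   time.Duration // wall-clock duration of the runtime.GC call
	Pause  time.Duration // stop-the-world pause time added during that call
	Cycles uint32        // GC cycles completed during that call
}

// measureGC forces a collection and reports its cost using runtime.MemStats.
func measureGC() gcStats {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	start := time.Now()
	runtime.GC()
	wall := time.Since(start)

	runtime.ReadMemStats(&after)
	return gcStats{
		Wall:   wall,
		Pause:  time.Duration(after.PauseTotalNs - before.PauseTotalNs),
		Cycles: after.NumGC - before.NumGC,
	}
}

func main() {
	// Assumption: a much smaller slice than in the article, to keep the sketch cheap.
	a := make([]*int, 1e6)

	s := measureGC()
	fmt.Printf("wall=%s pause=%s cycles=%d\n", s.Wall, s.Pause, s.Cycles)

	runtime.KeepAlive(a)
}
```

On a large heap the wall-clock figure is usually the one that grows, because most of the marking work runs concurrently with the program and burns CPU rather than pausing it.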

What next?

If an application truly needs a gigantic map or array, the regular GC scan of the entire heap can waste a lot of CPU. We have essentially two options: hide the memory from the GC, or make the GC uninterested in it so it does not scan it.

Make GC skip this memory

If the allocated type contains no pointers, the GC does not need to scan it. We repeat the benchmark with a slice of plain int values (no pointers).

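package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    a := make([]int, 1e9) // same size as before, but the elements hold no pointers

    for i := 0; i < 10; i++ {
        start := time.Now()
        runtime.GC()
        fmt.Printf("GC took %s\n", time.Since(start))
    }

    runtime.KeepAlive(a) // keep the slice reachable during the loop
}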

On the same machine each GC now takes 100‑200 µs, more than a thousand times faster, while the allocated memory size is unchanged.
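The same idea applies to linked data: a reference can often be stored as an integer index into a slice instead of as a pointer. The sketch below is an illustration under that assumption, not code from the original benchmark; ptrNode, idxNode, buildList and sum are hypothetical names.

```go
package main

import "fmt"

// ptrNode links to its successor with a pointer, so the GC has to
// follow and scan every node. It is shown only for contrast.
type ptrNode struct {
	value int
	next  *ptrNode
}

// idxNode links to its successor by index into a backing slice.
// The struct contains no pointers, so a []idxNode needs no scanning.
type idxNode struct {
	value int
	next  int32 // index of the next node, or -1 at the end of the list
}

// buildList stores n nodes in one slice, chained in order 0, 1, ..., n-1.
func buildList(n int) []idxNode {
	nodes := make([]idxNode, n)
	for i := range nodes {
		nodes[i] = idxNode{value: i, next: int32(i + 1)}
	}
	if n > 0 {
		nodes[n-1].next = -1
	}
	return nodes
}

// sum walks the list from head by following indices and adds up the values.
func sum(nodes []idxNode, head int32) int {
	total := 0
	for i := head; i != -1; i = nodes[i].next {
		total += nodes[i].value
	}
	return total
}

func main() {
	nodes := buildList(1000)
	fmt.Println(sum(nodes, 0)) // 0 + 1 + ... + 999 = 499500
}
```

The trade-off is that the program now manages node lifetimes itself: an index stays valid only as long as the slot it names has not been reused.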

Hide the memory from GC

Another approach is to allocate memory directly from the operating system with mmap. Memory obtained this way is invisible to the Go GC, so it is never scanned.

package main

import (
    "fmt"
    "reflect"
    "runtime"
    "syscall"
    "time"
    "unsafe"
)

func main() {
    var example *int
    slice := makeSlice(1e9, unsafe.Sizeof(example))
    a := *(*[]*int)(unsafe.Pointer(&slice))

    for i := 0; i < 10; i++ {
        start := time.Now()
        runtime.GC()
        fmt.Printf("GC took %s\n", time.Since(start))
    }

    runtime.KeepAlive(a)
}

func makeSlice(len int, eltsize uintptr) reflect.SliceHeader {
    fd := -1
    data, _, errno := syscall.Syscall6(
        syscall.SYS_MMAP,
        0,
        uintptr(len)*eltsize,
        syscall.PROT_READ|syscall.PROT_WRITE,
        syscall.MAP_ANON|syscall.MAP_PRIVATE,
        uintptr(fd),
        0,
    )
    if errno != 0 {
        panic(errno)
    }

    return reflect.SliceHeader{
        Data: data,
        Len:  len,
        Cap:  len,
    }
}

Running this program yields GC pauses comparable to the pointer‑free case (≈150‑460 µs). However, pointers stored in this hidden memory are not tracked by the GC, which can lead to premature reclamation of heap objects.
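One way to sidestep that hazard is to keep only pointer-free values in the mapped region. The following is a rough sketch under that assumption, for Linux or macOS, using the higher-level syscall.Mmap and syscall.Munmap wrappers instead of a raw system call. The offHeapUint64s type and its methods are hypothetical names, not part of the original program.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"syscall"
)

// offHeapUint64s is a hypothetical fixed-size array of uint64 values
// stored in an anonymous mmap region that the Go GC never sees.
type offHeapUint64s struct {
	buf []byte
}

// newOffHeapUint64s maps enough anonymous memory for n values (n must be > 0).
func newOffHeapUint64s(n int) (*offHeapUint64s, error) {
	buf, err := syscall.Mmap(-1, 0, n*8,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	return &offHeapUint64s{buf: buf}, nil
}

// Set stores v at position i as eight little-endian bytes.
func (o *offHeapUint64s) Set(i int, v uint64) {
	binary.LittleEndian.PutUint64(o.buf[i*8:], v)
}

// Get reads the value stored at position i.
func (o *offHeapUint64s) Get(i int) uint64 {
	return binary.LittleEndian.Uint64(o.buf[i*8:])
}

// Close returns the region to the operating system. The GC will not do this for us.
func (o *offHeapUint64s) Close() error {
	err := syscall.Munmap(o.buf)
	o.buf = nil
	return err
}

func main() {
	arr, err := newOffHeapUint64s(1024)
	if err != nil {
		panic(err)
	}
	defer arr.Close()

	arr.Set(7, 12345)
	fmt.Println(arr.Get(7)) // 12345
}
```

Because nothing in the region refers to a Go heap object, the GC has no need to see it. The cost is manual lifetime management: the memory stays mapped until Close is called.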

The essence of the problem

Pointers are the root cause of GC overhead. If we can avoid pointers in large allocations, the GC incurs little cost without resorting to tricks. When using non‑heap memory, we must also avoid storing pointers to heap objects, otherwise the GC cannot see them.

How to avoid pointers

Common pointer‑heavy structures include:

Large numbers of strings.

time.Time values (each one contains a pointer to a time.Location).

Slices stored inside maps.

Maps with string keys.

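One concrete technique is to replace a slice of strings with a single byte slice that holds all string data and a parallel slice of offsets. This eliminates the per‑string data pointers that the GC would otherwise have to follow. At run time each string is a small header, shown here as reflect.StringHeader, whose Data field points at the bytes: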

type StringHeader struct {
    Data uintptr
    Len  int
}

Similarly, a slice is described by a reflect.SliceHeader, which points at the underlying array and adds a capacity field:

type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}

By storing 10⁸ strings as one contiguous byte buffer plus an offset slice, the program holds just two large allocations. Neither the bytes nor the integer offsets contain pointers, so the GC does not need to scan the contents of either one, which dramatically reduces GC time.

package main

import (
    "fmt"
    "runtime"
    "strconv"
    "time"
    "unsafe"
)

func main() {
    var stringBytes []byte
    var stringOffsets []int

    for i := 0; i < 1e8; i++ {
        val := strconv.Itoa(i)
        stringBytes = append(stringBytes, val...)
        stringOffsets = append(stringOffsets, len(stringBytes))
    }

    runtime.GC()
    start := time.Now()
    runtime.GC()
    fmt.Printf("GC took %s\n", time.Since(start))

    sStart := 0
    for i := 0; i < 10; i++ {
        sEnd := stringOffsets[i]
        bytes := stringBytes[sStart:sEnd]
        stringVal := *(*string)(unsafe.Pointer(&bytes))
        fmt.Println(stringVal)
        sStart = sEnd
    }
}

The GC pause for this approach is only ~187 µs, and the first ten strings are printed correctly.
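The byte buffer and offsets can be wrapped in a small type so callers do not have to track offsets themselves. This is a minimal sketch rather than the article's code: stringStore, Add and Get are hypothetical names, the uint32 offsets assume the total data stays under 4 GiB, and Get copies the bytes with a plain string conversion instead of the unsafe cast used above.

```go
package main

import "fmt"

// stringStore is a hypothetical append-only store that keeps all string
// bytes in one buffer and records where each string ends.
type stringStore struct {
	data []byte
	ends []uint32 // ends[i] is the offset just past string i
}

// Add appends v to the store and returns its index.
func (s *stringStore) Add(v string) int {
	s.data = append(s.data, v...)
	s.ends = append(s.ends, uint32(len(s.data)))
	return len(s.ends) - 1
}

// Get returns a copy of string i. It panics if i is out of range.
func (s *stringStore) Get(i int) string {
	start := uint32(0)
	if i > 0 {
		start = s.ends[i-1]
	}
	return string(s.data[start:s.ends[i]])
}

func main() {
	var store stringStore
	a := store.Add("hello")
	b := store.Add("gophers")

	fmt.Println(store.Get(a)) // hello
	fmt.Println(store.Get(b)) // gophers
}
```

Copying on Get costs an allocation per lookup, but the returned string stays valid even if a later Add reallocates the buffer. The unsafe conversion avoids the copy at the price of aliasing the buffer.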

Further resources

String store techniques.

Go string interning library.

Converting string IDs to integer IDs for fast comparison.
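To give a flavor of the last two items, here is a minimal sketch of interning strings into integer IDs. It is not taken from any particular library; interner, ID and Name are hypothetical names. Each distinct string is stored once, and bulk data then holds uint32 IDs, which are pointer-free and compare with a single integer comparison.

```go
package main

import "fmt"

// interner is a hypothetical table that assigns a small integer ID
// to each distinct string it sees.
type interner struct {
	ids   map[string]uint32
	names []string
}

func newInterner() *interner {
	return &interner{ids: make(map[string]uint32)}
}

// ID returns the existing ID for s, or assigns the next free one.
func (in *interner) ID(s string) uint32 {
	if id, ok := in.ids[s]; ok {
		return id
	}
	id := uint32(len(in.names))
	in.ids[s] = id
	in.names = append(in.names, s)
	return id
}

// Name returns the string for a previously assigned ID.
func (in *interner) Name(id uint32) string {
	return in.names[id]
}

func main() {
	in := newInterner()

	// The large data set stores only integers.
	records := []uint32{in.ID("alice"), in.ID("bob"), in.ID("alice")}

	fmt.Println(records[0] == records[2]) // true
	fmt.Println(in.Name(records[1]))      // bob
}
```

The interner itself still contains strings, so the GC scans it, but its size grows with the number of distinct values rather than with the number of records.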
