How to Slash Go GC Overhead on Large Heaps: Techniques and Code
This article examines why Go's garbage collector can become a CPU bottleneck with large heaps, demonstrates the performance impact with benchmark programs, and presents practical strategies—such as using pointer‑free allocations, mmap‑backed memory, and string interning—to dramatically reduce GC pause times.
What’s the problem?
The Go garbage collector (GC) works well when the heap is small, but as the heap grows the GC must scan far more memory, which can consume significant CPU time and, in extreme cases, leave the collector unable to keep up.
Is this a big problem?
To illustrate, we allocate a slice of 1 × 10⁹ 8‑byte pointers (≈8 GB), force a GC with runtime.GC(), and measure how long it takes. The benchmark repeats the measurement ten times to obtain stable values and calls runtime.KeepAlive so the slice stays reachable until the loop ends.
package main
import (
	"fmt"
	"runtime"
	"time"
)
func main() {
	a := make([]*int, 1e9)
	for i := 0; i < 10; i++ {
		start := time.Now()
		runtime.GC() // force a full collection so there is something to time
		fmt.Printf("GC took %s\n", time.Since(start))
	}
	runtime.KeepAlive(a)
}
On a 2015 MacBook Pro each forced collection takes between 0.55 s and 4.27 s, so every GC costs more than half a second.
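The timing in this benchmark is the wall-clock duration of a whole forced collection, which is not the same thing as the stop-the-world pause. As a rough complement, the sketch below also reads the cumulative pause time that the runtime reports through runtime.MemStats. The helper name measureGC is made up for this illustration and is not part of the original benchmark.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// measureGC (hypothetical helper) forces one collection and returns:
//   - the wall-clock time spent in runtime.GC(),
//   - the stop-the-world pause time added during that window,
//   - the number of GC cycles completed during that window.
func measureGC() (wall, pause time.Duration, cycles uint32) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	start := time.Now()
	runtime.GC()
	wall = time.Since(start)

	runtime.ReadMemStats(&after)
	pause = time.Duration(after.PauseTotalNs - before.PauseTotalNs)
	cycles = after.NumGC - before.NumGC
	return wall, pause, cycles
}

func main() {
	// A modest pointer-heavy allocation so the collector has work to do.
	a := make([]*int, 1_000_000)
	for i := 0; i < 3; i++ {
		wall, pause, cycles := measureGC()
		fmt.Printf("wall=%s pause=%s cycles=%d\n", wall, pause, cycles)
	}
	runtime.KeepAlive(a)
}
```

Because the collector runs mostly concurrently, the reported pause is usually much smaller than the wall-clock figure; the CPU cost discussed in this article is closer to the latter.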
What next?
If an application truly needs a gigantic map or array, the regular GC scan of the entire heap can waste a lot of CPU. We have essentially two options: hide the memory from the GC, or make the GC uninterested in it so it does not scan it.
Make GC skip this memory
If the allocated type contains no pointers, the GC does not need to scan it. We repeat the benchmark with a slice of plain int values (no pointers).
package main
import (
	"fmt"
	"runtime"
	"time"
)
func main() {
	// Same element count and size as before, but the elements hold no pointers.
	a := make([]int, 1e9)
	for i := 0; i < 10; i++ {
		start := time.Now()
		runtime.GC()
		fmt.Printf("GC took %s\n", time.Since(start))
	}
	runtime.KeepAlive(a)
}
On the same machine the GC time drops to 100‑200 µs, more than a thousand times faster, while the amount of allocated memory is unchanged.
Hide the memory from GC
Another approach is to allocate memory directly from the operating system with mmap. Memory obtained this way is invisible to the Go GC, so it is never scanned.
package main
import (
	"fmt"
	"reflect"
	"runtime"
	"syscall"
	"time"
	"unsafe"
)
func main() {
	var example *int
	// Build a slice header over mmap'd memory and reinterpret it as []*int.
	slice := makeSlice(1e9, unsafe.Sizeof(example))
	a := *(*[]*int)(unsafe.Pointer(&slice))
	for i := 0; i < 10; i++ {
		start := time.Now()
		runtime.GC()
		fmt.Printf("GC took %s\n", time.Since(start))
	}
	runtime.KeepAlive(a)
}
// makeSlice maps n*eltsize bytes of anonymous memory outside the Go heap
// and describes it with a slice header.
func makeSlice(n int, eltsize uintptr) reflect.SliceHeader {
	fd := -1 // no backing file: anonymous mapping
	data, _, errno := syscall.Syscall6(
		syscall.SYS_MMAP,
		0,
		uintptr(n)*eltsize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE,
		uintptr(fd),
		0,
	)
	if errno != 0 {
		panic(errno)
	}
	return reflect.SliceHeader{
		Data: data,
		Len:  n,
		Cap:  n,
	}
}
Running this program yields GC times comparable to the pointer‑free case (≈150‑460 µs). However, the GC does not track pointers stored in this hidden memory, so heap objects referenced only from it can be reclaimed prematurely.
The essence of the problem
Pointers are the root cause of GC overhead. If we can avoid pointers in large allocations, the GC incurs little cost without resorting to tricks. When using non‑heap memory, we must also avoid storing pointers to heap objects, otherwise the GC cannot see them.
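One way to keep pointers out of a large allocation is to link elements by slice index instead of by address. The sketch below is an illustration under that assumption, not code from the original benchmarks; the types nodePtr and nodeIdx and the helpers buildIdx and depth are invented for the example.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// nodePtr links to its parent with a pointer, so the GC must scan every element.
type nodePtr struct {
	parent *nodePtr
	value  int64
}

// nodeIdx links to its parent by index into the same slice (-1 means no
// parent). The struct contains no pointers, so the GC can skip the slice.
type nodeIdx struct {
	parent int32
	value  int64
}

// buildIdx creates n nodes where each node's parent is the previous node.
func buildIdx(n int) []nodeIdx {
	nodes := make([]nodeIdx, n)
	for i := range nodes {
		nodes[i] = nodeIdx{parent: int32(i - 1), value: int64(i)}
	}
	return nodes
}

// depth follows parent indices from node i until it reaches the root.
func depth(nodes []nodeIdx, i int32) int {
	d := 0
	for nodes[i].parent >= 0 {
		i = nodes[i].parent
		d++
	}
	return d
}

// timeGC forces a collection and reports how long it took.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	const n = 1_000_000

	ptrs := make([]nodePtr, n)
	for i := 1; i < n; i++ {
		ptrs[i].parent = &ptrs[i-1]
	}
	fmt.Println("pointer-linked:", timeGC())
	runtime.KeepAlive(ptrs)

	idx := buildIdx(n)
	fmt.Println("index-linked:", timeGC())
	fmt.Println("depth of last node:", depth(idx, n-1))
}
```

An int32 index also halves the link size on 64-bit platforms, at the cost of limiting the slice to about two billion elements and losing the compiler's help in keeping links valid.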
How to avoid pointers
Common pointer‑heavy structures include:
Large numbers of strings.
time.Time values.
Slices stored inside maps.
Maps with string keys.
One concrete technique is to replace a slice of strings with a single byte slice that holds all string data and a parallel slice of offsets. This eliminates the per‑string StringHeader pointers that the GC would otherwise scan.
type StringHeader struct {
	Data uintptr
	Len  int
}
Similarly, a SliceHeader describes the underlying array:
type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}
By storing 10⁸ strings as a contiguous byte buffer and an offset array, the GC is left with almost nothing to scan: neither the byte buffer nor the offset slice (which contains only integers) holds pointers, which dramatically reduces pause times.
package main
import (
	"fmt"
	"runtime"
	"strconv"
	"time"
	"unsafe"
)
func main() {
	var stringBytes []byte  // all string data, back to back
	var stringOffsets []int // stringOffsets[i] is the end offset of string i
	for i := 0; i < 1e8; i++ {
		val := strconv.Itoa(i)
		stringBytes = append(stringBytes, val...)
		stringOffsets = append(stringOffsets, len(stringBytes))
	}
	// Collect once first so garbage left over from slice growth is not timed.
	runtime.GC()
	start := time.Now()
	runtime.GC()
	fmt.Printf("GC took %s\n", time.Since(start))
	// Recover the first ten strings from the buffer.
	sStart := 0
	for i := 0; i < 10; i++ {
		sEnd := stringOffsets[i]
		bytes := stringBytes[sStart:sEnd]
		// Reinterpret the byte slice as a string without copying.
		stringVal := *(*string)(unsafe.Pointer(&bytes))
		fmt.Println(stringVal)
		sStart = sEnd
	}
}
The GC for this approach takes only ~187 µs, and the first ten strings are printed correctly.
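The buffer-plus-offsets layout can be wrapped in a small type so callers do not handle offsets directly. The following is a sketch of that idea rather than an established API; the stringStore type and its methods are hypothetical names. Unlike the zero-copy conversion in the benchmark, Get copies the bytes into a new string, which is slower but avoids unsafe and stays valid if the buffer is later reallocated.

```go
package main

import "fmt"

// stringStore (hypothetical) keeps many strings in one byte buffer.
// Neither field's backing array contains pointers, so the GC skips both.
type stringStore struct {
	data []byte // all string bytes, back to back
	ends []int  // ends[i] is the end offset of string i in data
}

// Add appends v to the store and returns its index.
func (s *stringStore) Add(v string) int {
	s.data = append(s.data, v...)
	s.ends = append(s.ends, len(s.data))
	return len(s.ends) - 1
}

// Get returns a copy of string i. It panics if i is out of range.
func (s *stringStore) Get(i int) string {
	start := 0
	if i > 0 {
		start = s.ends[i-1]
	}
	return string(s.data[start:s.ends[i]])
}

// Len reports how many strings are stored.
func (s *stringStore) Len() int {
	return len(s.ends)
}

func main() {
	var store stringStore
	for _, v := range []string{"alpha", "", "gamma"} {
		store.Add(v)
	}
	for i := 0; i < store.Len(); i++ {
		fmt.Printf("%d: %q\n", i, store.Get(i))
	}
}
```

Callers can then hold integer indexes instead of string values, which keeps their own data structures pointer-free as well.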
Further resources
String store techniques.
Go string interning library.
Converting string IDs to integer IDs for fast comparison.
