Mastering Go Garbage Collection: Tips to Slash Latency and Boost Performance

This article explains Go's memory management mechanisms—including GC fundamentals, stop‑the‑world marking, tri‑color marking with write barriers, hybrid write barriers, and practical optimization techniques such as reducing heap allocations, using caches, concurrency patterns, and profiling tools—to help developers identify and eliminate performance bottlenecks.

Tencent Technical Engineering

GC Principles

This section introduces Go's memory management (GC) and explains why it is crucial for performance optimization, common optimization techniques, and how newer Go versions improve GC.

Mark-and-Sweep (Stop-the-World)

A stop‑the‑world (STW) collector pauses all goroutines, marks every reachable object, and then sweeps the unreachable ones. Before Go 1.3 the GC relied entirely on global STW, which could cause pauses of hundreds of milliseconds.

Tri‑color Marking + Write Barriers

Tri‑color Marking

Initially all objects are white.

The first scan moves objects directly reachable from the roots from white to gray.

Each later scan takes a gray object, moves the white objects it references to gray, and then moves the gray object itself to black.

When the gray set is empty, only white objects remain and can be reclaimed.
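The steps above can be sketched as a toy, non-concurrent marker. The object type, the queue, and the helper names are illustrative assumptions, not the runtime's data structures (the real collector tracks colors in mark bits and work buffers).

```go
package main

import "fmt"

type color int

const (
	white color = iota // not visited yet; garbage if still white at the end
	gray               // known reachable, references not scanned yet
	black              // reachable and fully scanned
)

// object is a toy heap object used only for this illustration.
type object struct {
	name  string
	refs  []*object
	color color
}

// mark runs a tri-color pass starting from the roots.
func mark(roots []*object) {
	var grayQueue []*object
	for _, r := range roots {
		if r.color == white {
			r.color = gray
			grayQueue = append(grayQueue, r)
		}
	}
	for len(grayQueue) > 0 {
		obj := grayQueue[0]
		grayQueue = grayQueue[1:]
		for _, child := range obj.refs {
			if child.color == white {
				child.color = gray
				grayQueue = append(grayQueue, child)
			}
		}
		obj.color = black // all references scanned
	}
}

// sweep returns the names of objects that stayed white.
func sweep(heap []*object) []string {
	var garbage []string
	for _, o := range heap {
		if o.color == white {
			garbage = append(garbage, o.name)
		}
	}
	return garbage
}

func main() {
	a := &object{name: "A"}
	b := &object{name: "B"}
	c := &object{name: "C"}
	d := &object{name: "D"} // nothing points to D
	a.refs = []*object{b}
	b.refs = []*object{c, a} // cycle back to A is handled by the color check
	mark([]*object{a})
	fmt.Println(sweep([]*object{a, b, c, d})) // [D]
}
```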

The method is called “tri‑color concurrent marking” because the gray set records intermediate state, which lets marking run alongside the program. Concurrent pointer writes can, however, race with the collector and cause a live white object to be reclaimed by mistake. Protecting such white objects (e.g., shading them so they survive the current cycle) avoids this issue.

Necessary Conditions that Make Tri‑color Marking Unsafe

A black object holds a reference to a white object.

Every path from a gray object to that white object has been removed.

Both conditions must hold at once for a live object to be lost, so breaking either one prevents accidental deletion; subsequent optimizations revolve around this rule.

Add Write Barriers to Protect White Nodes

Insertion barrier: when a reference to node B is stored in node A, B is marked gray, preventing white nodes from being linked under black nodes.

Deletion barrier: when a reference to an object is removed or overwritten, that object is marked gray if it was white, so it survives the current cycle.
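A small simulation may make the two barriers concrete. It replays one interleaving in which a black object gains a reference to a white one while the last gray path is cut. All names here (writePointer, survives, the barrier constants) are hypothetical and only model the idea; they are not runtime APIs.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	gray
	black
)

type object struct {
	ref   *object // a single pointer field keeps the toy small
	color color
}

// shade turns a white object gray so the collector will still scan it.
func shade(o *object) {
	if o != nil && o.color == white {
		o.color = gray
	}
}

type barrier int

const (
	noBarrier        barrier = iota
	insertionBarrier         // shade the pointer being stored
	deletionBarrier          // shade the pointer being overwritten
)

// writePointer models a mutator pointer store routed through a barrier.
func writePointer(kind barrier, holder, target *object) {
	switch kind {
	case insertionBarrier:
		shade(target)
	case deletionBarrier:
		shade(holder.ref)
	}
	holder.ref = target
}

// survives reports whether live object C ends up marked.
// Start state: A is black, B is gray and points to white C.
func survives(kind barrier) bool {
	a := &object{color: black}
	c := &object{color: white}
	b := &object{color: gray, ref: c}

	writePointer(kind, a, c)   // black A now references white C
	writePointer(kind, b, nil) // the only gray path to C is removed

	// The collector resumes and drains every gray object.
	heap := []*object{a, b, c}
	for progress := true; progress; {
		progress = false
		for _, o := range heap {
			if o.color == gray {
				shade(o.ref)
				o.color = black
				progress = true
			}
		}
	}
	return c.color == black
}

func main() {
	fmt.Println(survives(noBarrier))        // false: C would be freed while A still uses it
	fmt.Println(survives(insertionBarrier)) // true
	fmt.Println(survives(deletionBarrier))  // true
}
```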

Go 1.5 adopted tri‑color marking with write barriers, so most marking runs concurrently with the program. Because writes to stacks are not covered by the barrier, stacks still had to be re‑scanned under STW at the end of marking, with pauses of 10‑100 ms.

Hybrid Write Barriers

Go 1.8 introduced hybrid write barriers, eliminating repeated stack scans and greatly reducing STW time. Two new actions:

At GC start, all reachable stack objects are marked black.

During GC, newly created stack objects are initialized as black.

This keeps stack objects protected without re‑scanning the stacks and ensures that no live white object is left reachable only through black nodes.
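As a rough illustration, the hybrid barrier can be modeled as shading both the overwritten pointer and the newly stored pointer on heap writes, while stack writes run no barrier at all. The sketch below is a toy under that assumption; heapWrite and survivesWithoutRescan are made-up names, and the runtime's real barrier is considerably more involved.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	gray
	black
)

type object struct {
	ref   *object
	color color
}

func shade(o *object) {
	if o != nil && o.color == white {
		o.color = gray
	}
}

// heapWrite models a pointer store into a heap object.
// hybrid=true shades the overwritten pointer and the new pointer;
// hybrid=false models a pure insertion barrier (new pointer only).
func heapWrite(hybrid bool, slot **object, ptr *object) {
	if hybrid {
		shade(*slot)
	}
	shade(ptr)
	*slot = ptr
}

// survivesWithoutRescan reports whether C is marked when the stack,
// already scanned (black), is never scanned a second time.
func survivesWithoutRescan(hybrid bool) bool {
	c := &object{}
	h := &object{ref: c, color: gray} // heap object, not scanned yet

	stackSlot := h.ref             // stack write: no barrier runs
	heapWrite(hybrid, &h.ref, nil) // heap write: barrier runs

	// Drain gray objects; H is visited before C.
	for _, o := range []*object{h, c} {
		if o.color == gray {
			shade(o.ref)
			o.color = black
		}
	}
	return stackSlot.color == black
}

func main() {
	fmt.Println(survivesWithoutRescan(false)) // false: a stack re-scan would be needed
	fmt.Println(survivesWithoutRescan(true))  // true
}
```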

GC Optimizations

GC Bottleneck Analysis

Root Cause in GC Scanning

Even though STW pauses are minimized, the main performance cost lies in the CPU‑intensive scanning phase of GC.

Scanning triggers:

Heap reaches the threshold (default GOGC=100, i.e., the heap has grown to roughly twice the live heap left after the previous GC).

Timed trigger if no GC occurs for 2 minutes.

Manual trigger via runtime.GC().

Allocation of a large object (>32 KB) or exhaustion of the allocator's local cache, which is when the allocator checks whether a GC should start.

Memory is reclaimed by sweeping, which starts right after marking finishes and also proceeds incrementally during allocation.
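The heap-threshold trigger can be approximated with simple arithmetic, and the runtime's own counters show when cycles actually happen. This is a simplified sketch: heapGoal is a hypothetical helper that ignores runtime details such as the minimum heap size, while runtime.ReadMemStats and runtime.GC are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapGoal approximates the GOGC rule: the next cycle starts once the
// heap reaches liveBytes * (1 + gogc/100).
func heapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	// With 64 MiB live and GOGC=100 the goal is 128 MiB.
	fmt.Println(heapGoal(64<<20, 100) >> 20) // 128

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // manual trigger
	runtime.ReadMemStats(&after)
	fmt.Printf("GC cycles: %d -> %d, next target: %d bytes\n",
		before.NumGC, after.NumGC, after.NextGC)
}
```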

How to Locate GC Issues

Use flame graphs to spot high CPU usage in gcBgMarkWorker; note that mallocgc indicates allocation bottlenecks, not GC scanning.

Reduce Heap Object Allocation

Return small structs by value instead of by pointer so they can stay on the stack:

func createUser() *User { return &User{ID: 1, Name: "Alice"} } // escapes to heap

Rewrite as:

func createUser() User { return User{ID: 1, Name: "Alice"} } // stack allocation
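One way to check the difference is to count heap allocations per call. The sketch below is an assumption-laden measurement, not a benchmark: allocsPerCall is a hypothetical helper built on runtime.ReadMemStats, and //go:noinline is used so the compiler cannot inline the call and keep the pointer version on the stack. Running go build -gcflags=-m prints the compiler's own escape decisions.

```go
package main

import (
	"fmt"
	"runtime"
)

type User struct {
	ID   int
	Name string
}

//go:noinline
func newUserPtr() *User { return &User{ID: 1, Name: "Alice"} }

//go:noinline
func newUserValue() User { return User{ID: 1, Name: "Alice"} }

// allocsPerCall returns the average number of heap objects allocated
// per call of f, rounded down so small background noise is ignored.
func allocsPerCall(f func()) uint64 {
	const runs = 1000
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < runs; i++ {
		f()
	}
	runtime.ReadMemStats(&after)
	return (after.Mallocs - before.Mallocs) / runs
}

// sink keeps the results observable so the calls are not optimized away.
var sink int

func main() {
	ptr := allocsPerCall(func() { sink += newUserPtr().ID })
	val := allocsPerCall(func() { sink += newUserValue().ID })
	fmt.Println("allocations per call, pointer:", ptr, "value:", val)
}
```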

Pass parameters instead of capturing closures, use sync.Pool for reusable buffers, and pre‑allocate slices to avoid repeated growth.

var pool = sync.Pool{New: func() interface{} { b := make([]byte, 1024); return &b }} // store *[]byte so Put does not allocate a slice header
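A typical Get/Reset/Put cycle might look like the following sketch. It pools *bytes.Buffer values rather than raw slices; bufPool and greet are hypothetical names chosen for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New runs only when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// greet builds a string using a pooled buffer instead of a fresh one.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old data
	defer bufPool.Put(buf) // return it for reuse
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(greet("gopher")) // hello, gopher
}
```

Objects in a sync.Pool can be dropped at any GC, so the pool suits short-lived scratch space rather than long-lived caching.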

Adjust GOGC to control GC frequency:

import "runtime/debug"
func main() { debug.SetGCPercent(200) } // start the next GC once the heap has grown 200% over the live heap

Allocate a large ballast that is never written to, and keep it alive for the life of the process, to inflate the live heap and raise the heap threshold:

ballast := make([]byte, 10<<30) // 10 GB virtual memory
runtime.KeepAlive(ballast)
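On Go 1.19 and later, a soft memory limit can often play a similar role to a ballast without a dummy allocation. The sketch below assumes Go 1.19+; tuneGC and currentLimit are hypothetical wrappers, and the numbers are placeholders rather than recommendations.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tuneGC applies a GC percent and a soft memory limit and returns the
// previous settings so callers can restore them.
func tuneGC(percent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(percent)
	prevLimit = debug.SetMemoryLimit(limitBytes)
	return prevPercent, prevLimit
}

// currentLimit reads the limit without changing it (a negative input
// leaves the setting untouched).
func currentLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	prevPercent, prevLimit := tuneGC(200, 4<<30) // GOGC=200, 4 GiB soft limit
	fmt.Println("previous GOGC:", prevPercent, "previous limit:", prevLimit)
	fmt.Println("current limit:", currentLimit())
}
```

The same settings are available without code changes through the GOGC and GOMEMLIMIT environment variables.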

Pre‑allocate slice capacity to avoid multiple reallocations:

data := make([]int, 0, 1000) // allocate once
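The effect is easy to observe by counting how often append has to move to a larger backing array. countGrowths is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// countGrowths appends n ints to a slice created with the given
// capacity and counts how many times the capacity changes.
func countGrowths(n, initialCap int) int {
	data := make([]int, 0, initialCap)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(data)
		data = append(data, i)
		if cap(data) != before {
			growths++ // append allocated a new backing array and copied
		}
	}
	return growths
}

func main() {
	fmt.Println("no preallocation:", countGrowths(1000, 0))
	fmt.Println("preallocated:", countGrowths(1000, 1000)) // preallocated: 0
}
```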

Cache Usage

Design caches to keep average response time under 100 ms; avoid over‑caching that causes periodic CPU spikes or OOM due to long cleanup intervals.

Concurrency Best Practices

Asynchronously handle non‑critical paths, minimize lock contention, use read‑write or optimistic locks, replace locks with atomic operations, and limit goroutine count with pools to prevent OOM.

var counter int32
for i := 0; i < 10; i++ { go func() { atomic.AddInt32(&counter, 1) }() } // wait for the goroutines (e.g., sync.WaitGroup) before reading counter
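A fuller sketch combines the atomic counter with a wait group and a cap on in-flight goroutines. The buffered channel used as a semaphore is one common way to bound concurrency, not the only one; runTasks is a hypothetical name and limit is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runTasks starts n tasks, allows at most limit of them to run at once,
// and counts completions with an atomic add instead of a mutex.
func runTasks(n, limit int) int32 {
	var counter int32
	var wg sync.WaitGroup
	sem := make(chan struct{}, limit) // buffered channel as a semaphore

	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are in flight
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			atomic.AddInt32(&counter, 1)
		}()
	}
	wg.Wait() // all increments are visible after Wait returns
	return atomic.LoadInt32(&counter)
}

func main() {
	fmt.Println(runTasks(100, 4)) // 100
}
```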

High‑Performance Coding Habits

Avoid large logs; excessive logging can dominate CPU.

Prefer strings.Builder or strings.Join over repeated string concatenation with + or fmt.Sprintf, especially in loops, where each step allocates a new string.
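As a small example, a list of IDs can be joined through one growing buffer instead of producing a new string per element. joinIDs is a hypothetical helper, and the size hint passed to Grow is a rough guess.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinIDs renders ids as a comma-separated string using one builder.
func joinIDs(ids []int) string {
	var sb strings.Builder
	sb.Grow(len(ids) * 4) // rough size hint to limit regrowth
	for i, id := range ids {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(strconv.Itoa(id))
	}
	return sb.String()
}

func main() {
	fmt.Println(joinIDs([]int{1, 2, 3})) // 1,2,3
}
```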

Avoid unnecessary deep copies of large data; for small objects, pass structs by value instead of pointers when possible so they can stay off the heap.

Prefer generics over reflection for type‑safe code that avoids reflection's boxing and allocation overhead.
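For instance, a helper that would otherwise inspect values through reflect can be written once with a type parameter (Go 1.18+). Number and Sum are illustrative names, not standard library identifiers.

```go
package main

import "fmt"

// Number lists the element types Sum accepts.
type Number interface {
	~int | ~int64 | ~float64
}

// Sum adds the elements with ordinary typed arithmetic; no values are
// boxed into interfaces and no reflection is involved.
func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))       // 6
	fmt.Println(Sum([]float64{0.5, 1.5})) // 2
}
```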

Profiling Tools

pprof

Use flame graphs to identify hot functions such as gcBgMarkWorker, serialization, or Sprintf.
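A CPU profile suitable for a flame graph can be captured with the standard runtime/pprof package. This is a minimal sketch: captureCPUProfile is a hypothetical wrapper, the workload is a stand-in for real code, and the file name cpu.pprof is arbitrary.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// captureCPUProfile runs work while CPU profiling is active and
// returns the encoded profile.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := captureCPUProfile(func() {
		// Stand-in workload that allocates enough to involve the GC.
		var junk [][]byte
		for i := 0; i < 200000; i++ {
			junk = append(junk, make([]byte, 64))
		}
		_ = junk
	})
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	if err := os.WriteFile("cpu.pprof", data, 0o644); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("wrote", len(data), "bytes; inspect with: go tool pprof -http=:8080 cpu.pprof")
}
```

Long-running services more commonly expose the same data over HTTP by importing net/http/pprof.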

trace

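The execution tracer provides a timeline of events, useful for diagnosing deadlocks or long pauses. The snippet below needs the log, os, and runtime/trace packages; view the result with go tool trace deadlock_trace.out.

func main() {
    f, err := os.Create("deadlock_trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    // ... create deadlock ...
    trace.Stop() // flush buffered trace data to the file
}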

Requirement Phase

Ensure Stable Experience

Provide fallback content for missing data or errors.

Coordinate sorting strategies to avoid random content changes.

Maintain data source consistency across pages.

Maintain Smooth UI

Paginate or split large data sets.

Lazy‑load non‑critical content.

Control asset sizes (e.g., use 120 px images instead of 1024 px).

Classify data freshness to reduce DB pressure.

Rate‑limit user actions to lower request volume.

Cost Communication

Discuss image generation costs with product owners.

Limit response payload size to improve network and frontend performance.
