Mastering Go Garbage Collection: Tips to Slash Latency and Boost Performance
This article explains Go's memory management: the fundamentals of stop-the-world mark-and-sweep, tri-color marking with write barriers, hybrid write barriers, and practical optimization techniques such as reducing heap allocations, caching, concurrency patterns, and profiling tools, to help developers identify and eliminate performance bottlenecks.
GC Principles
This section introduces Go's garbage collection (GC), why it matters for performance optimization, common optimization techniques, and how newer Go versions improve the collector.
Mark-and-Sweep (Stop-the-World)
A stop-the-world (STW) collection pauses all goroutines, marks reachable objects, and then sweeps the unreachable ones. Before Go 1.3 the GC relied entirely on global STW pauses, which could last hundreds of milliseconds.
Tri‑color Marking + Write Barriers
Tri‑color Marking
Initially all objects are white.
The first scan moves objects directly reachable from the roots from white to gray.
Repeated scans take a gray object, mark the white objects it references gray, and then mark the scanned object itself black.
When the gray set is empty, only unreachable white objects remain, and they can be reclaimed.
The method is called "tri-color concurrent marking" because recording these intermediate states lets marking run alongside the program; however, concurrent pointer mutations can cause a still-reachable white object to be collected by mistake. Protecting white objects (for example, by deferring their deletion) avoids this issue.
Necessary Conditions that Make Tri‑color Marking Unsafe
A white object is referenced by a black object.
That white object loses all reachable paths from gray objects.
An object is wrongly collected only when both conditions hold at once; breaking either one prevents accidental deletion, and the subsequent optimizations revolve around this rule.
Add Write Barriers to Protect White Nodes
Insertion barrier: when node B is attached under node A, B is marked gray, preventing white nodes from being linked under black nodes.
Deletion barrier: when a white object is deleted, it is marked gray, ensuring it still has a gray reference.
Go 1.5 adopted tri-color marking with write barriers, allowing scanning to run concurrently with the program without a full pause. Write barriers were not applied to goroutine stacks, however, so stacks still had to be re-scanned under STW, with pauses typically in the 10-100 ms range.
Hybrid Write Barriers
Go 1.8 introduced hybrid write barriers, eliminating repeated stack scans and greatly reducing STW time. Two new actions:
At GC start, all reachable stack objects are marked black.
During GC, newly created stack objects are initialized as black.
This ensures stack objects are protected and no white objects become reachable from black nodes.
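A minimal sketch of the hybrid barrier, paraphrasing the design (pseudocode, not the runtime's actual source):

```
writePointer(slot, ptr):
    shade(*slot)                  // deletion-barrier half: protect the old referent
    if current goroutine's stack is gray:
        shade(ptr)                // insertion-barrier half, until the stack is scanned
    *slot = ptr
```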
GC Optimizations
GC Bottleneck Analysis
Root Cause in GC Scanning
Even though STW pauses are minimized, the main performance cost lies in the CPU‑intensive scanning phase of GC.
Scanning triggers:
Heap reaches threshold (default GOGC=100, i.e., memory doubles).
Timed trigger if no GC occurs for 2 minutes.
Manual trigger via runtime.GC().
Allocation of large objects (>32 KB) or cache exhaustion.
Memory reclamation occurs immediately after marking or during allocation‑assisted reclamation.
How to Locate GC Issues
Use flame graphs to spot high CPU usage in gcBgMarkWorker; note that mallocgc indicates allocation bottlenecks, not GC scanning.
Reduce Heap Object Allocation
Allocate structs instead of pointers for small objects to keep them on the stack:
```go
func createUser() *User {
	return &User{ID: 1, Name: "Alice"} // escapes to heap
}
```

Rewrite as:

```go
func createUser() User {
	return User{ID: 1, Name: "Alice"} // stack allocation
}
```

Pass parameters instead of capturing closures, use sync.Pool for reusable buffers, and pre-allocate slices to avoid repeated growth.
```go
var pool = sync.Pool{New: func() interface{} { return make([]byte, 1024) }}
```

Adjust GOGC to control GC frequency:
```go
import "runtime/debug"

func main() {
	debug.SetGCPercent(200) // trigger GC at 200% heap growth
}
```

Allocate a large ballast to raise the heap threshold:
```go
ballast := make([]byte, 10<<30) // 10 GB of virtual memory
runtime.KeepAlive(ballast)
```

Pre-allocate slice capacity to avoid multiple reallocations:
```go
data := make([]int, 0, 1000) // allocate once
```

Cache Usage
Design caches to keep average response time under 100 ms; avoid over‑caching that causes periodic CPU spikes or OOM due to long cleanup intervals.
Concurrency Best Practices
Asynchronously handle non‑critical paths, minimize lock contention, use read‑write or optimistic locks, replace locks with atomic operations, and limit goroutine count with pools to prevent OOM.
```go
var counter int32
for i := 0; i < 10; i++ {
	go func() { atomic.AddInt32(&counter, 1) }() // lock-free increment
}
```

High-Performance Coding Habits
Avoid large logs; excessive logging can dominate CPU.
Prefer strings.Builder or strings.Join over string concatenation with + or fmt.Sprintf.
Avoid deep copies; use structs instead of pointers when possible.
Prefer generics over reflection for type‑safe, zero‑allocation code.
Profiling Tools
pprof
Use flame graphs to identify hot functions such as gcBgMarkWorker, serialization, or Sprintf.
trace
Trace provides a timeline of events, useful for diagnosing deadlocks or long pauses.
```go
import (
	"os"
	"runtime/trace"
)

func main() {
	f, _ := os.Create("deadlock_trace.out")
	trace.Start(f)
	// ... create deadlock ...
	trace.Stop()
}
```

Requirement Phase
Ensure Stable Experience
Provide fallback content for missing data or errors.
Coordinate sorting strategies to avoid random content changes.
Maintain data source consistency across pages.
Maintain Smooth UI
Paginate or split large data sets.
Lazy‑load non‑critical content.
Control asset sizes (e.g., use 120 px images instead of 1024 px).
Classify data freshness to reduce DB pressure.
Rate‑limit user actions to lower request volume.
Cost Communication
Discuss image generation costs with product owners.
Limit response payload size to improve network and frontend performance.
Tencent Technical Engineering
Official account of Tencent Technology. A platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.