Fundamentals 30 min read

Golang Garbage Collection: Algorithms, Memory Management, and Evolution

The article details Go’s runtime memory architecture—including pages, spans, mcache, and size classes—and explains its non‑generational concurrent tri‑color mark‑and‑sweep garbage collector, the required write‑barrier techniques, and the collector’s evolution, phases, trigger mechanisms, and practical code examples across Go 1.0‑1.16.

Tencent Cloud Developer
Tencent Cloud Developer
Tencent Cloud Developer
Golang Garbage Collection: Algorithms, Memory Management, and Evolution

Modern high‑level languages manage memory automatically. Go (since v1.12) uses a non‑generational, concurrent, tri‑color mark‑and‑sweep garbage collector (GC). The article explains Go's runtime memory layout, allocation structures, GC algorithms, write‑barrier techniques, and the evolution of the collector across Go versions.

1. Go runtime basics

Go schedules work using three entities: G (goroutine), M (OS thread), and P (processor). A goroutine runs only when a G, an M, and a P are combined.

2. Memory management structures (TCMalloc‑inspired)

Page – 8 KB on x64, the basic unit of memory.

Span – a contiguous set of pages; represented by mspan in the runtime.

mcache – per‑P cache for small objects (≤ 32 KB), providing lock‑free allocation.

mcentral – shared cache (similar to TCMalloc’s CentralCache) that supplies spans to mcache.

mheap – heap manager (analogous to PageHeap) that obtains pages from the OS and organizes them into spans.

Stack – each goroutine has its own stack storing locals, arguments, and pointers.

3. Object size classes

Tiny objects: size < 16 B, no pointers, allocated from mcache.

Small objects: 16 B – 32 KB, allocated from the appropriate mspan class.

Large objects: > 32 KB, allocated directly from mheap.

The core idea is multi‑level management to reduce lock contention and fragmentation.

4. Allocation flow

If mcache lacks a free span, it requests one from mcentral; if mcentral is empty, it asks mheap; if mheap lacks pages, it requests new pages (≥ 1 MB) from the OS.

5. Garbage‑collection algorithm

Go uses a tri‑color mark‑and‑sweep algorithm. Objects are colored white (unvisited), gray (reachable but not fully scanned), or black (fully scanned). The collector performs a stop‑the‑world (STW) pause to mark root objects, then proceeds concurrently:

Mark phase: roots are scanned, gray objects are processed, turning them black and turning reachable objects gray.

Mark termination: when the gray set is empty, the collector stops marking.

Sweep phase: unmarked (white) objects are reclaimed.

Because the mutator runs concurrently, write barriers are required to maintain the tri‑color invariant.

6. Write‑barrier techniques

Insert write barrier (Dijkstra, 1978) – shades the newly stored pointer gray before the store:

func DijkstraWritePointer(slot *unsafe.Pointer, ptr unsafe.Pointer) {
    shade(ptr) // mark new downstream object gray
    *slot = ptr
}
// usage examples
A.AddDownstream(nil, B) // B becomes gray
A.AddDownstream(C, B)   // B becomes gray again

Delete write barrier (Yuasa, 1990) – shades the old pointer gray when a reference is removed:

func YuasaWritePointer(slot *unsafe.Pointer, ptr unsafe.Pointer) {
    shade(*slot) // old object becomes gray
    *slot = ptr
}
// usage examples
A.AddDownstream(B, nil) // B becomes gray if it was white
A.AddDownstream(B, C)   // B becomes gray again

Hybrid write barrier (Go 1.8) – combines insert and delete barriers to avoid full stack rescans while keeping the tri‑color invariant:

writePointer(slot, ptr) {
    shade(*slot)
    if currentStackIsGrey {
        shade(ptr)
    }
    *slot = ptr
}

These barriers ensure that newly reachable objects are marked gray and that objects whose last reference is removed are also marked gray, preventing both "floating" garbage and premature reclamation (dangling pointers).

7. Evolution of Go GC (v1.0 → v1.16)

v1.0 – fully stop‑the‑world, serial mark‑and‑sweep.

v1.1 – parallel marking/cleaning on multi‑core.

v1.3 – precise stack scanning.

v1.5 – concurrent tri‑color mark‑sweep, latency < 10 ms.

v1.6 – decentralized GC coordinator, bitmap‑based heap.

v1.7 – parallel stack shrinking, GC pause < 2 ms.

v1.8 – hybrid write barrier, pause < 0.5 ms.

v1.9 – removed full stack rescans.

v1.10 – improved pacing controller.

v1.12 – new mark‑termination algorithm.

v1.13 – Scavenger returns memory to OS.

v1.14 – new page allocator for faster allocation.

v1.15 – compiler changes to reduce write‑barrier overhead.

v1.16 – aggressive MADV_DONTNEED to release unused memory.

8. GC phases in the source (runtime/mgc.go)

Sweep termination – STW, safe‑point entry, clean remaining spans.

Mark phase – switch gcphase to _GCmark, enable write barriers, start concurrent workers, scan roots, process gray queue, use distributed termination detection.

Mark termination – STW, switch to _GCmarktermination, stop workers, perform housekeeping.

Sweep phase – switch to _GCoff, resume program, new allocations start as white, background sweeps free spans.

9. GC trigger conditions

The runtime decides to start a GC via runtime.gcTrigger.test based on three kinds of triggers:

//gcTrigger.test implementation (simplified)
func (t gcTrigger) test() bool {
    if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
        return false
    }
    switch t.kind {
    case gcTriggerHeap:
        return gcController.heapLive >= gcController.trigger
    case gcTriggerTime:
        lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
        return lastgc != 0 && t.now-lastgc > forcegcperiod
    case gcTriggerCycle:
        return int32(t.n-work.cycles) > 0
    }
    return true
}

Triggers are invoked from:

runtime.mallocgc – allocation‑driven heap trigger.

runtime.GC – explicit user request.

runtime.forcegchelper – periodic background trigger.

10. Example code snippets

GC‑finished callback:

func gcfinished() *int {
    p := 1
    runtime.SetFinalizer(&p, func(_ *int) { println("gc finished") })
    return &p
}

Allocation loop used for tracing:

func allocate() {
    _ = make([]byte, int((1<<20)*0.25))
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    gcfinished()
    for n := 1; n < 50; n++ {
        println("#allocate: ", n)
        allocate()
    }
    println("terminate")
}

GC trigger test in mallocgc :

if shouldhelpgc {
    if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
        gcStart(t)
    }
}

Manual GC invocation:

func GC() {
    n := atomic.Load(&work.cycles)
    gcWaitOnMark(n)
    gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
    gcWaitOnMark(n + 1)
    for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
        sweep.nbgsweep++
        Gosched()
    }
    for atomic.Load(&work.cycles) == n+1 && !isSweepDone() {
        Gosched()
    }
    // post‑sweep bookkeeping omitted for brevity
}

These snippets illustrate how the collector is wired into allocation, how write barriers are implemented, and how GC cycles are started and completed.

11. References

Go Language Design and Implementation (Draveness)

Expert comparison of Go and Java GC algorithms

Go runtime source (runtime/mgc.go)

CMS garbage collector article

Go 1.16 source code repository

Memory ManagementGoRuntimeGarbage CollectionConcurrent MarkingWrite Barrier
Tencent Cloud Developer
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.