
Understanding Go's Memory Allocation: From Assembly Debugging to Runtime Components

The article walks through Go’s memory allocator, first demonstrating assembly‑level debugging with Delve and then detailing the allocator's TCMalloc‑inspired design. Tiny, small, and large objects follow distinct paths through per‑P caches, central span lists, and the global heap, which are implemented by mcache, mcentral, mspan, and mheap.

Tencent Cloud Developer

This article explains the implementation of Go's memory allocator by walking through assembly‑level debugging and analyzing the source code of Go 1.15.7. It starts with an overview of the allocator’s design, which borrows concepts from TCMalloc, and then details how objects are classified and allocated based on size.

1. Allocation strategy

As in TCMalloc, objects ≤ 32 KB are served from a lock‑free local cache (per thread in TCMalloc, per P in Go), while larger objects are allocated directly from the page heap. The article links to the original TCMalloc documentation for reference.
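As a rough illustration of how Go refines this split into three paths, the sketch below classifies a request by size. The thresholds are the Go 1.15 runtime constants maxTinySize (16 bytes) and maxSmallSize (32 KB); the function name allocPath is hypothetical and exists only for this example.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // pointer-free objects below this size take the tiny path
	maxSmallSize = 32 << 10 // 32 KB: largest object served from size-class spans
)

// allocPath is a hypothetical helper that names the path a request would take.
// noscan means the object contains no pointers.
func allocPath(size uintptr, noscan bool) string {
	switch {
	case noscan && size < maxTinySize:
		return "tiny"
	case size <= maxSmallSize:
		return "small"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(allocPath(8, true))      // tiny
	fmt.Println(allocPath(8, false))     // small: the object contains pointers
	fmt.Println(allocPath(16, true))     // small: 16 bytes is not below maxTinySize
	fmt.Println(allocPath(32769, false)) // large
}
```

Note that an 8-byte object containing a pointer still goes through the small path, because the tiny path only packs pointer-free data.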

2. Debugging Go assembly

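Go programs can be debugged with GDB, LLDB, or the Go‑specific debugger Delve. With the Go 1.15 toolchain used here, the following command installs Delve (Go 1.17 and later use go install github.com/go-delve/delve/cmd/dlv@latest instead):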

go get github.com/go-delve/delve/cmd/dlv

A simple test.go program is used for demonstration:

package main

import "fmt"

type A struct {
    test string
}

func main() {
    a := new(A)
    fmt.Println(a)
}

Running the debugger:

dlv debug

Setting a breakpoint on main.main:

(dlv) break main.main
Breakpoint 1 set at 0x4bd30a for main.main() c:/document/code/test_go/src/test.go:8

Listing breakpoints:

(dlv) breakpoints

Continuing execution:

(dlv) continue

Disassembling the main.main function:

(dlv) disassemble
TEXT main.main(SB) C:/document/code/test_go/src/test.go:8
    0x4bd2f0    65488b0c2528000000      mov rcx, qword ptr gs:[0x28]
    0x4bd2f9    488b8900000000          mov rcx, qword ptr [rcx]
    ...
    =>0x4bd30a    4883ec78                sub rsp, 0x78

3. Runtime components

The allocator consists of four main structures:

runtime.mspan – the smallest unit that manages a contiguous range of pages.

runtime.mcache – per‑P (processor) cache for tiny and small objects.

runtime.mcentral – central list that supplies spans to caches when they run out.

runtime.mheap – the global heap that allocates spans from the operating system.
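The way these four structures cooperate can be sketched with a toy model. Everything below (the types span, heap, central, cache and their fields) is hypothetical and heavily simplified; it only illustrates the refill chain in which a lock-free per-P cache falls back to a locked central list, which in turn falls back to the heap.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a toy stand-in for runtime.mspan: a run of equally sized slots.
type span struct {
	free int // slots still available
}

// heap is a toy stand-in for runtime.mheap: the global source of spans.
type heap struct {
	mu             sync.Mutex
	spansAllocated int
}

func (h *heap) allocSpan(slots int) *span {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.spansAllocated++
	return &span{free: slots}
}

// central is a toy stand-in for runtime.mcentral: a locked list of spans
// for one size class, shared by all caches.
type central struct {
	mu           sync.Mutex
	nonempty     []*span // spans with free slots (returning spans is not modeled here)
	heap         *heap
	slotsPerSpan int
}

func (c *central) cacheSpan() *span {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n := len(c.nonempty); n > 0 {
		s := c.nonempty[n-1]
		c.nonempty = c.nonempty[:n-1]
		return s
	}
	// Nothing cached centrally: grow from the heap.
	return c.heap.allocSpan(c.slotsPerSpan)
}

// cache is a toy stand-in for runtime.mcache. It is owned by a single P,
// so the fast path needs no lock.
type cache struct {
	cur     *span
	central *central
	refills int
}

func (c *cache) alloc() {
	if c.cur == nil || c.cur.free == 0 {
		c.cur = c.central.cacheSpan() // slow path: takes the central lock
		c.refills++
	}
	c.cur.free-- // fast path: no synchronization
}

func main() {
	h := &heap{}
	ct := &central{heap: h, slotsPerSpan: 4}
	pc := &cache{central: ct}
	for i := 0; i < 10; i++ {
		pc.alloc()
	}
	fmt.Println(pc.refills, h.spansAllocated) // prints "3 3"
}
```

With four slots per span, ten allocations touch the locked layers only three times; the other seven are served without any synchronization.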

Key definitions (excerpt):

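type mspan struct {
    next *mspan     // next span in the list, or nil
    prev *mspan     // previous span in the list, or nil
    list *mSpanList // list that owns this span

    startAddr uintptr // address of the first byte of the span
    npages    uintptr // number of pages in the span

    freeindex  uintptr // slot index at which the search for a free object starts
    nelems     uintptr // number of object slots in the span
    allocCache uint64  // cached free-slot bitmap starting at freeindex; a set bit marks a free slot
    elemsize   uintptr // size of each object slot
    limit      uintptr // end of the data in the span
    ...
}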
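To show how freeindex and allocCache work together, here is a simplified sketch modeled on the runtime's nextFreeFast. The toySpan type is hypothetical, and the sketch leaves out the real function's handling of the 64-slot boundary at which allocCache has to be refilled.

```go
package main

import (
	"fmt"
	"math/bits"
)

// toySpan is a hypothetical, cut-down version of runtime.mspan.
type toySpan struct {
	startAddr  uintptr
	elemsize   uintptr
	nelems     uintptr
	freeindex  uintptr
	allocCache uint64 // bit i set means slot freeindex+i is free
}

// nextFreeFast returns the address of the next free slot, or 0 if the
// cached bitmap has no usable slot left.
func (s *toySpan) nextFreeFast() uintptr {
	theBit := bits.TrailingZeros64(s.allocCache) // position of the first free slot
	if theBit == 64 {
		return 0 // no free slot recorded in the cache
	}
	slot := s.freeindex + uintptr(theBit)
	if slot >= s.nelems {
		return 0 // past the end of the span
	}
	s.allocCache >>= uint(theBit + 1) // drop the bits that were just consumed
	s.freeindex = slot + 1
	return s.startAddr + slot*s.elemsize
}

func main() {
	// Slots 0-2 are already in use, so the three lowest bits are clear.
	s := &toySpan{startAddr: 0x1000, elemsize: 32, nelems: 8, allocCache: ^uint64(7)}
	fmt.Printf("%#x\n", s.nextFreeFast()) // 0x1060 (slot 3)
	fmt.Printf("%#x\n", s.nextFreeFast()) // 0x1080 (slot 4)
}
```

Finding a free slot costs one count-trailing-zeros operation and a shift, which is why the fast path does not need to walk a free list.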

Allocation of large objects (> 32 KB) uses largeAlloc, which calculates the required number of pages and calls mheap.alloc:

func largeAlloc(size uintptr, needzero, noscan bool) *mspan {
    // Round the request up to a whole number of pages.
    npages := size >> _PageShift
    if size&_PageMask != 0 {
        npages++
    }
    // Size class 0 marks a span that holds a single large object.
    s := mheap_.alloc(npages, makeSpanClass(0, noscan), needzero)
    if s == nil {
        throw("out of memory")
    }
    return s
}
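The page rounding in largeAlloc can be checked with a small standalone sketch. It assumes the runtime's 8 KB page size (_PageShift = 13); the function name pagesFor is hypothetical.

```go
package main

import "fmt"

const (
	pageShift = 13             // the runtime's _PageShift: pages are 8 KB
	pageSize  = 1 << pageShift // 8192
	pageMask  = pageSize - 1
)

// pagesFor is a hypothetical helper that repeats largeAlloc's rounding:
// shift for whole pages, then add one page for any remainder.
func pagesFor(size uintptr) uintptr {
	npages := size >> pageShift
	if size&pageMask != 0 {
		npages++
	}
	return npages
}

func main() {
	fmt.Println(pagesFor(32769)) // 5: four full pages plus one byte
	fmt.Println(pagesFor(40960)) // 5: exactly five pages
	fmt.Println(pagesFor(65536)) // 8
}
```

The first case shows the cost of the rounding: an object one byte over 32 KB occupies five full pages.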

Small objects (16 B – 32 KB, plus smaller objects that contain pointers) are allocated via mallocgc, which first determines a size class and then obtains a span for that class from the per‑P cache:

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    c := gomcache() // mcache of the current P
    var x unsafe.Pointer
    noscan := typ == nil || typ.ptrdata == 0 // true if the object holds no pointers
    if size <= maxSmallSize {
        // Map the requested size to a size class via one of two lookup tables.
        var sizeclass uint8
        if size <= smallSizeMax-8 {
            sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
        } else {
            sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
        }
        size = uintptr(class_to_size[sizeclass]) // round up to the class size
        spc := makeSpanClass(sizeclass, noscan)
        span := c.alloc[spc]
        // Fast path: take the next free slot from the cached span.
        v := nextFreeFast(span)
        if v == 0 {
            // Slow path: the span is full, so refill the cache.
            v, span, _ = c.nextFree(spc)
        }
        x = unsafe.Pointer(v)
        if needzero && span.needzero != 0 {
            memclrNoHeapPointers(unsafe.Pointer(v), size)
        }
    }
    ...
    return x
}
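The two index computations and the span class encoding above can be tried in isolation. The constants below are the runtime's values (smallSizeDiv = 8, smallSizeMax = 1024, largeSizeDiv = 128); the helper names tableIndex and spanClassOf are hypothetical, and the sketch reports only the table index because the lookup tables themselves are generated inside the runtime.

```go
package main

import "fmt"

const (
	smallSizeDiv = 8
	smallSizeMax = 1024
	largeSizeDiv = 128
)

// tableIndex is a hypothetical helper that reports which lookup table
// mallocgc would consult for a small object, and at which index.
func tableIndex(size uintptr) (table string, idx uintptr) {
	if size <= smallSizeMax-8 {
		return "size_to_class8", (size + smallSizeDiv - 1) / smallSizeDiv
	}
	return "size_to_class128", (size - smallSizeMax + largeSizeDiv - 1) / largeSizeDiv
}

// spanClassOf follows the encoding used by makeSpanClass: the size class
// occupies the upper bits and the noscan flag the lowest bit.
func spanClassOf(sizeclass uint8, noscan bool) uint8 {
	sc := sizeclass << 1
	if noscan {
		sc |= 1
	}
	return sc
}

func main() {
	fmt.Println(tableIndex(100))  // size_to_class8 13
	fmt.Println(tableIndex(2000)) // size_to_class128 8
	fmt.Println(spanClassOf(5, true), spanClassOf(5, false)) // 11 10
}
```

Because the noscan flag is folded into the span class, each size class has two spans per cache: one for pointer-free objects and one for objects the garbage collector must scan.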

For tiny objects (< 16 B) that contain no pointers, the allocator uses a fast path that packs several allocations into one 16‑byte tiny block:

if noscan && size < maxTinySize {
    off := c.tinyoffset
    // Align the offset according to the size of the request.
    if size&7 == 0 {
        off = alignUp(off, 8)
    } else if size&3 == 0 {
        off = alignUp(off, 4)
    } else if size&1 == 0 {
        off = alignUp(off, 2)
    }
    if off+size <= maxTinySize && c.tiny != 0 {
        // The object fits into the current tiny block.
        x = unsafe.Pointer(c.tiny + off)
        c.tinyoffset = off + size
        c.local_tinyallocs++
        return x
    }
    // Fall back to a fresh 16-byte block from a span of class tinySpanClass.
    span := c.alloc[tinySpanClass]
    v := nextFreeFast(span)
    if v == 0 {
        v, _, _ = c.nextFree(tinySpanClass)
    }
    x = unsafe.Pointer(v)
    // Zero the 16-byte block.
    (*[2]uint64)(x)[0] = 0
    (*[2]uint64)(x)[1] = 0
    // Keep whichever block has more free space as the current tiny block.
    if size < c.tinyoffset || c.tiny == 0 {
        c.tiny = uintptr(x)
        c.tinyoffset = size
    }
    return x
}

4. Summary

The article demonstrates how to debug Go assembly with Delve and then walks through the three allocation paths (large, small, tiny). Tiny and small objects are served from a lock‑free per‑P cache, runtime.mcache. When the cache is exhausted, spans are fetched from runtime.mcentral, which in turn may obtain new spans from runtime.mheap. Large objects bypass the cache and are allocated directly from the heap, optionally using a page cache for modest sizes. This layered design gives Go programs fast memory allocation with little lock contention.
