Understanding Go's Memory Allocation: From Assembly Debugging to Runtime Components
The article walks through Go’s memory allocator by first demonstrating assembly‑level debugging with Delve, then detailing its TCMalloc‑inspired design where tiny, small, and large objects follow distinct paths through per‑P caches, central spans, and the global heap, highlighting the roles of mcache, mcentral, mspan, and mheap.
This article explains the implementation of Go's memory allocator by walking through assembly‑level debugging and analyzing the source code of Go 1.15.7. It starts with an overview of the allocator’s design, which borrows concepts from TCMalloc, and then details how objects are classified and allocated based on size.
1. Allocation strategy
Objects ≤ 32 KB are cached per‑thread in a lock‑free small‑object cache; larger objects are allocated directly from the page heap. The article links to the original TCMalloc documentation for reference.
2. Debugging Go assembly
Go supports GDB, LLDB, and the Go‑specific debugger Delve. The following command installs Delve:
go get github.com/go-delve/delve/cmd/dlvA simple test.go program is used for demonstration:
package main
import "fmt"
type A struct {
test string
}
func main() {
a := new(A)
fmt.Println(a)
}Running the debugger:
dlv debugSetting a breakpoint on main.main :
(dlv) break main.main
Breakpoint 1 set at 0x4bd30a for main.main() c:/document/code/test_go/src/test.go:8Listing breakpoints:
(dlv) breakpointsContinuing execution:
(dlv) continueDisassembling the main.main function:
(dlv) disassemble
TEXT main.main(SB) C:/document/code/test_go/src/test.go:8
0x4bd2f0 65488b0c2528000000 mov rcx, qword ptr gs:[0x28]
0x4bd2f9 488b8900000000 mov rcx, qword ptr [rcx]
...
=>0x4bd30a 4883ec78 sub rsp, 0x783. Runtime components
The allocator consists of four main structures:
runtime.mspan – the smallest unit that manages a contiguous range of pages.
runtime.mcache – per‑P (processor) cache for tiny and small objects.
runtime.mcentral – central list that supplies spans to caches when they run out.
runtime.mheap – the global heap that allocates spans from the operating system.
Key definitions (excerpt):
type mspan struct {
next *mspan
prev *mspan
list *mSpanList
startAddr uintptr
npages uintptr
freeindex uintptr
nelems uintptr
allocCache uint64
elemsize uintptr
limit uintptr
...
}Allocation of large objects (> 32 KB) uses largeAlloc which calculates the required number of pages and calls mheap.alloc :
func largeAlloc(size uintptr, needzero, noscan bool) *mspan {
npages := size >> _PageShift
if size & _PageMask != 0 { npages++ }
s := mheap_.alloc(npages, makeSpanClass(0, noscan), needzero)
if s == nil { throw("out of memory") }
return s
}Small objects (16 B – 32 KB) are allocated via mallocgc which first determines a size class, then obtains a span from the cache:
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
c := gomcache()
var x unsafe.Pointer
noscan := typ == nil || typ.ptrdata == 0
if size <= maxSmallSize {
var sizeclass uint8
if size <= smallSizeMax-8 {
sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
} else {
sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
}
size = uintptr(class_to_size[sizeclass])
spc := makeSpanClass(sizeclass, noscan)
span := c.alloc[spc]
v := nextFreeFast(span)
if v == 0 {
v, span, _ = c.nextFree(spc)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
memclrNoHeapPointers(unsafe.Pointer(v), size)
}
}
...
return x
}For tiny objects (≤ 16 B) that contain no pointers, the allocator uses a fast path that packs allocations into a tiny buffer:
if noscan && size < maxTinySize {
off := c.tinyoffset
if size&7 == 0 { off = alignUp(off, 8) }
else if size&3 == 0 { off = alignUp(off, 4) }
else if size&1 == 0 { off = alignUp(off, 2) }
if off+size <= maxTinySize && c.tiny != 0 {
x = unsafe.Pointer(c.tiny + off)
c.tinyoffset = off + size
c.local_tinyallocs++
return x
}
// fallback to a span of class tinySpanClass
span := c.alloc[tinySpanClass]
v := nextFreeFast(span)
if v == 0 { v, _, _ = c.nextFree(tinySpanClass) }
x = unsafe.Pointer(v)
// zero the 16‑byte block
(*[2]uint64)(x)[0] = 0
(*[2]uint64)(x)[1] = 0
if size < c.tinyoffset || c.tiny == 0 {
c.tiny = uintptr(x)
c.tinyoffset = size
}
return x
}4. Summary
The article demonstrates how to debug Go assembly with Delve and then walks through the three allocation paths (large, small, tiny). Small objects are served from a lock‑free per‑P cache; when the cache is exhausted, spans are fetched from runtime.mcentral , which in turn may allocate new spans from runtime.mheap . Large objects bypass the cache and are allocated directly from the heap, optionally using a page‑cache for modest sizes. This layered design provides high‑performance, low‑contention memory allocation for Go programs.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.