How Go Manages Thread and Goroutine Stacks: A Deep Dive into Memory Allocation
This article explains how Linux processes and glibc threads allocate stacks, then details Go's runtime mechanisms for creating thread stacks and lightweight goroutine stacks, including guard‑based detection and automatic stack growth, with code examples and diagrams.
1. Process Stack & glibc Thread Stack
In the Linux kernel, a process is represented by a task_struct, and its memory‑related state lives in an mm_struct. The address space is indexed by a tree of virtual memory areas (a red‑black tree in the kernels this article describes; newer kernels use a maple tree), with each node describing one contiguous region. When a process starts, exec sets up a 4 KB initial stack; the stack then expands automatically as needed, up to a limit that can be viewed and adjusted with ulimit -s. Threads cannot share that single process stack, because all threads of a process share one address space. For this reason pthread_create gives each thread an independent stack allocated with mmap. glibc implements threads as NPTL, in which a thread consists of a user‑space struct pthread (including its own stack) and a kernel‑space task_struct.
When a new thread is created via pthread_create, glibc allocates a separate stack in user space and hands it to the kernel through the clone system call.
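The stack limit that ulimit -s reports can also be read from inside a program. The following is a minimal sketch, assuming a Linux (or other Unix) system where Go's syscall package exposes RLIMIT_STACK; the helper name stackLimit is made up for this illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// stackLimit returns the soft and hard RLIMIT_STACK values in bytes.
// This is the limit that `ulimit -s` shows (in KB) for the shell.
// An unlimited value is reported as a very large number.
func stackLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_STACK, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := stackLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("soft stack limit: %d bytes, hard stack limit: %d bytes\n", soft, hard)
}
```

The soft limit bounds how far the main process stack may expand; thread stacks created with mmap are sized separately by the threading library.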
2. Go Thread Stack and Goroutine Stack
Go distinguishes between OS threads and lightweight goroutines; each thread can run many goroutines. The runtime allocates goroutine stacks itself, and it also allocates thread stacks itself unless cgo is in use or the platform requires system‑allocated thread stacks.
2.1 Thread‑stack allocation
Go creates a thread similarly to glibc: it first allocates a thread object and then invokes the clone system call. The core creation function is newm (runtime/proc.go):
func newm(fn func(), _p_ *p, id int64) {
    // allocate thread object and default g0
    mp := allocm(_p_, fn, id)
    ...
    // actually create the thread via clone
    newm1(mp)
}
allocm allocates an m (thread) structure and creates a special goroutine, g0:
func allocm(_p_ *p, fn func(), id int64) *m {
    // allocate thread object
    mp := new(m)
    mp.mstartfn = fn
    if iscgo || mStackIsSystemAllocated() {
        // the system supplies the thread stack, so no stack is allocated here
        mp.g0 = malg(-1)
    } else {
        // allocate the g0 stack from the Go runtime
        mp.g0 = malg(8192 * sys.StackGuardMultiplier)
    }
    mp.g0.m = mp
    return mp
}
The malg function creates a g (goroutine) object and, when given a non‑negative size, allocates its stack with stackalloc. The default stack size for g0 is 8 KB, while ordinary goroutine stacks start at 2 KB.
func malg(stacksize int32) *g {
    // allocate goroutine object
    newg := new(g)
    if stacksize >= 0 {
        ...
        systemstack(func() {
            newg.stack = stackalloc(uint32(stacksize))
        })
        newg.stackguard0 = newg.stack.lo + _StackGuard
        newg.stackguard1 = ^uintptr(0)
    }
    return newg
}
After the g0 stack is prepared, Go passes its top address to the kernel in the clone call performed by newosproc:
const (
    cloneFlags = _CLONE_VM | _CLONE_FS | _CLONE_FILES |
        _CLONE_SIGHAND | _CLONE_SYSVSEM | _CLONE_THREAD
)
func newosproc(mp *m) {
    // use g0's stack as the thread stack; stacks grow downward, so pass the high address
    stk := unsafe.Pointer(mp.g0.stack.hi)
    ...
    ret := clone(cloneFlags, stk, ..., unsafe.Pointer(abi.FuncPCABI0(mstart)))
}
2.2 Goroutine‑stack allocation
The main goroutine is created in the runtime assembly entry rt0_go, which calls runtime.newproc. newproc in turn relies on newproc1, which obtains a g from the free‑list cache or creates a new one with malg(_StackMin) (2 KB):
func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
    // _p_ is the current P; how it is obtained is omitted from this excerpt
    newg := gfget(_p_)
    if newg == nil {
        newg = malg(_StackMin)
        ...
    }
    newg.startpc = fn.fn
    ...
    return newg
}
3. Goroutine Stack Expansion
3.1 Detecting the need to grow
Each goroutine stores two guard values: stackguard0 = stack.lo + StackGuard (928 bytes on Linux) and stackguard1. Because the stack grows downward, the SP register holds the lowest in‑use address; when SP drops to or below stackguard0, the runtime concludes that the stack must grow.
stackguard0: stack.lo + StackGuard, used for overflow detection
StackGuard: constant size, 928 bytes on Linux
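The guard comparison can be modeled in a few lines of ordinary Go. This is a simplified sketch, not the runtime's code: the stack type and needsGrowth helper are made up for illustration, and the 128-byte allowance is an assumption based on the runtime's StackSmall constant. Under that assumption, a 216-byte frame (locals=0xd8) is probed at SP-88, which matches the LEAQ -88(SP) instruction in the compiler listing in this section.

```go
package main

import "fmt"

const (
	stackGuard = 928 // StackGuard on Linux, as quoted in the text
	stackSmall = 128 // assumed allowance, modeled on the runtime's StackSmall
)

// stack is a simplified model of a goroutine stack covering [lo, hi).
// It grows downward, from hi toward lo.
type stack struct{ lo, hi uintptr }

// guard0 models stackguard0 = stack.lo + StackGuard.
func (s stack) guard0() uintptr { return s.lo + stackGuard }

// needsGrowth models the prologue check for a function with a frame of
// frameSize bytes. Small frames compare SP directly; larger frames compare
// SP-(frameSize-stackSmall). Growth is requested when the probed address
// is lower than or equal to the guard.
func needsGrowth(s stack, sp, frameSize uintptr) bool {
	probe := sp
	if frameSize > stackSmall {
		probe = sp - (frameSize - stackSmall)
	}
	return probe <= s.guard0()
}

func main() {
	s := stack{lo: 0x1000, hi: 0x1000 + 2048} // a 2 KB stack
	frame := uintptr(216)                     // hypothetical frame size
	for _, used := range []uintptr{512, 1100} {
		sp := s.hi - used
		fmt.Printf("used=%d bytes, needs growth: %v\n", used, needsGrowth(s, sp, frame))
	}
}
```

With 512 bytes in use the probe stays above the guard; with 1100 bytes in use it falls below the guard and growth is requested. The real check is emitted as machine code in each function prologue.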
The compiler inserts assembly that compares SP with stackguard0. If the guard is crossed, execution jumps to runtime.morestack_noctxt:
# GOOS=linux GOARCH=amd64 go tool compile -S -N -l main.go > main.s
"".func1 STEXT size=143 args=0x8 locals=0xd8 funcid=0x0 align=0x0
    0x0000 00000 (main.go:9) LEAQ -88(SP), R12 // R12 = SP - 88
    0x0005 00005 (main.go:9) CMPQ R12, 16(R14) // compare R12 with stackguard0 (R14 holds the current g)
    0x0009 00009 (main.go:9) PCDATA $0, $-2
    0x0009 00009 (main.go:9) JLS 120 // jump if R12 <= stackguard0 (unsigned)
    ...
    0x0078 00120 (main.go:11) CALL runtime.morestack_noctxt(SB)
3.2 Performing the growth
If growth is required, runtime.morestack_noctxt eventually calls runtime.newstack, which doubles the current stack size. copystack then allocates a new region with stackalloc, copies the old contents, adjusts pointers that refer into the old stack, updates the goroutine's stack bounds, and frees the old region.
func newstack() {
    // gp is the goroutine whose stack is being grown (lookup omitted)
    // new stack is twice the old size
    oldsize := gp.stack.hi - gp.stack.lo
    newsize := oldsize * 2
    ...
    // allocate new stack and copy old data
    copystack(gp, newsize)
    ...
}
func copystack(gp *g, newsize uintptr) {
    old := gp.stack
    // bytes currently in use, measured from the high end of the stack
    ncopy := old.hi - gp.sched.sp
    // allocate new stack
    new := stackalloc(uint32(newsize))
    ...
    // copy old stack contents so they stay adjacent to the high end
    memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy)
    // switch to new stack
    gp.stack = new
    gp.stackguard0 = new.lo + _StackGuard
    // free old stack
    stackfree(old)
}
The stackalloc routine maintains a private pool of memory blocks; only when the pool is exhausted does it fall back to mmap to request more memory from the OS. A complementary stack‑shrinking mechanism (shrinkstack) also exists but is not detailed here.
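The copy step in this section can be modeled with byte slices. The following is a simplified sketch under stated assumptions: copyStack is a made-up helper, a slice stands in for the stack region with its last index playing the role of stack.hi, and the pointer adjustment that the real runtime performs is left out.

```go
package main

import "fmt"

// copyStack models the memmove in copystack: it allocates a region of
// newSize bytes and copies the used bytes of the old region so that they
// remain adjacent to the high end. It assumes used <= len(old) and
// used <= newSize. Unlike the runtime, it does not adjust pointers.
func copyStack(old []byte, used, newSize int) []byte {
	grown := make([]byte, newSize)
	copy(grown[newSize-used:], old[len(old)-used:])
	return grown
}

func main() {
	old := make([]byte, 2048) // a 2 KB stack
	used := 1500              // bytes in use, counted from the high end
	for i := 0; i < used; i++ {
		old[len(old)-1-i] = byte(i)
	}

	// Double the size, as newstack does, and move the contents.
	grown := copyStack(old, used, 2*len(old))
	fmt.Println("old size:", len(old), "new size:", len(grown))
	fmt.Println("top byte preserved:", grown[len(grown)-1] == old[len(old)-1])
}
```

Copying relative to the high end keeps every frame at the same distance from the top of the stack, and the extra space appears at the low end, where the stack grows.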
4. Summary
Linux gives a process an initial 4 KB stack at exec; threads are created in user space and receive independent stacks via mmap. Go follows a similar model: each OS thread carries a special g0 goroutine whose stack is handed to the kernel via clone. Ordinary goroutines start with a small 2 KB stack that grows on demand by doubling, driven by guard checks that the compiler inserts into function prologues. This design lets Go support very large numbers of goroutines with low memory overhead.
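The low per-goroutine cost can be estimated from user code with runtime.MemStats. This is a rough sketch, assuming that the change in StackInuse while many goroutines are parked is a fair proxy for their stack memory; stackPerGoroutine is a made-up helper, and the result varies with Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackPerGoroutine parks n goroutines on a channel and returns the average
// increase in MemStats.StackInuse per goroutine, in bytes.
func stackPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var started sync.WaitGroup
	stop := make(chan struct{})
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-stop // park so the stack stays allocated while we measure
		}()
	}
	started.Wait()

	runtime.ReadMemStats(&after)
	close(stop) // let the goroutines exit

	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Println("approx. stack bytes per goroutine:", stackPerGoroutine(10000))
}
```

On a typical build the figure is on the order of a few kilobytes, far below the stack that a pthread usually reserves.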
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
