
How Go Manages Thread and Goroutine Stacks: A Deep Dive into Memory Allocation

This article explains how Linux processes and glibc threads get their stacks, then details how the Go runtime allocates OS‑thread stacks and lightweight goroutine stacks, including guard‑based overflow detection and automatic stack growth.

Refining Core Development Skills

1. Process Stack & glibc Thread Stack

In the Linux kernel a process is represented by a task_struct, whose memory‑related state lives in an mm_struct. The address space is a set of virtual memory areas, each describing one contiguous region; kernels before 6.1 index them in a red‑black tree, newer ones in a maple tree. When a program starts, exec sets up an initial stack of a single 4 KB page; the stack then grows automatically on page faults, up to the limit that ulimit -s shows and adjusts.

Threads cannot all use that one stack. All threads of a process share a single mm_struct, and therefore a single address space, so pthread_create gives each new thread its own stack, allocated with mmap. glibc's thread implementation, NPTL, represents each thread by a user‑space struct pthread (which also owns the thread's stack) together with a kernel‑space task_struct.
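The ulimit -s value is the RLIMIT_STACK resource limit, which a Go program can read directly. The sketch below is Linux‑specific and assumes 64‑bit Linux, where the kernel reports "unlimited" as all bits set; stackRlimit is a hypothetical helper name, not part of any library.

```go
package main

import (
	"fmt"
	"syscall"
)

// stackRlimit (hypothetical helper) returns the soft and hard RLIMIT_STACK
// values in bytes: the limit that `ulimit -s` displays in KB. Linux only.
func stackRlimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_STACK, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := stackRlimit()
	if err != nil {
		panic(err)
	}
	// Assumption: on 64-bit Linux, RLIM_INFINITY is ^uint64(0).
	if soft == ^uint64(0) {
		fmt.Println("stack soft limit: unlimited")
	} else {
		fmt.Printf("stack soft limit: %d KB\n", soft/1024)
	}
	fmt.Printf("stack hard limit (raw): %d\n", hard)
}
```

Running `ulimit -s` in the same shell should print the soft limit in KB (or "unlimited").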

When a new thread is created via pthread_create, glibc allocates a separate stack in user space and hands it to the kernel through the clone system call.

2. Go Thread Stack and Goroutine Stack

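Go distinguishes between OS threads (m in the runtime) and lightweight goroutines (g); each thread runs many goroutines over its lifetime. Except where the OS or cgo supplies a thread's stack, the runtime allocates both thread stacks and goroutine stacks itself instead of relying on glibc.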
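One way to see how light goroutine stacks are is to park many goroutines and watch runtime.MemStats.StackInuse. This is a rough measurement sketch, not a benchmark: stackInuseDelta is a hypothetical helper, and the per‑goroutine figure it implies depends on the Go version, since newer runtimes may pick a larger starting size.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackInuseDelta (hypothetical) parks n goroutines and reports how much
// runtime.MemStats.StackInuse grew while they were all alive.
func stackInuseDelta(n int) int64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var started sync.WaitGroup
	done := make(chan struct{})
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-done // stay parked so the goroutine and its stack stay alive
		}()
	}
	started.Wait()
	runtime.ReadMemStats(&after)
	close(done) // let the goroutines exit
	return int64(after.StackInuse) - int64(before.StackInuse)
}

func main() {
	const n = 10000
	d := stackInuseDelta(n)
	fmt.Printf("%d goroutines: StackInuse grew by %d bytes (about %d bytes each)\n", n, d, d/n)
}
```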

2.1 Thread‑stack allocation

Go creates a thread similarly to glibc: it first allocates a thread object and then invokes the clone system call. The core creation function is newm (runtime/proc.go):

func newm(fn func(), _p_ *p, id int64) {
    // allocate the m (thread) object together with its g0
    mp := allocm(_p_, fn, id)
    ...
    // create the OS thread (ultimately via clone)
    newm1(mp)
}
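The effect of newm can be observed from user code: a goroutine that calls runtime.LockOSThread and then blocks keeps its thread to itself, so the scheduler must create other threads to keep running. The sketch below assumes Linux, where each entry in /proc/self/task is one kernel task; countThreads and lockedThreads are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"sync"
)

// countThreads (hypothetical) returns the number of OS threads in this
// process by listing /proc/self/task. Linux only.
func countThreads() (int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// lockedThreads (hypothetical) starts k goroutines that each pin themselves to
// an OS thread and block, then returns the thread count observed meanwhile.
func lockedThreads(k int) (int, error) {
	var ready sync.WaitGroup
	done := make(chan struct{})
	ready.Add(k)
	for i := 0; i < k; i++ {
		go func() {
			runtime.LockOSThread() // this goroutine now owns its m exclusively
			ready.Done()
			<-done
			runtime.UnlockOSThread()
		}()
	}
	ready.Wait()
	n, err := countThreads()
	close(done)
	return n, err
}

func main() {
	before, err := countThreads()
	if err != nil {
		panic(err)
	}
	during, err := lockedThreads(4)
	if err != nil {
		panic(err)
	}
	fmt.Printf("OS threads: %d before, %d with 4 locked goroutines\n", before, during)
}
```

The count while pinned is at least k + 1: one thread per locked goroutine plus the one running the caller.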

allocm allocates an m (thread) structure and creates a special goroutine, g0, which runs the scheduler and other runtime code on that thread:

func allocm(_p_ *p, fn func(), id int64) *m {
    // allocate thread object
    mp := new(m)
    mp.mstartfn = fn
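    if iscgo || mStackIsSystemAllocated() {
        // the OS (or pthread_create, under cgo) supplies the g0 stack
        mp.g0 = malg(-1)
    } else {
        // the runtime allocates an 8 KB g0 stack itself
        mp.g0 = malg(8192 * sys.StackGuardMultiplier)
    }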
    mp.g0.m = mp
    return mp
}

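The malg function creates a g (goroutine) object and, for a non‑negative size, allocates its stack with stackalloc. A g0 stack allocated this way is 8 KB (8192 × sys.StackGuardMultiplier, where the multiplier is 1 in ordinary Linux builds), while ordinary goroutine stacks start at 2 KB. With cgo, or on platforms where the OS allocates thread stacks, allocm calls malg(-1), which skips the allocation; g0 then uses the stack supplied when the thread is created.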

func malg(stacksize int32) *g {
    // allocate goroutine object
    newg := new(g)
    if stacksize >= 0 {
        ...
        systemstack(func() {
            newg.stack = stackalloc(uint32(stacksize))
        })
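        // function prologues compare SP against this value
        newg.stackguard0 = newg.stack.lo + _StackGuard
        // ~0: the system-stack-only check always fails on ordinary goroutines
        newg.stackguard1 = ^uintptr(0)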
    }
    return newg
}

After the g0 stack is prepared, newosproc passes its top address, stack.hi, to the kernel in the clone call; the top is used because the stack grows downward:

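const (
    cloneFlags = _CLONE_VM | // share the address space
        _CLONE_FS | // share cwd, root, umask
        _CLONE_FILES | // share the file descriptor table
        _CLONE_SIGHAND | // share signal handlers
        _CLONE_SYSVSEM | // share SysV semaphore undo lists
        _CLONE_THREAD // same thread group, i.e. same process
)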

func newosproc(mp *m) {
    // use g0's stack as the thread stack
    stk := unsafe.Pointer(mp.g0.stack.hi)
    ...
    ret := clone(cloneFlags, stk, ..., unsafe.Pointer(abi.FuncPCABI0(mstart)))
}

2.2 Goroutine‑stack allocation

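The main goroutine is created in the runtime's assembly entry point rt0_go, which calls runtime.newproc; every go statement is likewise compiled into a call to runtime.newproc. Its worker, newproc1, first tries to reuse a dead g from the current P's free list via gfget, and only otherwise creates a new one with malg(_StackMin), i.e. with a 2 KB stack. (Since Go 1.19 the runtime may choose a larger starting size based on the average stack usage it has observed.)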

func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
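    _p_ := getg().m.p.ptr() // the P the caller is running on
    newg := gfget(_p_)      // reuse a dead g (and its stack) if available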
    if newg == nil {
        newg = malg(_StackMin)
        ...
    }
    newg.startpc = fn.fn
    ...
    return newg
}

3. Goroutine Stack Expansion

3.1 Detecting the need to grow

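Each goroutine stores two guard values. stackguard0 = stack.lo + StackGuard (928 bytes on linux/amd64) is the one ordinary Go functions check. stackguard1 is compared by the prologue of functions marked //go:systemstack; it is stack.lo + StackGuard on g0 and signal‑handling stacks and ^uintptr(0) on ordinary goroutines, so such a function called on an ordinary goroutine crashes instead of running. Because the stack grows downward from stack.hi toward stack.lo, SP points to the lowest address currently in use; when a function's new frame would push SP below stackguard0, the check in that function's prologue concludes that the stack must grow.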

stackguard0: stack.lo + StackGuard, used for overflow detection

StackGuard: a compile‑time constant, 928 bytes on linux/amd64

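The compiler inserts this check into the prologue of most functions (those marked //go:nosplit, and small leaf functions, skip it). For func1 below, whose frame is 216 bytes (locals=0xd8), it computes SP − 88 (216 minus the 128‑byte StackSmall slack) and compares that with stackguard0, which sits at offset 16 of the g structure that R14 points to; frames of 128 bytes or less compare SP directly. If the guard is crossed, execution jumps to runtime.morestack_noctxt: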

# GOOS=linux GOARCH=amd64 go tool compile -S -N -l main.go > main.s
"".func1 STEXT size=143 args=0x8 locals=0xd8 funcid=0x0 align=0x0
0x0000 00000 (main.go:9)    LEAQ    -88(SP), R12
0x0005 00005 (main.go:9)    CMPQ    R12, 16(R14)   // compare SP-88 with g.stackguard0 (R14 = g)
0x0009 00009 (main.go:9)    PCDATA  $0, $-2
0x0009 00009 (main.go:9)    JLS     120            // jump if SP-88 <= stackguard0 (unsigned)
...
0x0078 00120 (main.go:11)   CALL    runtime.morestack_noctxt(SB)
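The prologue logic can be modeled in a few lines of Go. This is a simplified model, not runtime code: needsMoreStack is a hypothetical name, it covers only frames up to the compiler's StackBig threshold, and the stack.lo address in main is made up. It returns true exactly when the prologue would branch to runtime.morestack_noctxt.

```go
package main

import "fmt"

// stackSmall mirrors the compiler's 128-byte StackSmall slack.
const stackSmall = 128

// needsMoreStack (hypothetical model) mimics the amd64 split check: small
// frames compare SP directly, larger ones compare SP-(framesize-128).
// Comparisons are unsigned, like JLS.
func needsMoreStack(sp, stackguard0, framesize uintptr) bool {
	if framesize <= stackSmall {
		return sp <= stackguard0 // CMPQ SP, stackguard0; JLS
	}
	return sp-(framesize-stackSmall) <= stackguard0 // LEAQ -(framesize-128)(SP), R12; CMPQ; JLS
}

func main() {
	var lo uintptr = 0xc000100000 // made-up stack.lo (64-bit)
	guard := lo + 928             // stackguard0 = stack.lo + StackGuard
	// func1 above has a 216-byte frame, hence LEAQ -88(SP).
	fmt.Println(needsMoreStack(lo+1000, guard, 216)) // true: 1000-88 = 912 <= 928
	fmt.Println(needsMoreStack(lo+2000, guard, 216)) // false
}
```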

3.2 Performing the growth

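If growth is required, runtime.morestack_noctxt (via runtime.morestack) switches to the thread's g0 stack and calls runtime.newstack. newstack picks a new size at least twice the current one, and copystack allocates a region of that size with stackalloc, copies the in‑use part of the old stack to the top of the new one, adjusts pointers that pointed into the old stack, updates the goroutine's stack bounds and guard, and frees the old region.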
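Doubling keeps the total copying cost proportional to the final stack size. The back‑of‑the‑envelope model below assumes every growth step exactly doubles and copies the whole old stack (the worst case); growthSteps is a hypothetical helper, and the real runtime copies only the in‑use part.

```go
package main

import "fmt"

// growthSteps (hypothetical model) counts doublings from start bytes until
// the stack holds at least need bytes, and the worst-case bytes copied.
func growthSteps(start, need uintptr) (steps int, final, copied uintptr) {
	size := start
	for size < need {
		copied += size // worst case: the old stack was completely full
		size *= 2
		steps++
	}
	return steps, size, copied
}

func main() {
	steps, final, copied := growthSteps(2048, 1<<20)
	fmt.Println(steps, final, copied) // 9 1048576 1046528
}
```

Reaching 1 MB from 2 KB takes nine copies, yet the bytes copied in total stay below the final size, so the amortized cost per byte of stack is constant.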

func newstack() {
    // newstack runs on g0; gp is the goroutine whose stack overflowed
    gp := getg().m.curg
    // the new stack is twice the old size
    oldsize := gp.stack.hi - gp.stack.lo
    newsize := oldsize * 2
    ...
    // allocate new stack and copy old data
    copystack(gp, newsize)
    ...
}

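func copystack(gp *g, newsize uintptr) {
    old := gp.stack
    // bytes in use: from the saved SP up to the top of the old stack
    used := old.hi - gp.sched.sp
    // allocate new stack
    new := stackalloc(uint32(newsize))
    // copy the in-use part to the top of the new stack
    memmove(unsafe.Pointer(new.hi-used), unsafe.Pointer(old.hi-used), used)
    // adjust pointers that still point into the old stack
    ...
    // switch to new stack
    gp.stack = new
    gp.stackguard0 = new.lo + _StackGuard
    gp.sched.sp = new.hi - used
    // free old stack
    stackfree(old)
}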
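Because copystack moves frames, the address of a local variable can change while a function is running. The sketch below demonstrates this under one assumption: that converting &x to uintptr does not make x escape to the heap, so x stays in the goroutine's stack frame. deep and localAddrBeforeAfterGrowth are hypothetical helpers; the recursion simply uses far more stack than any starting size.

```go
package main

import (
	"fmt"
	"unsafe"
)

// deep recurses n times with a 128-byte local array per frame, forcing the
// goroutine's stack to grow well beyond its initial size.
//
//go:noinline
func deep(n int) int {
	var buf [128]byte
	buf[n%len(buf)] = byte(n)
	if n == 0 {
		return int(buf[0])
	}
	return deep(n-1) + int(buf[n%len(buf)])
}

// localAddrBeforeAfterGrowth (hypothetical) records the address of a stack
// variable, grows the stack, and records the address again.
func localAddrBeforeAfterGrowth() (before, after uintptr) {
	var x int
	before = uintptr(unsafe.Pointer(&x))
	x = deep(10000)
	after = uintptr(unsafe.Pointer(&x))
	return before, after
}

func main() {
	res := make(chan [2]uintptr)
	go func() { // a fresh goroutine starts with a small stack
		b, a := localAddrBeforeAfterGrowth()
		res <- [2]uintptr{b, a}
	}()
	r := <-res
	fmt.Printf("before: %#x\nafter:  %#x\n", r[0], r[1])
}
```

The two addresses differ because the frame holding x was copied to a new, larger stack. This is also why Go code must never keep a uintptr to a stack variable and convert it back to a pointer later.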

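The stackalloc routine serves small stacks (2, 4, 8 and 16 KB on Linux) from per‑P caches backed by a global pool organized by size class, and larger stacks from a separate cache. Only when these pools run dry does it take fresh spans from the Go heap, which in turn requests memory from the OS with mmap. A complementary shrinking mechanism (shrinkstack) halves a goroutine's stack during garbage collection when it uses less than a quarter of it, but is not detailed here.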
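The pooling idea can be illustrated with a toy allocator that keeps one free list per power‑of‑two size class. This is a hypothetical model, not the runtime's implementation: stackPool, order, alloc and release are made‑up names, make stands in for fresh memory from the heap, and the real runtime adds per‑P caches, span management and a large‑stack cache.

```go
package main

import "fmt"

// Toy size classes mirroring linux/amd64: 2, 4, 8 and 16 KB.
const (
	minStack  = 2048
	numOrders = 4
)

// stackPool (hypothetical) keeps one free list per size class.
type stackPool struct {
	free [numOrders][][]byte
}

// order maps a size to its class index, or -1 for non-pooled sizes.
func order(size int) int {
	for o, s := 0, minStack; o < numOrders; o, s = o+1, s*2 {
		if size == s {
			return o
		}
	}
	return -1
}

// alloc reuses a freed stack of the same class when one is available.
func (p *stackPool) alloc(size int) []byte {
	if o := order(size); o >= 0 && len(p.free[o]) > 0 {
		last := len(p.free[o]) - 1
		s := p.free[o][last]
		p.free[o] = p.free[o][:last]
		return s
	}
	// Stand-in for taking fresh memory from the heap (ultimately mmap).
	return make([]byte, size)
}

// release returns a pooled-size stack to its free list; others are dropped.
func (p *stackPool) release(s []byte) {
	if o := order(len(s)); o >= 0 {
		p.free[o] = append(p.free[o], s)
	}
}

func main() {
	var p stackPool
	a := p.alloc(2048)
	p.release(a)
	b := p.alloc(2048)
	fmt.Println(&a[0] == &b[0]) // true: the freed stack was reused
}
```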

4. Summary

Linux gives a process a 4 KB initial stack at exec that grows on demand, while glibc threads receive independent stacks allocated in user space with mmap. Go follows the same model for OS threads: each thread carries a special g0 goroutine whose stack is handed to the kernel via clone. Ordinary goroutines start with a small 2 KB stack that grows on demand by doubling, triggered by guard checks the compiler inserts into function prologues. This design lets Go support massive concurrency with low memory overhead.

Written by Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
