
Deep Dive into Go's G‑M‑P Scheduler and Runtime Execution Flow

The article details Go’s G‑M‑P scheduler, showing how lightweight Goroutines (G) run on OS threads (M) bound to virtual processors (P), describing runtime initialization, run‑queue management, work‑stealing, the main schedule loop, execution binding, and cleanup via goexit.


The article explains how Go replaces OS thread scheduling with its own lightweight abstraction called Goroutine, which consumes far less memory (2 KB stack) and has lower context‑switch overhead compared to traditional threads.

It introduces the G‑M‑P model used by the Go runtime:

G: a Goroutine, holding its own stack and execution state.

M: an OS thread; each is represented by a single struct m instance.

P: a virtual processor that owns a local run queue of runnable Gs; an M must acquire a P before it can execute Go code.

The core scheduler structures are defined in the runtime source. Key definitions include:

type g struct {
    // current Goroutine's stack range [stack.lo, stack.hi)
    stack       stack
    stackguard0 uintptr
    m           *m
    sched       gobuf
    atomicstatus uint32
    preempt     bool
    ...
}

type gobuf struct {
    sp   uintptr // stack pointer
    pc   uintptr // program counter
    g    guintptr // associated Goroutine
    ret  sys.Uintreg // syscall return value
    ...
}

During program startup, runtime.rt0_go performs several initialization steps:

Calls runtime.schedinit to set up the scheduler, GC, and other runtime components.

Creates the initial Goroutine that runs runtime.main via runtime.newproc .

Starts the first OS thread (M) with runtime.mstart , which eventually calls runtime.schedule .

Scheduler initialization details:

schedinit sets maxmcount = 10000 (maximum threads) and determines the number of P processors based on GOMAXPROCS and CPU count.

mcommoninit initializes the first M (M0) and links it into the global allm list.

procresize creates or expands the global allp slice, initializing each p with p.init(id) , which sets its status to _Pgcstop and prepares caches.

Binding a P to an M is done by runtime.acquirep (which calls wirep ) and the reverse by runtime.releasep .

Creating a new Goroutine involves:

func newproc(siz int32, fn *funcval) {
    // argp, gp and pc are captured from the calling goroutine
    // allocate or reuse a G for fn
    newg := newproc1(fn, argp, siz, gp, pc)
    // put it into the local run queue of the current P,
    // preferring the runnext slot
    runqput(_p_, newg, true)
    // wake an idle P/M pair if one is available
    if mainStarted {
        wakep()
    }
}

func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
    // try to get a G from the per‑P free list
    newg := gfget(_p_)
    if newg == nil {
        newg = malg(_StackMin) // allocate a new G with a 2KB stack
        casgstatus(newg, _Gidle, _Gdead)
        allgadd(newg)
    }
    // set up stack, pc, arguments, etc.
    newg.sched.sp = sp
    // pc initially points into goexit: when fn returns,
    // execution falls through into goexit for cleanup
    newg.sched.pc = funcPC(goexit) + sys.PCQuantum
    newg.sched.g = guintptr(unsafe.Pointer(newg))
    // make it runnable
    casgstatus(newg, _Gdead, _Grunnable)
    return newg
}

Run‑queue management functions:

runqput(p *p, gp *g, next bool) places a Goroutine into either the runnext slot or the local circular queue; if the local queue is full it falls back to the global run queue.

runqget(p *p) (gp *g, inheritTime bool) first checks runnext , then dequeues from the local queue.

globrunqget(p *p, max int32) *g steals a batch of Gs from the global run queue and fills the local queue.

The heart of the scheduler is runtime.schedule . It repeatedly:

Handles GC safepoints.

Checks timers.

Tries to obtain a runnable G from the local queue, the global queue (every 61 ticks), or the network poller.

If none are available, it calls findrunnable which performs work‑stealing across all Ps, checks GC work, and finally puts the M to sleep.

Work‑stealing ( findrunnable ) iterates over a random order of Ps, attempts to steal half of their local run queues, and also looks at their runnext slots or timers. It respects the spinning flag, which indicates that the M is actively trying to steal work.

When a runnable G is finally found, runtime.execute(gp, inheritTime) binds the G to the current M, marks it _Grunning , and jumps to the Goroutine’s entry point via runtime.gogo . When the Goroutine’s function returns, control falls into runtime.goexit (the pc planted by newproc1), which switches to the g0 stack and calls goexit0:

func goexit0(gp *g) {
    casgstatus(gp, _Grunning, _Gdead)
    // clean up G fields
    gp.m = nil
    gp.writebuf = nil
    // detach from M
    dropg()
    // recycle the G
    gfput(_g_.m.p.ptr(), gp)
    // schedule next work
    schedule()
}

The article concludes with a visual summary of the scheduling flow, showing how Gs move between local and global queues, how M threads acquire and release Ps, and how the scheduler continuously loops to keep the program progressing.

Tags: Concurrency, Go, Scheduler, runtime, GMP model, goroutine
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
