
Understanding Go’s GMP Model: How Goroutine Scheduling Works

This article explains Go's GMP model—how worker threads (M), logical processors (P), and goroutines (G) interact—detailing their internal structures, scheduling strategy, run‑queue management, work stealing, active and passive scheduling, and preemptive scheduling, with concrete code examples.

Go Development Architecture Practice

GMP Model Overview

Go’s concurrency runtime consists of three entities: worker threads (M), logical processors (P), and goroutines (G). An M is an OS thread that executes Go code. A P represents a logical processor that holds scheduling state and a local run‑queue. A G is a goroutine with its own stack and scheduler context. The binding between M, P, and G changes dynamically during execution.

Work Thread (M)

The m struct represents a worker thread. Key fields:

type m struct {
    g0   *g        // scheduling goroutine
    tls  [tlsSlots]uintptr // thread‑local storage
    curg *g        // currently running goroutine
    p    puintptr  // attached P (nil if not executing Go code)
    // ... other fields omitted
}

Each M is bound to a P to run Go code and tracks the currently running G via curg. Every M contains a special scheduling goroutine g0 that performs scheduler work.

Logical Processor (P)

The p struct links a P to its worker thread and holds a local run‑queue:

type p struct {
    id          int32
    status      uint32 // idle, running, etc.
    schedtick   uint32 // increments each scheduler call
    syscalltick uint32 // increments each syscall
    m           muintptr // back‑link to associated M
    runqhead    uint32 // head index of local run‑queue
    runqtail    uint32 // tail index of local run‑queue
    runq        [256]guintptr // circular buffer (length 256)
    runnext     guintptr // next G to run, if set
    // ... other fields omitted
}

A P may be idle (no M attached) or bound to an M. The local run‑queue holds up to 256 Gs; runqhead and runqtail implement a circular queue. The runnext field, when non‑zero, gives the next G to execute without scanning the queue.

Goroutine (G)

Goroutines are represented by the g struct, which contains the execution stack, the owning M, and the saved scheduler context:

type g struct {
    stack stack   // execution stack
    m    *m      // current M
    sched gobuf  // saved registers for context switch
    // ... other fields omitted
}

type stack struct {
    lo uintptr
    hi uintptr
}

type gobuf struct {
    sp   uintptr
    pc   uintptr
    g    guintptr
    ctxt unsafe.Pointer
    ret  uintptr
    lr   uintptr
    bp   uintptr // frame pointer on supported arches
}

When a G is switched out, its gobuf saves the necessary CPU registers so execution can resume later.

Global Scheduler State (schedt)

The runtime maintains a global schedt structure that tracks idle Ms, idle Ps, the global runnable queue, and various counters:

type schedt struct {
    lock mutex
    midle        muintptr // idle Ms waiting for work
    nmidle       int32   // count of idle Ms
    maxmcount    int32   // maximum Ms allowed
    // ... many other fields omitted for brevity
    runq     gQueue   // global runnable queue
    runqsize int32    // size of global queue
    // cache of dead Gs, etc.
}

The global queue is shared by all Ps; access is protected by sched.lock to ensure atomic updates.

Goroutine Scheduling Mechanics

Each M runs a scheduling goroutine g0 that repeatedly performs the cycle g → g0 → g. The scheduler selects the next G using a three‑step strategy.

Scheduling Strategy

Check the local run‑queue of the current P.

If empty, check the global run‑queue.

If still empty, steal work from another P’s local queue.

Finding a Runnable Goroutine

func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
    _g_ := getg()
    _p_ := _g_.m.p.ptr()

    // 1. Every 61st scheduling tick, poll the global queue first for fairness
    if _p_.schedtick%61 == 0 && sched.runqsize > 0 {
        lock(&sched.lock)
        gp = globrunqget(_p_, 1)
        unlock(&sched.lock)
        if gp != nil {
            return gp, false, false
        }
    }
    // 2. Try local run‑queue
    if gp, inheritTime = runqget(_p_); gp != nil {
        return gp, inheritTime, false
    }
    // 3. Try global run‑queue
    if sched.runqsize != 0 {
        lock(&sched.lock)
        gp = globrunqget(_p_, 0)
        unlock(&sched.lock)
        if gp != nil {
            return gp, false, false
        }
    }
    // 4. Steal work from other Ps
    // ... omitted for brevity
    return nil, false, false
}

Local Run‑Queue Retrieval

func runqget(_p_ *p) (gp *g, inheritTime bool) {
    // Prefer runnext if set
    next := _p_.runnext
    if next != 0 && _p_.runnext.cas(next, 0) {
        return next.ptr(), true
    }
    for {
        h := atomic.LoadAcq(&_p_.runqhead)
        t := _p_.runqtail
        if t == h {
            return nil, false // empty
        }
        gp := _p_.runq[h%uint32(len(_p_.runq))].ptr()
        if atomic.CasRel(&_p_.runqhead, h, h+1) {
            return gp, false
        }
    }
}

Global Run‑Queue Retrieval

// globrunqget dequeues a batch of Gs from the global queue.
// sched.lock must be held by the caller (as findRunnable does above).
func globrunqget(_p_ *p, max int32) *g {
    assertLockHeld(&sched.lock)
    if sched.runqsize == 0 {
        return nil
    }
    // Take an even share of the global queue, subject to three caps.
    n := sched.runqsize/gomaxprocs + 1
    if n > sched.runqsize {
        n = sched.runqsize
    }
    if max > 0 && n > max {
        n = max
    }
    if n > int32(len(_p_.runq))/2 {
        n = int32(len(_p_.runq))/2
    }
    sched.runqsize -= n
    gp := sched.runq.pop()
    n--
    // Move the rest of the batch into the local run-queue.
    for ; n > 0; n-- {
        gp1 := sched.runq.pop()
        runqput(_p_, gp1, false)
    }
    return gp
}

Every 61 scheduling ticks the scheduler pulls at least one G from the global queue to ensure fairness across Ps.

Work Stealing

func stealWork(now int64) (gp *g, inheritTime bool, rnow, pollUntil int64, newWork bool) {
    pp := getg().m.p.ptr()
    const stealTries = 4
    for i := 0; i < stealTries; i++ {
        // Visit all Ps in a pseudo-random order.
        for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
            p2 := allp[enum.position()]
            if p2 == pp {
                continue // don't steal from ourselves
            }
            // Skip idle Ps: their local run-queues are empty.
            if !idlepMask.read(enum.position()) {
                if gp := runqsteal(pp, p2, false); gp != nil {
                    return gp, false, now, 0, false
                }
            }
        }
    }
    return nil, false, now, 0, false
}

Stealing moves roughly half of the victim P’s local queue entries to the thief’s queue, then returns one stolen G for execution.

Scheduling Timing

Active scheduling: a goroutine voluntarily yields via runtime.Gosched(). The runtime switches from the current G to g0, marks the G as Grunnable, detaches it from its M, enqueues it on the global queue, and calls schedule().

Passive scheduling: the runtime parks a G when it blocks (e.g., channel wait, network I/O, GC). The G is moved to Gwaiting, detached from its M, and later made runnable by goready(), which places it on the appropriate run‑queue.

// Active scheduling example
func Gosched() {
    checkTimeouts()
    mcall(gosched_m)
}

func gosched_m(gp *g) {
    // ...
    goschedImpl(gp)
}

func goschedImpl(gp *g) {
    casgstatus(gp, _Grunning, _Grunnable)
    dropg()
    lock(&sched.lock)
    globrunqput(gp)
    unlock(&sched.lock)
    schedule()
}
// Passive scheduling example
func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
    // ...
    mcall(park_m)
}

func park_m(gp *g) {
    casgstatus(gp, _Grunning, _Gwaiting)
    dropg()
    // ... possibly wake up immediately
    schedule()
}

Preemptive Scheduling

The system monitor thread (sysmon) runs outside any P on a cycle that starts at 20 µs and backs off up to 10 ms when idle. On each pass it calls retake(), which preempts any G that has run on the same P for more than 10 ms and retakes Ps that are blocked in syscalls so their work can proceed elsewhere:

const forcePreemptNS = 10 * 1000 * 1000 // 10 ms

func retake(now int64) uint32 {
    n := 0
    lock(&allpLock)
    for i := 0; i < len(allp); i++ {
        _p_ := allp[i]
        if _p_ == nil {
            continue
        }
        pd := &_p_.sysmontick // sysmon's per-P bookkeeping
        s := _p_.status
        if s == _Prunning || s == _Psyscall {
            // Preempt the G if it has been running too long.
            t := int64(_p_.schedtick)
            if int64(pd.schedtick) != t {
                // A new G was scheduled since the last check; reset the clock.
                pd.schedtick = uint32(t)
                pd.schedwhen = now
            } else if pd.schedwhen+forcePreemptNS <= now {
                preemptone(_p_)
            }
        }
        // ... additional syscall handling omitted
    }
    unlock(&allpLock)
    return uint32(n)
}

This mechanism ensures that no single G can monopolize a P, preserving responsiveness.
