Understanding Go’s GMP Model: How Goroutine Scheduling Works
This article explains Go's GMP model: how worker threads (M), logical processors (P), and goroutines (G) interact, detailing their internal structures, scheduling strategies, run-queue management, work stealing, and the active, passive, and preemptive scheduling paths, with concrete code examples.
GMP Model Overview
Go's concurrency runtime consists of three entities: worker threads (M), logical processors (P), and goroutines (G). An M is an OS thread that executes Go code. A P represents a logical processor that holds scheduling state and a local run-queue. A G is a goroutine with its own stack and scheduler context. The binding between M, P, and G changes dynamically during execution.
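Before diving into the runtime structs, a user-level illustration may help: runtime.GOMAXPROCS(0) reports the number of Ps, and each go statement creates a new G that the runtime multiplexes onto Ms. A minimal, runnable sketch using only the standard library:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    // GOMAXPROCS(0) queries the current number of Ps without changing it.
    fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func(id int) { // each go statement creates a new G
            defer wg.Done()
            fmt.Println("goroutine", id, "running")
        }(i)
    }
    // Count includes main and any Gs that have not finished yet.
    fmt.Println("Gs alive:", runtime.NumGoroutine())
    wg.Wait()
}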
Work Thread (M)
The m struct represents a worker thread. Key fields:
type m struct {
    g0   *g                // scheduling goroutine
    tls  [tlsSlots]uintptr // thread-local storage
    curg *g                // currently running goroutine
    p    puintptr          // attached P (nil if not executing Go code)
    // ... other fields omitted
}

Each M is bound to a P to run Go code and tracks the currently running G via curg. Every M contains a special scheduling goroutine g0 that performs scheduler work.
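User code has no direct handle to an M, but runtime.LockOSThread makes the G-to-M binding visible by pinning the calling goroutine to its current thread. A small sketch:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    done := make(chan struct{})
    go func() {
        // Pin this G to its M: until UnlockOSThread, the scheduler will not
        // move this goroutine onto any other worker thread.
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()
        fmt.Println("this goroutine is wired to one OS thread (M)")
        close(done)
    }()
    <-done
}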
Logical Processor (P)
The p struct links a P to its worker thread and holds a local run‑queue:
type p struct {
    id          int32
    status      uint32        // idle, running, etc.
    schedtick   uint32        // increments each scheduler call
    syscalltick uint32        // increments each syscall
    m           muintptr      // back-link to associated M
    runqhead    uint32        // head index of local run-queue
    runqtail    uint32        // tail index of local run-queue
    runq        [256]guintptr // circular buffer (length 256)
    runnext     guintptr      // next G to run, if set
    // ... other fields omitted
}

A P may be idle (no M attached) or bound to an M. The local run-queue holds up to 256 Gs; runqhead and runqtail implement a circular queue. The runnext field, when non-zero, gives the next G to execute without scanning the queue.
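To make the head/tail mechanics concrete, here is a user-space sketch of a fixed-size circular queue with a runnext slot, mirroring the shape of p.runq. The runQueue type and its int elements are illustrative stand-ins, not runtime types, and the atomics and overflow-to-global-queue behavior are omitted:

package main

import "fmt"

// runQueue is an illustrative analogue of a P's local run-queue.
type runQueue struct {
    head, tail uint32   // monotonically increasing; index via modulo
    buf        [256]int // stands in for [256]guintptr
    runnext    int      // 0 means unset; stands in for the runnext G
}

// put enqueues at the tail. A real P overflows half its queue to the
// global queue when full; this sketch just reports failure.
func (q *runQueue) put(g int) bool {
    if q.tail-q.head == uint32(len(q.buf)) {
        return false // full
    }
    q.buf[q.tail%uint32(len(q.buf))] = g
    q.tail++
    return true
}

// get prefers runnext, then dequeues from the head.
func (q *runQueue) get() (int, bool) {
    if q.runnext != 0 {
        g := q.runnext
        q.runnext = 0
        return g, true
    }
    if q.head == q.tail {
        return 0, false // empty
    }
    g := q.buf[q.head%uint32(len(q.buf))]
    q.head++
    return g, true
}

func main() {
    var q runQueue
    q.put(1)
    q.put(2)
    q.runnext = 3
    for g, ok := q.get(); ok; g, ok = q.get() {
        fmt.Println("next G:", g) // prints 3, then 1, then 2
    }
}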
Goroutine (G)
Goroutines are represented by the g struct, which contains the execution stack, the owning M, and the saved scheduler context:
type g struct {
    stack stack // execution stack
    m     *m    // current M
    sched gobuf // saved registers for context switch
    // ... other fields omitted
}

type stack struct {
    lo uintptr
    hi uintptr
}

type gobuf struct {
    sp   uintptr
    pc   uintptr
    g    guintptr
    ctxt unsafe.Pointer
    ret  uintptr
    lr   uintptr
    bp   uintptr // frame pointer on supported arches
}

When a G is switched out, its gobuf saves the necessary CPU registers (stack pointer, program counter, and so on) so execution can resume later.
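One observable consequence of the stack struct: goroutine stacks start small and are reallocated and copied when they overflow, so the addresses of stack locals can jump. A hedged demo (the grow helper and the 64 KB threshold are arbitrary illustration choices; exact addresses vary between runs):

package main

import (
    "fmt"
    "unsafe"
)

// grow recurses with a large frame so the runtime must repeatedly grow
// (reallocate and copy) the goroutine stack.
func grow(n int, last uintptr) {
    var frame [1024]byte
    addr := uintptr(unsafe.Pointer(&frame[0]))
    // In plain recursion each frame sits roughly 1 KB below the previous
    // one; a large or upward jump means the whole stack was moved.
    if last != 0 && (addr > last || last-addr > 64*1024) {
        fmt.Printf("stack moved: %#x -> %#x at depth %d\n", last, addr, n)
    }
    if n > 0 {
        grow(n-1, addr)
    }
}

func main() {
    done := make(chan struct{})
    go func() { // a fresh goroutine starts with a small stack
        grow(64, 0)
        close(done)
    }()
    <-done
}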
Global Scheduler State (schedt)
The runtime maintains a global schedt structure that tracks idle Ms, idle Ps, the global runnable queue, and various counters:
type schedt struct {
    lock      mutex
    midle     muintptr // idle Ms waiting for work
    nmidle    int32    // count of idle Ms
    maxmcount int32    // maximum Ms allowed
    // ... many other fields omitted for brevity
    runq     gQueue // global runnable queue
    runqsize int32  // size of global queue
    // cache of dead Gs, etc.
}

The global queue is shared by all Ps; access is protected by sched.lock to ensure atomic updates.
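As an illustration of the shared-queue-plus-lock design (a sketch only; the runtime's gQueue is an intrusive linked list of g structs, not a slice):

package main

import (
    "fmt"
    "sync"
)

// globalQueue is an illustrative analogue of schedt.runq: a FIFO shared
// by all Ps, guarded by a single lock.
type globalQueue struct {
    mu sync.Mutex
    gs []int // stands in for a linked list of Gs
}

func (q *globalQueue) put(g int) {
    q.mu.Lock()
    defer q.mu.Unlock()
    q.gs = append(q.gs, g)
}

func (q *globalQueue) get() (int, bool) {
    q.mu.Lock()
    defer q.mu.Unlock()
    if len(q.gs) == 0 {
        return 0, false
    }
    g := q.gs[0]
    q.gs = q.gs[1:]
    return g, true
}

func main() {
    var q globalQueue
    q.put(1)
    q.put(2)
    g, _ := q.get()
    fmt.Println("dequeued G:", g) // 1: FIFO order
}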
Goroutine Scheduling Mechanics
Each M runs a scheduling goroutine g0; execution repeatedly cycles g → g0 → g, with g0 doing the scheduler work between user goroutines. The scheduler selects the next G using a three-step strategy (plus an occasional fairness poll of the global queue, visible in findRunnable below).
Scheduling Strategy
1. Check the local run-queue of the current P.
2. If empty, check the global run-queue.
3. If still empty, steal work from another P's local queue.
Finding a Runnable Goroutine
func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
    _p_ := getg().m.p.ptr() // the current P
    // 1. Occasionally poll the global queue for fairness
    if _p_.schedtick%61 == 0 && sched.runqsize > 0 {
        lock(&sched.lock)
        gp = globrunqget(_p_, 1)
        unlock(&sched.lock)
        if gp != nil {
            return gp, false, false
        }
    }
    // 2. Try the local run-queue
    if gp, inheritTime = runqget(_p_); gp != nil {
        return gp, inheritTime, false
    }
    // 3. Try the global run-queue
    if sched.runqsize != 0 {
        lock(&sched.lock)
        gp = globrunqget(_p_, 0)
        unlock(&sched.lock)
        if gp != nil {
            return gp, false, false
        }
    }
    // 4. Steal work from other Ps
    // ... omitted for brevity
    return nil, false, false
}

Local Run-Queue Retrieval
func runqget(_p_ *p) (gp *g, inheritTime bool) {
    // Prefer runnext if set
    next := _p_.runnext
    if next != 0 && _p_.runnext.cas(next, 0) {
        return next.ptr(), true
    }
    for {
        h := atomic.LoadAcq(&_p_.runqhead) // load-acquire: synchronize with stealers
        t := _p_.runqtail
        if t == h {
            return nil, false // queue is empty
        }
        gp := _p_.runq[h%uint32(len(_p_.runq))].ptr()
        if atomic.CasRel(&_p_.runqhead, h, h+1) { // commit the dequeue
            return gp, false
        }
    }
}

Global Run-Queue Retrieval
// globrunqget grabs a batch of Gs from the global queue. sched.lock must
// already be held by the caller (findRunnable locks it before calling).
func globrunqget(_p_ *p, max int32) *g {
    if sched.runqsize == 0 {
        return nil
    }
    // Take a proportional share of the global queue...
    n := sched.runqsize/gomaxprocs + 1
    if n > sched.runqsize {
        n = sched.runqsize
    }
    if max > 0 && n > max {
        n = max
    }
    // ...but never more than half the local queue's capacity.
    if n > int32(len(_p_.runq))/2 {
        n = int32(len(_p_.runq)) / 2
    }
    sched.runqsize -= n
    gp := sched.runq.pop() // the G to return for immediate execution
    n--
    for ; n > 0; n-- {
        gp1 := sched.runq.pop()
        runqput(_p_, gp1, false) // the rest go to the local run-queue
    }
    return gp
}

Every 61 scheduling ticks the scheduler pulls at least one G from the global queue to ensure fairness across Ps. For example, with 100 Gs in the global queue and GOMAXPROCS = 4, a call with max = 0 moves n = 100/4 + 1 = 26 Gs: one is returned to run immediately and the other 25 land in the P's local queue.
Work Stealing
func stealWork(now int64) (gp *g, inheritTime bool, rnow, pollUntil int64, newWork bool) {
    pp := getg().m.p.ptr() // the thief P
    const stealTries = 4
    for i := 0; i < stealTries; i++ {
        // Visit all Ps in a pseudo-random order
        for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
            p2 := allp[enum.position()]
            if p2 == pp {
                continue // don't steal from ourselves
            }
            if !idlepMask.read(enum.position()) { // skip idle Ps
                if gp := runqsteal(pp, p2, false); gp != nil {
                    return gp, false, now, 0, false
                }
            }
        }
    }
    return nil, false, now, 0, false
}

Stealing moves roughly half of the victim P's local queue entries to the thief's queue, then returns one stolen G for execution.
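A user-space sketch of the steal-half policy may help; the types are illustrative, and the runtime's runqgrab does this lock-free against the victim's head index rather than with plain slices:

package main

import "fmt"

// stealHalf mimics the shape of the runtime's runqsteal: move half of the
// victim's queue (rounded up) to the thief, and run the last stolen item
// immediately.
func stealHalf(thief, victim *[]int) (int, bool) {
    v := *victim
    n := len(v) - len(v)/2 // half, rounded up
    if n == 0 {
        return 0, false // victim queue is empty
    }
    stolen := v[:n] // take from the victim's head
    *victim = v[n:]
    g := stolen[n-1]                         // the G the thief runs next
    *thief = append(*thief, stolen[:n-1]...) // the rest join the thief's queue
    return g, true
}

func main() {
    victim := []int{1, 2, 3, 4, 5}
    var thief []int
    if g, ok := stealHalf(&thief, &victim); ok {
        fmt.Println("run now:", g)           // 3
        fmt.Println("thief queue:", thief)   // [1 2]
        fmt.Println("victim queue:", victim) // [4 5]
    }
}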
Scheduling Timing
Active scheduling: a goroutine voluntarily yields via runtime.Gosched(). The runtime switches from the current G to g0, marks the G _Grunnable, detaches it from its M, enqueues it on the global queue, and calls schedule().

Passive scheduling: the runtime parks a G when it blocks (e.g., a channel wait, network I/O, or GC). The G moves to _Gwaiting and is detached from its M; later, goready() makes it runnable again and places it on the appropriate run-queue.
// Active scheduling example
func Gosched() {
    checkTimeouts()
    mcall(gosched_m)
}

func gosched_m(gp *g) {
    // ...
    goschedImpl(gp)
}

func goschedImpl(gp *g) {
    casgstatus(gp, _Grunning, _Grunnable)
    dropg()
    lock(&sched.lock)
    globrunqput(gp)
    unlock(&sched.lock)
    schedule()
}

// Passive scheduling example
func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
    // ...
    mcall(park_m)
}

func park_m(gp *g) {
    casgstatus(gp, _Grunning, _Gwaiting)
    dropg()
    // ... possibly wake the G up immediately via unlockf
    schedule()
}

Preemptive Scheduling
The system monitor thread (sysmon) wakes periodically (its sleep interval ranges from roughly 20 µs to 10 ms) and calls retake() to force preemption: a G that has occupied the same P for more than 10 ms is asked to yield, and a P blocked too long in a syscall is handed off to another M:
const forcePreemptNS = 10 * 1000 * 1000 // 10 ms

func retake(now int64) uint32 {
    lock(&allpLock)
    for i := 0; i < len(allp); i++ {
        _p_ := allp[i]
        if _p_ == nil {
            continue
        }
        pd := &_p_.sysmontick // sysmon's last observation of this P
        s := _p_.status
        if s == _Prunning || s == _Psyscall {
            t := int64(_p_.schedtick)
            if int64(pd.schedtick) != t {
                // The P has scheduled since we last looked; reset the clock.
                pd.schedtick = uint32(t)
                pd.schedwhen = now
            } else if pd.schedwhen+forcePreemptNS <= now {
                // Same G for over 10 ms: request preemption.
                preemptone(_p_)
            }
        }
        // ... additional syscall handling omitted
    }
    unlock(&allpLock)
    return 0
}

This mechanism ensures that no single G can monopolize a P, preserving responsiveness.
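A hedged way to see this from user code (requires Go 1.14+, where asynchronous preemption landed): with a single P, a tight CPU-bound loop with no function calls would starve every other goroutine under purely cooperative scheduling, yet main still makes progress because sysmon forces preemption:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // one P, so the busy G and main must share it
    go func() {
        for {
            // Tight loop: no function calls, no cooperative yield points.
        }
    }()
    start := time.Now()
    time.Sleep(100 * time.Millisecond) // main parks, then must reclaim the P
    fmt.Println("main ran despite the busy goroutine; elapsed:", time.Since(start))
}

On Go 1.13 and earlier this program could hang after the sleep, because the busy loop contains no preemption points; with asynchronous preemption, sysmon evicts the busy G and main finishes promptly.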