
Inside Go’s Goroutine Scheduler: Concepts, Evolution, and Design

This article explains how Go’s goroutine scheduler works, covering the fundamentals of OS thread scheduling, the transition from the old G‑M model to the modern G‑P‑M model, pre‑emptive scheduling strategies, lifecycle details, practical debugging tools, and performance‑related design choices.

MaGe Linux Operations

Goroutine Scheduler Concepts

Scheduling in operating systems assigns processes or threads to physical CPUs; traditional languages like C/C++ rely on OS thread scheduling. Threads are lightweight compared to processes but introduce synchronization complexities such as locks and deadlocks.

Threads are the OS's basic scheduling unit; Linux does not differentiate between process and thread scheduling.

Multithreaded programming faces issues like difficult inter‑thread communication, unpredictable thread‑pool sizing, memory overhead per thread, and the need for network multiplexing in services that cannot create many threads.

To address these problems, Go introduces goroutines: user‑level threads (closely related to coroutines) that are scheduled by the Go runtime instead of the OS. A goroutine starts with only a few kilobytes of stack, which grows on demand, allowing massive concurrency at low memory cost.
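As a rough illustration of how cheap goroutines are, the sketch below starts a large number of them and waits for all to finish. The helper name runMany and the figure of 100,000 goroutines are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"sync"
)

// runMany starts n goroutines, each adding its index to a shared sum,
// and returns the sum once all of them have finished.
func runMany(n int) int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mu.Lock()
			sum += i
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return sum
}

func main() {
	// 100,000 is an arbitrary count chosen for illustration.
	fmt.Println(runMany(100000))
}
```

Creating the same number of OS threads would typically be far more expensive, which is the motivation given above.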

The component that places goroutines onto CPUs is called the goroutine scheduler.

Evolution of the Goroutine Scheduler

G‑M Model

Before the scheduler redesign proposed in 2012, Go used the G‑M model. Each goroutine was represented by a runtime structure (G), and each M ("machine") represented an OS thread, not a physical CPU. All runnable Gs were stored in a single global queue protected by one mutex, which caused heavy lock contention, poor load balancing, high memory usage, and frequent thread blocking.

The model was replaced after about four years due to these limitations.

G‑P‑M Model

The modern scheduler adds a Processor (P) abstraction and uses work‑stealing. Each M must own a P to run Gs. P holds a local run queue of Gs; when a local queue is empty, the M can steal Gs from other Ps or pull from a global queue.

The scheduler consists of four parts:

Global queue (stores runnable Gs)

P’s local queue (limited to 256 Gs)

P list (created at startup, up to GOMAXPROCS entries)

M threads (execute Gs, bound to a P)

M obtains a G from its P’s local queue; if empty, it pulls a batch from the global queue or steals from another P. The OS schedules the M threads onto physical CPUs.
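The lookup order described above (local queue, then global queue, then stealing) can be sketched with a toy model. The types G, P and sched and the function findRunnable below are hypothetical stand-ins written for this example, not the runtime's code; for simplicity the sketch takes a single G from the global queue instead of a batch, and it assumes a thief takes half of the victim's queue.

```go
package main

import "fmt"

// G, P and sched are simplified, hypothetical stand-ins for runtime structures.
type G struct{ id int }

type P struct {
	id    int
	local []*G // local run queue
}

type sched struct {
	global []*G // global run queue
	ps     []*P // all Ps
}

// findRunnable returns the next G for p and where it came from:
// the local queue first, then the global queue, then another P.
func (s *sched) findRunnable(p *P) (*G, string) {
	if len(p.local) > 0 {
		g := p.local[0]
		p.local = p.local[1:]
		return g, "local"
	}
	if len(s.global) > 0 {
		g := s.global[0]
		s.global = s.global[1:]
		return g, "global"
	}
	for _, victim := range s.ps {
		if victim == p || len(victim.local) == 0 {
			continue
		}
		// Assumption: steal half of the victim's queue, rounded up.
		n := (len(victim.local) + 1) / 2
		stolen := victim.local[:n]
		victim.local = victim.local[n:]
		p.local = append(p.local, stolen[1:]...)
		return stolen[0], "stolen"
	}
	return nil, "idle"
}

// demo runs three lookups for an empty P and records each source.
func demo() []string {
	p0 := &P{id: 0}
	p1 := &P{id: 1, local: []*G{{id: 1}, {id: 2}, {id: 3}, {id: 4}}}
	s := &sched{global: []*G{{id: 100}}, ps: []*P{p0, p1}}
	var sources []string
	for i := 0; i < 3; i++ {
		_, src := s.findRunnable(p0)
		sources = append(sources, src)
	}
	return sources
}

func main() {
	fmt.Println(demo()) // prints [global stolen local]
}
```

The third lookup is served locally because the steal left one extra G in the thief's own queue, which is the point of stealing in batches.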

Number of P and M

The number of Ps is set by the GOMAXPROCS environment variable or by calling runtime.GOMAXPROCS(). At most that many goroutines execute Go code at the same instant.

The maximum number of Ms defaults to 10,000, which is more than most systems sustain in practice; the limit can be changed with SetMaxThreads in the runtime/debug package. When an M blocks, a new M may be created.

There is no fixed ratio between P and M; a blocked M can be replaced by another M while the same P remains active.

Pre‑emptive Scheduling

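Go’s scheduler implements pre‑emptive scheduling to avoid goroutine starvation. The sysmon monitor thread requests pre‑emption of a goroutine that has been running for more than 10 ms, and it takes the P away from an M that has been blocked in a system call for more than about 20 µs. Two pre‑emption mechanisms are used: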

Cooperative Pre‑emptive Scheduling

The compiler inserts a stack check into the prologue of most functions; when the check fails, the function calls runtime.morestack.

When a goroutine runs too long, the GC or sysmon sets its stack guard to a special value (StackPreempt), which forces the next stack check to fail. runtime.morestack then detects the flag and the goroutine yields.

This method is simple but cannot pre‑empt tight loops without function calls.

Signal‑based Pre‑emptive Scheduling (Go 1.14+)

Go installs a signal handler for SIGURG that calls runtime.doSigPreempt. When the GC stops the world or a stack scan is needed, the runtime marks the running goroutine as pre‑emptible, sends SIGURG to its OS thread, and the handler forces the goroutine to yield via a series of runtime calls (runtime.asyncPreempt, runtime.preemptPark, etc.). This improves pre‑emptibility, including for loops without function calls, but does not solve all cases.

_Gpreempted indicates a goroutine stopped by pre‑emptive scheduling.

go func() Scheduling Flow

When go func() creates a new goroutine:

A G is created and placed in the local queue of the current P; if the local queue is full, the new G is moved to the global queue together with half of the Gs from the local queue.

M must own a P; it pops a runnable G from its P’s local queue (or steals from another P) and executes it.

If the G performs a syscall or blocks, the M detaches from the P, and a new OS thread may be created to serve the P.

When the syscall returns, the G is re‑queued; if no P is available, the M sleeps and the G returns to the global queue.
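The enqueue step of this flow can be sketched with a toy model. The names G, P, sched and runqput are hypothetical and written for this example; the capacity of 256 comes from the text, and the overflow rule (half of the local queue moves to the global queue along with the new G) is an assumption about the runtime's behaviour, not a copy of its code.

```go
package main

import "fmt"

// localCap is the local queue capacity named in the text.
const localCap = 256

// G, P and sched are simplified, hypothetical stand-ins for runtime structures.
type G struct{ id int }

type P struct{ local []*G }

type sched struct{ global []*G }

// runqput enqueues g on p's local queue. When the local queue is full,
// the older half of it and g are moved to the global queue instead.
func (s *sched) runqput(p *P, g *G) {
	if len(p.local) < localCap {
		p.local = append(p.local, g)
		return
	}
	half := len(p.local) / 2
	s.global = append(s.global, p.local[:half]...)
	s.global = append(s.global, g)
	// Copy the remaining half so the two queues do not share storage.
	p.local = append([]*G(nil), p.local[half:]...)
}

func main() {
	s, p := &sched{}, &P{}
	for i := 0; i < 300; i++ {
		s.runqput(p, &G{id: i})
	}
	fmt.Println("local:", len(p.local), "global:", len(s.global)) // prints local: 171 global: 129
}
```

Moving a batch on overflow keeps the local queue from refilling immediately and makes work visible to idle Ps through the global queue.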

Goroutine Lifecycle

Two special entities exist:

M0: the main OS thread, created at program start.

G0: a scheduler‑only goroutine; each M has its own G0 for internal work.

The typical program flow is:

Runtime creates M0 and G0.

Initializes P list based on GOMAXPROCS.

Creates the main goroutine (runtime.main) which runs main.main.

M0 executes the main G, which may spawn additional goroutines.

When a G finishes, M picks the next runnable G until the program exits.
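The creation and completion of Gs in this flow can be observed from user code with runtime.NumGoroutine. The sketch below is an illustration only; the helper name spawnParked is made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked starts n goroutines that block until released and returns
// how many goroutines were added to the program while they were parked.
func spawnParked(n int) int {
	before := runtime.NumGoroutine()
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // wait here until the channel is closed
		}()
	}
	added := runtime.NumGoroutine() - before
	close(release) // let every goroutine finish
	wg.Wait()
	return added
}

func main() {
	fmt.Println("goroutines at start:", runtime.NumGoroutine())
	fmt.Println("goroutines added:", spawnParked(10))
}
```

Note that when main.main returns the program exits without waiting for other goroutines, which is why the example waits on a sync.WaitGroup.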

State Summary

Key goroutine states:

_Gidle: allocated but not yet initialized.

_Grunnable: ready to run, stored in a queue.

_Grunning: currently executing.

_Gsyscall: in a system call.

_Gwaiting: blocked on a channel or other wait.

_Gdead: finished or never used.

_Gpreempted: stopped by pre‑emptive scheduling.

Processor (P) states include _Pidle, _Prunning, _Psyscall, and _Pgcstop. An M that has no G to run is either “spinning” (actively looking for work) or idle.

Scheduler Design Principles

Thread reuse: Goroutines run on a pool of OS threads; the scheduler reuses threads via work‑stealing and hand‑off instead of constantly creating/destroying them.

Parallelism: GOMAXPROCS limits the number of active Ps, allowing the program to scale across multiple CPU cores.

Two small strategies:

Pre‑emptive scheduling to avoid starvation.

Global G queue for load balancing when local queues are empty.

Debugging and Visualization

Two ways to inspect G‑P‑M data:

go tool trace – records runtime events and provides a web UI.

Runtime debug trace (GODEBUG=schedtrace=1000) – prints scheduler statistics every 1000 ms, such as GOMAXPROCS, idle Ps, thread counts, and run‑queue sizes.

Example trace output shows how many Ps are idle, how many threads are spinning, and the distribution of Gs across local queues.

Example Code Snippets

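A minimal program:

package main

import "fmt"

func main() {
    fmt.Println("Hello scheduler")
}

A longer‑running variant, so that several trace lines are printed:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Print once per second for five seconds.
    for i := 0; i < 5; i++ {
        time.Sleep(time.Second)
        fmt.Println("Hello scheduler")
    }
}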

Running the second program with GODEBUG=schedtrace=1000,scheddetail=1 prints detailed scheduler state every 1000 ms.
