Understanding Go's Goroutine Scheduling: Design Principles, GMP Model, and Optimizations
The article reviews Dmitry Vyukov's 2019 talk on Go's goroutine scheduler, explains the GMP (Goroutine, Machine, Processor — where a Machine is an OS thread) model, walks through its evolution from a naive thread-per-goroutine design to thread pools and work stealing, and discusses fairness, preemptive scheduling, and possible future improvements.
The article introduces Dmitry Vyukov's 2019 talk on the design of Go's goroutine scheduler and presents the author's reflections on the software design ideas behind the GMP (Goroutine, Machine, Processor) model.
Design Goals – Create a high-efficiency concurrent programming model in which a single go keyword launches a goroutine, achieving both development and runtime efficiency, with small, growable (effectively unbounded) stacks and fair scheduling.
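To make the design goal concrete, here is a minimal sketch (the `squares` helper is hypothetical, for illustration only): one keyword is the entire launch API, and because goroutines start with small, growable stacks, fanning out one goroutine per element is cheap.

```go
package main

import (
	"fmt"
	"sync"
)

// squares fans out one goroutine per element. The `go` keyword is the
// whole launch API; goroutine creation is cheap compared to OS threads.
func squares(n int) []int {
	var wg sync.WaitGroup
	out := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(k int) { // one keyword spawns a goroutine
			defer wg.Done()
			out[k] = k * k
		}(i)
	}
	wg.Wait() // join all goroutines before returning
	return out
}

func main() {
	fmt.Println(squares(100)[99]) // 9801
}
```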
From Zero to Multi‑Threading – Starting with the naive idea that each goroutine maps to a thread, the article shows why this approach does not scale due to massive thread‑creation overhead.
Thread-Pool Solution – Capping the number of OS threads at N and feeding them from a global run queue lets threads pull goroutines to execute, but the single shared queue becomes a lock-contention hotspot once many threads compete for it.
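The thread-pool idea can be sketched as follows (this is an illustrative model, not the runtime's actual code; `runPool` and the use of a channel as the "global run queue" are assumptions made for clarity). Every dequeue goes through the same shared queue, which is exactly where contention appears as the worker count grows.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool models a fixed pool of N workers sharing one global run queue.
// Task indices are pushed into a single channel; every worker dequeues
// from it, so all workers synchronize on the same point.
func runPool(nWorkers int, tasks []func() int) []int {
	queue := make(chan int, len(tasks)) // the "global run queue"
	results := make([]int, len(tasks))
	for i := range tasks {
		queue <- i
	}
	close(queue)

	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range queue { // all workers contend on one queue
				results[i] = tasks[i]()
			}
		}()
	}
	wg.Wait()
	return results
}

func main() {
	tasks := make([]func() int, 8)
	for i := range tasks {
		n := i
		tasks[i] = func() int { return n * 2 }
	}
	fmt.Println(runPool(3, tasks)) // [0 2 4 6 8 10 12 14]
}
```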
Thread-Local Queues – To reduce contention, each thread maintains a local run queue (LRQ). When an LRQ runs empty, the thread steals work from other queues, but stealing still incurs locking overhead and can be wasteful on machines with many cores.
Decoupling Resources from Threads – Introducing a Processor (P) abstraction that owns its LRQ and related per-scheduler storage yields the classic GMP model, in which the number of Processors is configured independently of the number of OS threads (M).
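In Go this decoupling is visible through `runtime.GOMAXPROCS`, which sets the number of Ps; the runtime may still create more Ms than Ps (for example, when threads block in syscalls). A small sketch (`setPs` is a hypothetical wrapper):

```go
package main

import (
	"fmt"
	"runtime"
)

// setPs adjusts the number of Ps and reads the setting back.
// GOMAXPROCS(n) returns the previous value; GOMAXPROCS(0) queries
// the current value without changing it.
func setPs(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n)
	now = runtime.GOMAXPROCS(0)
	return prev, now
}

func main() {
	prev, now := setPs(4) // request 4 Ps, independent of thread count
	fmt.Println(now == 4, prev >= 1)
}
```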
Fairness and Preemptive Scheduling – Discusses the need for time-slice-based preemptive scheduling to prevent long-running goroutines from monopolizing the CPU, compares signal-based interruption with compiler-inserted cooperative checks, and explains why cooperative checks were preferred in Go's runtime.
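The explicit form of a cooperative scheduling point is `runtime.Gosched`, which parks the calling goroutine and lets the scheduler run another runnable one; the compiler inserts implicit checks of a similar kind at function entry. A minimal sketch (`politeWorker` is a hypothetical name):

```go
package main

import (
	"fmt"
	"runtime"
)

// politeWorker yields to the scheduler on every iteration, so even under
// GOMAXPROCS=1 other goroutines get a chance to run between steps.
func politeWorker(iters int) int {
	sum := 0
	for i := 1; i <= iters; i++ {
		sum += i
		runtime.Gosched() // voluntary cooperative scheduling point
	}
	return sum
}

func main() {
	done := make(chan int)
	go func() { done <- politeWorker(100) }()
	fmt.Println(<-done) // 5050
}
```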
The article also lists several design takeaways such as thread pools, resource pools, compute‑storage separation, and the use of interrupts versus polling.
Further Optimizations – Highlights open problems such as work-stealing overhead on many-core machines, edge cases where a goroutine body contains no cooperative check points, and the handling of goroutines woken by network or timer events, which currently go through a global queue.
Additional discussion covers a simple C++ coroutine framework used in the author’s game server, contrasting its single‑threaded scheduler, fixed‑size stacks, and hand‑off strategy with Go’s more sophisticated runtime.
The author, Wu Lianhuo, is a senior engineer at Tencent Games, leading large‑scale distributed server architecture and cloud‑native transformation.
References to related articles on distributed databases, Dubbo, Rust, and concurrency pitfalls are provided at the end.
High Availability Architecture