Understanding Go's Goroutine Scheduler: Design, GMP Model, and Optimizations
The article explains Go’s GMP‑based goroutine scheduler, detailing how logical processors, thread‑local queues, and work‑stealing replace a naïve one‑goroutine‑per‑thread model, and discusses fairness, cooperative preemption, current limitations, and future optimizations compared with custom coroutine frameworks.
This article reviews Dmitry Vyukov’s 2019 talk on the design of Go’s goroutine scheduler, summarizing the software‑design ideas behind the GMP model and discussing practical optimization details.
GMP Abbreviation: G – goroutine, M – machine (OS thread), P – logical processor.
Design Goals: Provide a high‑efficiency concurrent programming model in which a single keyword (go) creates a lightweight execution context that the runtime can schedule efficiently. Additional goals include growable (effectively unbounded) goroutine stacks and fair scheduling.
From Threads to Thread Pools: A naïve one‑goroutine‑per‑thread approach is too costly. A thread pool caps the number of OS threads (N), which in turn requires a strategy for assigning goroutines to threads. The initial solution is a global run queue (GRQ) from which all threads pull work.
Thread‑Local Queues and Work Stealing: Each processor (P) owns a local run queue (LRQ). When its LRQ is empty, a processor steals goroutines from other LRQs. This reduces contention on the common path, though locking overhead remains under heavy load.
Decoupling Resources from Threads: By moving the LRQ and related per‑scheduling‑context storage into a Processor abstraction, the number of logical processors (P) can be fixed while the actual OS thread count varies, completing the GMP model.
Fairness and Preemption: Go adopts time‑slice‑based preemption similar to OS scheduling. Two approaches are considered: signal‑based interruption and cooperative checks. At the time of the talk, the runtime preferred cooperative checks, piggybacking on the stack‑growth check already inserted at function entry to also test whether the goroutine's time slice has expired.
Limitations: Cooperative checks cannot preempt a tight loop that contains no check points, and a goroutine entering a blocking system call releases its P so another thread can keep running goroutines. (Go 1.14 later added signal‑based asynchronous preemption, closing the tight‑loop gap.)
Future Work Mentioned in the Talk: Reduce work‑stealing overhead on many‑core machines, handle the edge case of loops without preemption points, and improve the scalability of network poller and timer handling.
Comparison with a Custom C++ Coroutine Framework: The author briefly contrasts Go's design with a game‑server coroutine framework that uses a single‑thread model, fixed‑size stacks, and explicit hand‑off scheduling, highlighting trade‑offs between simplicity and scalability.
Overall, the article recaps key design patterns such as thread pools, resource pools, compute‑storage separation, and interrupt vs. polling mechanisms, providing a solid conceptual foundation for understanding Go’s scheduler.
Tencent Cloud Developer