In-depth Analysis of Go Language Scheduler: G, M, P Concepts and Scheduling Loop
The article provides a source-code level walkthrough of the Go scheduler (based on Go 1.9.2, Linux implementation). It introduces the three core abstractions, G (goroutine), M (kernel thread), and P (processor), and explains how they are created, bound, and used in the scheduling loop to run, yield, and balance goroutines across local, global, and stolen run queues.
During process startup, runtime.osinit determines the number of CPU cores. After the initial M0 and G0 are set up by the bootstrap assembly, runtime.schedinit sets the maximum M count and initializes the P objects according to GOMAXPROCS, defaulting to the detected core count. The bootstrap then creates the main goroutine (its entry point, runtime.main, is carried in the runtime·mainPC variable) and starts the first M, which enters the scheduling loop.
The scheduling loop repeatedly binds an M to a P, obtains a runnable G from the P’s local run queue, the global run queue, or by stealing from other Ps, executes the G via the gogo assembly routine, and upon completion returns the G to a free list via goexit, which calls schedule again to pick the next G.
A G can yield the CPU in several ways. It may terminate normally. It may block voluntarily through constructs such as time.Sleep, mutex locks, or channel operations, all of which eventually call gopark to move the G to _Gwaiting and invoke schedule. It may be forcibly preempted by the sysmon background thread, which sets the G's stack-guard flag so that the next stack-growth check in a function prologue diverts into the scheduler. Finally, entering a system call marks the G as _Gsyscall and returns control to the scheduler until the call completes.
Runnable Gs originate from two main sources: the creation of new goroutines via go func (handled by newproc → newproc1, which places the G onto the current P’s run queue) and I/O readiness reported by the netpoll mechanism, where the sysmon thread extracts ready Gs from epoll and injects them into the global run queue.
Understanding these mechanisms clarifies how Go achieves efficient, lightweight concurrency with minimal overhead, and how the scheduler balances fairness, low latency, and utilization of multi‑core hardware.
Didi Tech
Official Didi technology account