Coroutine Scheduling and Stack Management for High-Concurrency Systems
This article explains how to reach C10M‑scale concurrency by moving memory management, packet handling, and scheduling out of the kernel into user space. It introduces coroutine concepts, walks through a producer‑consumer coroutine example, compares stackful and stackless coroutine stack implementations, and outlines classic scheduling strategies.
Robert Graham, CEO of Errata Security, argues that "The Kernel Is The Problem, Not The Solution" and suggests that to achieve C10M‑scale performance the kernel should be bypassed in three areas: memory management (applications manage their own memory), packet processing (applications handle packets), and scheduling (applications schedule themselves).
A coroutine, as originally described by Knuth, is a routine with multiple entry points that can cooperatively yield and resume execution, allowing it to give up the CPU explicitly and be resumed later.
Below is a producer‑consumer coroutine example:
var q := new queue
coroutine produce
loop
while q is not full
create some new items
add the items to q
# "yield to" here == yield + resume
yield to consume
coroutine consume
loop
while q is not empty
remove some items from q
use the items
yield to produce
call produce

The example creates two coroutines: the producer fills the queue, then yields to the consumer, which drains the queue and yields back. Control is transferred explicitly via "yield to" rather than by call/return recursion.
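The pseudocode above can be sketched with Python generators, which are stackless coroutines: "yield" suspends a generator and "next()" resumes it, and a small driver loop stands in for the symmetric "yield to" (a minimal sketch; the bounded-queue size and loop count are illustrative choices).

```python
from collections import deque

q = deque(maxlen=4)          # bounded queue shared by both coroutines

def produce():
    item = 0
    while True:
        while len(q) < q.maxlen:     # "while q is not full"
            item += 1
            q.append(item)           # add the item to q
        yield                        # yield to consume

def consume(results):
    while True:
        while q:                     # "while q is not empty"
            results.append(q.popleft())  # remove and use the item
        yield                        # yield to produce

# Driver loop: alternates the two generators, emulating "yield to".
results = []
producer, consumer = produce(), consume(results)
for _ in range(3):
    next(producer)   # run producer until it yields
    next(consumer)   # run consumer until it yields

print(results)       # items flow producer -> queue -> consumer
```

Each `next()` call runs a coroutine only up to its next `yield`, so the two sides interleave deterministically without threads or locks.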
To avoid blocking the entire thread, blocking system calls (e.g., socket reads) must be replaced with non‑blocking equivalents; otherwise, when one coroutine blocks in the kernel, the kernel schedules a different thread and every coroutine multiplexed onto the blocked thread stalls with it. The article shows a futex flame graph illustrating the cost of lock contention and the importance of optimizing blocking syscalls: a system call costs on the order of hundreds of nanoseconds, while a thread context switch costs a few microseconds.
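The non‑blocking pattern can be sketched in Python with a local socket pair (the variable names are illustrative): a read on an empty non‑blocking socket raises an error immediately instead of parking the thread, which is exactly the window in which a user‑space scheduler can switch to another coroutine.

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)         # non-blocking: reads fail fast instead of sleeping

ready = True
try:
    a.recv(1024)             # nothing has been sent yet: this would block
except BlockingIOError:
    # Instead of the kernel parking the whole thread, a user-space
    # scheduler can run another coroutine here and retry later.
    ready = False

b.send(b"ping")              # the peer produces data
data = a.recv(1024)          # now the read succeeds without blocking
print(data)
```

In a real runtime the `except` branch would register the socket with epoll/kqueue and suspend only the current coroutine, resuming it when the fd becomes readable.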
Implementing coroutines also requires handling coroutine stacks. A thread has its own stack, and when switching, the previous thread’s stack state must be saved and restored. Coroutines can share a single stack (M:N model) or have independent stacks. Two main categories exist: stackful (with a dedicated stack) and stackless (state‑machine based, using a single stack).
Stackful coroutine implementations include:
Static stack: each coroutine gets a fixed‑size stack (risk of overflow or waste).
Shared stack: a large common stack is allocated (e.g., 2 MiB) and coroutine stacks are copied in/out on suspend/resume; pointer validity can be an issue in languages like C.
Virtual stack: like the shared stack, but uses virtual memory mapping so each coroutine sees its own large virtual address range while physical pages are committed only as they are actually used.
Stackless coroutines behave like state machines: the coroutine is split into many small blocks with a single entry point, and execution advances by updating a state variable. This saves memory and switches quickly, but a coroutine cannot suspend at arbitrary points; it must run to its next yield point. C++20 coroutines follow this stackless model; a simplified hand‑rolled state machine looks like this:
/* Stackless coroutine as a state machine: `state` records where
   execution will resume; each part runs up to the next yield point. */
static int state = 0;

void firstCoroutinePart1(void) { /* ... first half ... */ state = 1; }
void firstCoroutinePart2(void) { /* ... second half ... */ state = 0; }

/* Resuming the coroutine dispatches on the saved state. */
void next(void) {
    if (state == 0)      firstCoroutinePart1();
    else if (state == 1) firstCoroutinePart2();
}

Coroutine scheduling strategies are discussed next. Two classic mechanisms are star‑shaped switching (A → scheduler → B) and round‑robin switching (A → B → scheduler), with round‑robin being more efficient because it avoids an extra switch through the scheduler on every transfer. Modern systems often use M:N threading, where many user‑level coroutines run on a few kernel threads, employing work‑stealing and epoll to balance load and handle I/O.
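The star‑shaped pattern, in which every switch returns to a central scheduler that picks the next runnable coroutine, can be sketched with a run queue of Python generators (a minimal sketch; the task and scheduler names are illustrative):

```python
from collections import deque

def scheduler(coros):
    """Star-shaped switching: every yield returns control here,
    and the scheduler resumes the next coroutine in the run queue."""
    run_queue = deque(coros)
    order = []                         # trace of execution, for illustration
    while run_queue:
        coro = run_queue.popleft()
        try:
            order.append(next(coro))   # resume until its next yield
            run_queue.append(coro)     # still alive: requeue at the back
        except StopIteration:
            pass                       # finished: drop from the run queue
    return order

def task(name, steps):
    for i in range(steps):
        yield f"{name}{i}"             # each yield is one scheduling slice

order = scheduler([task("A", 2), task("B", 3)])
print(order)   # A0 B0 A1 B1 B2
```

Round‑robin switching would instead have A resume B directly, touching the scheduler only when a coroutine blocks or exits; an M:N runtime runs one such loop per kernel thread and lets idle threads steal work from busy ones.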
References:
http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-connections-the-kernel-i.html
Programming language pragmatics
Laiye Technology Team