Understanding Coroutines: Principles, Implementations, and Performance in C/C++
This article explains the concept of coroutines as lightweight user‑level threads, compares them with traditional threads, details various implementation mechanisms in C/C++ (including libco and NtyCo), and demonstrates how they improve I/O‑bound server performance through examples and code snippets.
1. Introduction to Coroutines
Coroutines, also known as lightweight threads, micro‑threads, or fibers, are user‑level constructs that allow functions to be paused and resumed quickly without kernel involvement, making them ideal for I/O‑intensive tasks.
Unlike normal function calls that follow a strict call‑and‑return stack, a coroutine can suspend its execution at any point, switch to another coroutine, and later resume from the same point, similar to a CPU interrupt.
Example of Coroutine Switching
def A():
    print '1'
    print '2'
    print '3'

def B():
    print 'x'
    print 'y'
    print 'z'

When executed as coroutines, the output may interleave as:
1
2
x
y
3
z

2. Advantages of Coroutines
Very high execution efficiency because context switches are controlled by the program, avoiding the overhead of kernel thread switches.
No need for locking mechanisms when only one thread runs multiple coroutines, eliminating contention on shared resources.
To utilize multiple CPU cores, combine coroutines with multiple processes.
3. Coroutine Support in Languages
Native support exists in C++20, Go, Python, etc. Other languages rely on coroutine libraries (e.g., Tencent's libco).
4. C/C++ Coroutines
C++20 introduces native coroutine support, but compiler and library compatibility is still catching up. Most C/C++ coroutine libraries rely on two approaches: assembly‑level context switching or OS‑provided APIs.
Typical Libraries
libco, Boost.Context – assembly based
phxrpc – based on ucontext/Boost.Context
libmill – based on setjmp/longjmp
Low‑Level Implementation Mechanisms
Assembly‑based context switch (fastest)
Switch‑case state machines
OS APIs: Linux ucontext, Windows Fiber
setjmp/longjmp with static locals
Example of setjmp/longjmp
#include <stdio.h>
#include <setjmp.h>

jmp_buf buf;

void banana(void) {
    printf("in banana()\n");
    longjmp(buf, 1);
    printf("you'll never see this\n");   // unreachable: longjmp never returns
}

int main(void) {
    if (setjmp(buf))
        printf("back in main\n");
    else {
        printf("first time through\n");
        banana();
    }
    return 0;
}

5. libco Coroutine Structure
libco represents a coroutine with stCoRoutine_t , which stores the execution environment, function pointer, arguments, stack information, and status flags.
struct stCoRoutine_t {
    stCoRoutineEnv_t *env;     // execution environment
    pfn_co_routine_t pfn;      // coroutine function
    void *arg;                 // argument
    coctx_t ctx;               // saved context
    ...
    char cEnableSysHook;       // system-call hook flag
    char cIsShareStack;        // shared-stack flag
    void *pvEnv;
    stStackMem_t *stack_mem;   // stack memory
    char *stack_sp;            // stack pointer
    unsigned int save_size;    // bytes saved from the shared stack
    char *save_buffer;         // buffer holding the saved stack contents
};

6. NtyCo – A Coroutine‑Based I/O Framework
NtyCo combines coroutine scheduling with asynchronous I/O. It provides APIs such as nty_coroutine_create , nty_coroutine_resume , nty_coroutine_yield , and POSIX‑style socket wrappers ( nty_socket , nty_accept , nty_recv , nty_send , nty_close ).
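Using only the wrappers named above, a connection handler can be written in a synchronous style; each blocking-looking call actually yields to the scheduler. The following is an illustrative pseudocode sketch (the overall structure and the heap-allocated fd argument are assumptions, not taken from NtyCo's sources):

```
// each accepted connection runs in its own coroutine
void client_routine(void *arg) {
    int fd = *(int *)arg;
    free(arg);
    char buf[1024];
    while (1) {
        int n = nty_recv(fd, buf, sizeof(buf), 0);  // yields until readable
        if (n <= 0) break;
        nty_send(fd, buf, n, 0);                    // yields until writable
    }
    nty_close(fd);
}

void server_routine(void *arg) {
    int listen_fd = nty_socket(AF_INET, SOCK_STREAM, 0);
    // bind/listen omitted
    while (1) {
        int *fdp = malloc(sizeof(int));
        *fdp = nty_accept(listen_fd, NULL, NULL);   // yields until a connection arrives
        nty_coroutine *co;
        nty_coroutine_create(&co, client_routine, fdp);
    }
}
```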
Creating a Coroutine
int nty_coroutine_create(nty_coroutine **new_co, proc_coroutine func, void *arg) {
    // allocate and initialise the coroutine structure;
    // `sched` is the current thread's scheduler (obtained elsewhere)
    nty_coroutine *co = calloc(1, sizeof(nty_coroutine));
    posix_memalign(&co->stack, getpagesize(), sched->stack_size);
    co->func = func;
    co->arg = arg;
    co->birth = nty_coroutine_usec_now();
    *new_co = co;
    TAILQ_INSERT_TAIL(&co->sched->ready, co, ready_next);
    return 0;
}

Yield and Resume
void nty_coroutine_yield(nty_coroutine *co) {
    // save the current context and switch back to the scheduler
    co_swap(co, scheduler_current);
}

int nty_coroutine_resume(nty_coroutine *co) {
    // restore the coroutine's context and run until the next yield
    co_swap(scheduler_current, co);
    return 0;
}

7. Scheduler Design
The scheduler maintains three collections: a ready queue (FIFO), a sleep tree (ordered by wake‑up time), and a wait tree (for I/O events). It repeatedly:
Moves expired sleep entries to the ready queue.
Processes epoll/kqueue events, moving ready I/O coroutines to the ready queue.
Resumes coroutines from the ready queue.
8. Epoll‑Based Event Loop
Coroutines use co_poll to register file descriptors with epoll. The function stores a pointer to a stPollItem_t in epoll_event.data.ptr so that when the event fires, the scheduler can retrieve the associated coroutine and resume it.
int co_poll(stCoEpoll_t *ctx, struct pollfd fds[], nfds_t nfds, int timeout_ms) {
    // register fds with epoll and associate each with its coroutine
    for (nfds_t i = 0; i < nfds; ++i) {
        struct epoll_event ev = { .events = fds[i].events, .data.ptr = &poll_items[i] };
        epoll_ctl(ctx->iEpollFd, EPOLL_CTL_ADD, fds[i].fd, &ev);
    }
    // optional timeout handling via the timing wheel
    AddTimeout(...);
    co_yield_env(env);   // suspend the current coroutine
    // on wake-up, remove the fds and clean up
    return 0;
}

9. Timing‑Wheel Timer
For coroutine‑level timeouts, libco uses a 60‑second timing‑wheel. Each timeout item is placed into a bucket based on its expiration offset; the wheel advances once per second, moving expired items to the active list so their coroutines can be resumed.
10. Performance Evaluation
Tests on a 4‑core Ubuntu 14.04 server with 6 GB RAM and three client VMs showed that using coroutines with epoll yields ~900 ms response time for 1 000 concurrent connections, compared to ~6.5 s for a purely synchronous design, demonstrating the high throughput of coroutine‑based asynchronous I/O.
11. Practical Usage of libco
Typical usage steps:
Create a listening socket (non‑blocking).
Spawn a coroutine for each accepted connection using co_create and co_resume .
Inside the coroutine, perform reads/writes via co_poll to let the scheduler handle readiness.
Run the main event loop with co_eventloop , which repeatedly calls epoll_wait , processes timed‑out coroutines, and resumes ready ones.
By keeping all I/O in a single thread of coroutines, developers obtain the simplicity of synchronous code while achieving the scalability of asynchronous, event‑driven servers.
Further Reading
Deep Dive into C++ Memory Management
Linux Kernel’s New Maple Tree
Inside Linux Kernel Architecture
Deepin Linux