How to Build a Stackful C++ Coroutine from Scratch: Deep Dive into Context Switching
This article provides a step‑by‑step technical guide to implementing C++ stackful coroutines, covering the design of the owl.context API, low‑level context‑switch principles, register saving conventions on 32‑bit ARM, and complete source code for co_getcontext, co_setcontext, co_swapcontext and co_makecontext with illustrative examples and diagrams.
Introduction
The article explains the low‑level principles of C++ coroutine context switching and walks the reader through a complete implementation from scratch, aiming to help developers understand and reuse the core mechanisms.
owl.context API Design
Four core APIs are defined to manage coroutine contexts:
typedef struct { void* base; size_t size; } co_stack_t;
typedef struct co_context { co_reg_t regs[32]; co_stack_t stack; struct co_context* link; } co_context_t;
int co_getcontext(co_context_t* ctx);
void co_setcontext(const co_context_t* ctx);
void co_swapcontext(co_context_t* octx, const co_context_t* ctx);
void co_makecontext(co_context_t* ctx, void (*fn)(uintptr_t), uintptr_t arg);
Context‑Switch Example
A simple test program demonstrates how co_getcontext and co_setcontext work like an enhanced goto that jumps between stack frames:
void test() {
    printf("start\n");
    volatile int n = 3;
    co_context_t ctx;
    int ret = co_getcontext(&ctx);
    if (n > 0) {
        printf("ret = %d, n = %d\n", ret, n);
        sleep(1);
        --n;
        co_setcontext(&ctx);
    }
    printf("end\n");
}
Running the program produces:
start
ret = 0, n = 3
ret = 1, n = 2
ret = 1, n = 1
end
The output shows that after co_setcontext the function resumes at the point where co_getcontext returned, this time with a return value of 1.
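The same "return twice" behavior can be tried on any platform with the standard setjmp/longjmp pair. The sketch below is only a portable analog, not the owl.context implementation: setjmp stands in for co_getcontext, longjmp for co_setcontext, and the function name count_resumes is made up for illustration.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical helper: loops via longjmp and returns how many times
 * execution passed the save point (the direct call plus every resume). */
int count_resumes(int rounds) {
    volatile int n = rounds;   /* volatile so the value survives longjmp */
    volatile int passes = 0;
    jmp_buf env;
    int ret;

    /* setjmp yields 0 on the direct call and 1 after longjmp(env, 1). */
    if (setjmp(env) == 0) {
        ret = 0;
    } else {
        ret = 1;
    }
    ++passes;
    if (n > 0) {
        printf("ret = %d, n = %d\n", ret, n);
        --n;
        longjmp(env, 1);       /* jump back to the setjmp call site */
    }
    return passes;
}
```

Calling count_resumes(3) prints the same three "ret = ..." lines as the test above. Unlike a full context switch, longjmp may only jump to a frame that is still active, which is why a separate stack is needed for real coroutines.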
Principles of Context Switching
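A thread's execution context consists of CPU registers and private thread data. On most operating systems the context is essentially the register set, so saving and restoring registers is sufficient. The article reviews the ARM AAPCS calling convention and lists which registers a callee must preserve: r4-r11 (the role of r9 is platform-defined), s16-s31 and SP. LR is saved as well because it holds the return address. The remaining registers (r0-r3, r12 and s0-s15) may be clobbered by any call, so a context switch implemented as a function call does not need to save them.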
Implementation of co_getcontext
The function saves the required registers to ctx->regs and returns 0 on the first call:
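.globl co_getcontext
co_getcontext:
    /* save r4-r11 and lr, then sp, into regs[0..9] */
    mov r1, sp
    stmia r0!, { r4-r11, lr }
    stmia r0!, { r1 }
    /* clear the FN slot so that resuming this context returns 1
       instead of re-entering the coroutine function */
    mov r1, #0
    str r1, [r0]
    /* skip FN, ARG and the unused slots, then save s16-s31 */
    add r0, r0, #24
    vstmia r0, { s16-s31 }
    mov r0, #0
    mov pc, lr
A diagram in the original article shows the memory layout of ctx->regs after saving.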
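The layout implied by the store sequence can also be written down as slot indices. The sketch below is inferred from the assembly and from the R4/LR/SP/FN/ARG indices used later in co_makecontext. The REG_ names and the reg_offset helper are illustrative, and it assumes each co_reg_t slot is 4 bytes wide, as on 32-bit ARM.

```c
#include <stddef.h>

/* Slot indices into ctx->regs (names are hypothetical). */
enum {
    REG_R4    = 0,   /* r4..r11 occupy slots 0..7 */
    REG_LR    = 8,   /* return address */
    REG_SP    = 9,   /* stack pointer */
    REG_FN    = 10,  /* entry function, used by co_makecontext */
    REG_ARG   = 11,  /* argument passed to the entry function */
    /* slots 12..15 are skipped by the save and restore code */
    REG_S16   = 16,  /* s16..s31 occupy slots 16..31 */
    REG_COUNT = 32
};

/* Byte offset of a slot, assuming 4-byte registers (32-bit ARM). */
static inline size_t reg_offset(int slot) {
    return (size_t)slot * 4u;
}
```

With these numbers the floating-point block starts 64 bytes into regs, and the sixteen single-precision registers fill the array exactly up to slot 31.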
Implementation of co_setcontext
The counterpart restores the saved registers and, if a function pointer is present, calls it; otherwise it returns 1 to indicate a resumed co_getcontext call:
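.globl co_setcontext
co_setcontext:
    /* restore r4-r11 and lr, then load sp, FN and ARG into r1-r3 */
    ldmia r0!, { r4-r11, lr }
    ldmia r0!, { r1-r3 }
    mov sp, r1
    /* skip the unused slots and restore s16-s31 */
    add r0, r0, #16
    vldmia r0, { s16-s31 }
    /* FN set: enter the coroutine function; otherwise resume co_getcontext */
    cmp r2, #0
    bne .cofunc
    mov r0, #1
    mov pc, lr
.cofunc:
    mov r0, r3
    mov pc, r2
Implementation of co_swapcontext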
Implemented in plain C by combining the two previous functions:
void co_swapcontext(co_context_t* octx, const co_context_t* ctx) {
    if (co_getcontext(octx) == 0) {
        co_setcontext(ctx);
    }
}
Implementation of co_makecontext
Creating a new execution environment requires setting up the stack pointer, link, function pointer and argument, and installing a stub (co_jump_to_link) as the return address, so that control jumps to the linked context after the entry function returns. The register layout is extended with FN and ARG fields.
#define R4 0
#define LR 8
#define SP 9
#define FN 10
#define ARG 11
extern void co_jump_to_link(void);  /* assembly stub, defined below */
void co_makecontext(co_context_t* ctx, void (*fn)(uintptr_t), uintptr_t arg) {
    uintptr_t stack_top = (uintptr_t)ctx->stack.base + ctx->stack.size;
    uintptr_t* sp = (uintptr_t*)(stack_top & ~(uintptr_t)7); // 8-byte alignment
    ctx->regs[R4] = (uintptr_t)ctx->link;          // kept in r4 for the stub
    ctx->regs[LR] = (uintptr_t)&co_jump_to_link;   // where fn returns to
    ctx->regs[SP] = (uintptr_t)sp;
    ctx->regs[FN] = (uintptr_t)fn;
    ctx->regs[ARG] = arg;
}
/* stub implemented in assembly */
.globl co_jump_to_link
co_jump_to_link:
    /* r4 holds ctx->link; switch to it if non-null, otherwise exit(0) */
    movs r0, r4
    bne co_setcontext
    b exit
Another diagram illustrates the stack layout before and after calling a function with more than four arguments, showing how excess arguments are pushed onto the stack.
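The stack-pointer computation in co_makecontext can be isolated into a small helper. AAPCS requires SP to be 8-byte aligned at public interfaces, and the ARM stack grows downward, so the initial SP is the end of the buffer rounded down. The function name aligned_stack_top is hypothetical and the alignment is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the highest address at or below base + size that is a multiple
 * of align. align must be a power of two (8 for AAPCS). */
static uintptr_t aligned_stack_top(void *base, size_t size, uintptr_t align) {
    uintptr_t top = (uintptr_t)base + size;   /* one past the last byte */
    return top & ~(align - 1);                /* round down */
}
```

For a buffer at 0x1003 with 4096 bytes, the raw top is 0x2003 and the aligned top is 0x2000, so up to seven bytes at the end of the buffer go unused.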
Putting It All Together
A complete example creates two contexts, sets up a dedicated 4 KB stack for the second coroutine, links them, and uses co_makecontext to run co_hello with an argument:
co_context_t ctx0, ctx1;
void co_hello(uintptr_t arg) {
    printf("co_hello() Enter arg = %lu\n", (unsigned long)arg);
    co_swapcontext(&ctx1, &ctx0);   // yield back to test_make_context
    printf("co_hello() Exit\n");
}
void test_make_context() {
    printf("main start\n");
    char stack[4096];
    ctx1.stack.base = stack;
    ctx1.stack.size = sizeof(stack);
    ctx1.link = &ctx0;              // resumed when co_hello returns
    co_makecontext(&ctx1, &co_hello, 100);
    printf("main start co_hello\n");
    co_swapcontext(&ctx0, &ctx1);   // first entry into co_hello
    printf("main resume co_hello\n");
    co_swapcontext(&ctx0, &ctx1);   // continue co_hello after its yield
    printf("main end\n");
}
Running it yields:
main start
main start co_hello
co_hello() Enter arg = 100
main resume co_hello
co_hello() Exit
main end
Conclusion
Understanding the ARM register conventions and the simple save/restore logic makes it straightforward to port owl.context to other architectures. The article also notes several open challenges such as Windows exception handling, FS/GS register quirks, x64 vs AMD64 calling conventions, and ARM/THUMB compatibility.
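To experiment with the same control flow on a machine where the ARM assembly does not apply, the ucontext functions shipped by glibc (getcontext, makecontext, swapcontext) can stand in for the owl.context calls. This is a sketch of the two-context example under that substitution, not a port of owl.context. The names run_demo, hello and trace are made up, the argument passing is omitted, and the ucontext interface is no longer part of current POSIX even though glibc still provides it.

```c
#include <string.h>
#include <ucontext.h>

static ucontext_t uctx_main, uctx_co;
static char trace[128];                 /* records the order of events */

static void hello(void) {
    strcat(trace, "enter;");
    swapcontext(&uctx_co, &uctx_main);  /* yield back to run_demo */
    strcat(trace, "exit;");
}                                       /* returning resumes uc_link */

/* Runs the two-context demo and returns the recorded event order. */
const char *run_demo(void) {
    static char stack[64 * 1024];       /* dedicated coroutine stack */
    trace[0] = '\0';

    getcontext(&uctx_co);               /* initialize before makecontext */
    uctx_co.uc_stack.ss_sp = stack;
    uctx_co.uc_stack.ss_size = sizeof stack;
    uctx_co.uc_link = &uctx_main;       /* plays the role of ctx->link */
    makecontext(&uctx_co, hello, 0);

    strcat(trace, "start;");
    swapcontext(&uctx_main, &uctx_co);  /* first entry into hello */
    strcat(trace, "resume;");
    swapcontext(&uctx_main, &uctx_co);  /* continue hello after its yield */
    strcat(trace, "end;");
    return trace;
}
```

The recorded order matches the sequence printed by test_make_context: start, enter, resume, exit, end.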
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
