Deep Dive into C++ Coroutines: From C++17 Stackless Implementation to C++20 and Scheduler Design
This article compares the legacy C++17 stackless coroutine implementation using macro‑generated state machines with the native C++20 coroutine model, explains their core concepts, and demonstrates how to design a flexible scheduler that manages various await modes and integrates custom awaitable tasks.
This article provides an in‑depth analysis of C++ coroutines, covering both the legacy C++17 stackless approach and the modern C++20 coroutine model, and explains how a custom coroutine scheduler can be built around them.
Other language coroutine examples
Before diving into C++, the article shows simple coroutine snippets from other languages:
Dart:
Future<int> getPage(t) async {
  var c = new http.Client();
  try {
    var r = await c.get('http://xxx');
    print(r);
    return r.length();
  } finally {
    await c.close();
  }
}
Python:
async def abinary(n):
    if n <= 0:
        return 1
    l = await abinary(n-1)
    r = await abinary(n-1)
    return l + 1 + r
C#:
async Task<string> WaitAsync() {
  await Task.Delay(10000);
  return "Finished";
}
C++17 stackless coroutine (Duff's Device hack)
The C++17 implementation relies on macros that expand into one large switch-case state machine. A CoPromise object stores the user function and a std::tuple that holds the saved state. Each __co_await() macro records the current line number as the resume state and returns; on resume, the switch jumps to the case label for that line.
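// The switch opened here must stay open so that later case labels fall inside it;
// it is closed by a matching end macro, which the article does not show.
#define rco_begin() switch (state) { case 0:
#define __co_await() state = __LINE__; return; case __LINE__:
Limitations of this approach: no code can be placed before __co_begin(), and all stack variables must be moved into the CoPromise tuple. The reason is that every resume re-enters the function from the top and jumps to a case label, so ordinary locals do not survive a suspension.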
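The macro technique described above can be sketched in a few lines. This is a minimal illustration, not the article's CoPromise machinery: the macro names CO_BEGIN, CO_YIELD and CO_END and the Counter struct are hypothetical, and the saved state lives in plain member variables rather than a std::tuple.

```cpp
#include <vector>

// Hypothetical macros for illustration only.
// CO_BEGIN opens the switch; CO_YIELD stores the line number and returns;
// the matching case label is where the next resume() continues.
#define CO_BEGIN() switch (state_) { case 0:
#define CO_YIELD() state_ = __LINE__; return true; case __LINE__:
#define CO_END() } state_ = -1; return false

struct Counter {
    int state_ = 0;        // resume point: 0 = start, -1 = finished
    int i = 0;             // loop variable hoisted out of the function body
    std::vector<int> out;  // values produced so far

    // Returns true while the coroutine is suspended, false once it has finished.
    bool resume() {
        CO_BEGIN();
        for (i = 1; i <= 3; ++i) {
            out.push_back(i);
            CO_YIELD();    // suspend here; the next call jumps back into the loop
        }
        CO_END();
    }
};
```

Calling resume() on a Counter returns true three times and then false. The loop variable has to be a member: a local int would be reinitialized on every call, which is the limitation the text describes.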
C++20 coroutine fundamentals
C++20 introduces native coroutine keywords (co_await, co_yield, co_return) and a small set of building blocks: the coroutine function body, coroutine_handle, promise_type, and awaitable objects. The compiler rewrites the coroutine into a frame structure (__counterFrame in the transformed output of the counter() example below) and separate resume and destroy functions.
#include <coroutine>
#include <iostream>

struct resumable_thing {
    struct promise_type {
        resumable_thing get_return_object() {
            return resumable_thing(std::coroutine_handle<promise_type>::from_promise(*this));
        }
        auto initial_suspend() { return std::suspend_never{}; }
        auto final_suspend() noexcept { return std::suspend_never{}; }
        void return_void() {}
        void unhandled_exception() {}
    };
    std::coroutine_handle<promise_type> _coroutine = nullptr;
    // Wraps the handle created in get_return_object().
    explicit resumable_thing(std::coroutine_handle<promise_type> coroutine) : _coroutine(coroutine) {}
    void resume() { _coroutine.resume(); }
};

resumable_thing counter() {
    std::cout << "counter: called\n";
    for (unsigned i = 1;; ++i) {
        co_await std::suspend_always{};  // suspend until resume() is called
        std::cout << "counter: resumed (#" << i << ")\n";
    }
}
The transformed code (as shown by cppinsights) contains a __counterResume function that implements the state machine with a switch on __suspend_index and a goto label for each suspend point. An awaitable object exposes three member functions:
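await_ready() – decides whether to suspend; returning true means the result is already available and the coroutine continues without suspending.
await_suspend() – called with the coroutine's handle once the coroutine has been suspended (e.g., to initiate async work or hand the handle to a scheduler).
await_resume() – called when the coroutine resumes; its return value becomes the result of the co_await expression.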
Coroutine scheduler design
The scheduler manages ISchedTask objects and supports several await modes:
AwaitNever – resume immediately.
AwaitNextframe – resume on the next frame.
AwaitForNotifyNoTimeout / AwaitForNotifyWithTimeout – wait for an external notification, optionally with a timeout.
AwaitDoNothing – used for task termination.
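To make the modes concrete, here is a toy model of how they might route a task between queues. Only the mode names come from the article; ToyTask, ToyScheduler, Update and Notify are hypothetical, timeouts are left out, and each task is a scripted list of modes rather than a real coroutine.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

enum class AwaitMode { AwaitNever, AwaitNextframe, AwaitForNotifyNoTimeout, AwaitDoNothing };

// Toy stand-in for a scheduled task: each Run() executes one step and
// records the mode requested at the next suspension point.
struct ToyTask {
    std::uint64_t id = 0;
    std::vector<AwaitMode> suspends;  // scripted mode for each suspension point
    std::size_t resumed = 0;          // number of Run() calls so far
    bool done = false;
    AwaitMode mode = AwaitMode::AwaitDoNothing;

    void Run() {
        if (resumed < suspends.size()) mode = suspends[resumed];
        else done = true;             // ran past the last suspension point
        ++resumed;
    }
};

class ToyScheduler {
public:
    void AddToImmRun(ToyTask* task) {
        for (;;) {  // loop instead of recursing on AwaitNever
            task->Run();
            if (task->done) { finished.push_back(task->id); return; }
            switch (task->mode) {
                case AwaitMode::AwaitNever: continue;  // next step right away
                case AwaitMode::AwaitNextframe: nextFrame_.push_back(task); return;
                case AwaitMode::AwaitForNotifyNoTimeout: awaiting_[task->id] = task; return;
                case AwaitMode::AwaitDoNothing: return;
            }
        }
    }

    // Once per frame: resume everything queued during the previous frame.
    void Update() {
        std::deque<ToyTask*> ready;
        ready.swap(nextFrame_);
        for (ToyTask* t : ready) AddToImmRun(t);
    }

    // External notification: resume the task only if it is actually waiting.
    bool Notify(std::uint64_t id) {
        auto it = awaiting_.find(id);
        if (it == awaiting_.end()) return false;
        ToyTask* t = it->second;
        awaiting_.erase(it);
        AddToImmRun(t);
        return true;
    }

    std::vector<std::uint64_t> finished;  // ids of completed tasks

private:
    std::deque<ToyTask*> nextFrame_;
    std::unordered_map<std::uint64_t, ToyTask*> awaiting_;
};
```

Swapping the next-frame queue before draining it means a task that asks for AwaitNextframe during Update() waits for the following frame instead of running again in the same one.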
void Scheduler::AddToImmRun(ISchedTask* schedTask) {
    schedTask->Run();
    if (schedTask->IsDone()) {
        DestroyTask(schedTask);
        return;
    }
    // awaitTimeoutMs is the timeout requested by the task; the article does not show how it is obtained.
    const auto awaitMode = schedTask->GetAwaitMode();
    switch (awaitMode) {
        case AwaitNever:
            AddToImmRun(schedTask);  // no suspension requested: run the next step immediately
            break;
        case AwaitNextframe:
            AddToNextFrameRun(schedTask);
            break;
        case AwaitForNotifyNoTimeout:
        case AwaitForNotifyWithTimeout:
            HandleTaskAwaitForNotify(schedTask, awaitMode, awaitTimeoutMs);
            break;
        case AwaitDoNothing:
            break;
    }
}
Resuming a task is done by binding a ResumeObject to the task and re-adding it to the ready queue:
// The type trait inside std::enable_if_t was lost in the source; it constrains E.
template <typename E>
auto ResumeTaskByAwaitObject(E&& awaitObj) -> std::enable_if_t</* trait on E */::value> {
    auto tid = awaitObj.taskId;
    if (IsTaskInAwaitSet(tid)) {
        auto* task = GetTaskById(tid);
        if (task) {
            task->BindResumeObject(std::forward<E>(awaitObj));
            AddToImmRun(task);
        }
        OnTaskAwaitNotifyFinish(tid);
    }
}
Example: full-featured coroutine task
The article presents a larger example that creates a task, yields to the next frame, loops with Sleep, spawns a child coroutine and waits for it to finish, performs an RPC call through an awaitable RpcRequest, and finally returns a value with co_return:
mScheduler.CreateTask20([clientProxy]() -> rstudio::logic::CoResumingTaskCpp20 {
    auto* task = rco_self_task();
    printf("step1: task %llu\n", task->GetId());
    co_await rstudio::logic::cotasks::NextFrame{};
    printf("step2 after yield!\n");
    int c = 0;
    while (c < 5) {
        printf("in while loop c=%d\n", c);
        co_await rstudio::logic::cotasks::Sleep(1000);
        ++c;
    }
    for (c = 0; c < 5; ++c) {
        printf("in for loop c=%d\n", c);
        co_await rstudio::logic::cotasks::NextFrame{};
    }
    printf("step3 %d\n", c);
    auto newTaskId = co_await rstudio::logic::cotasks::CreateTask(false, []() -> rstudio::logic::CoResumingTaskCpp20 {
        printf("from child coroutine!\n");
        co_await rstudio::logic::cotasks::Sleep(2000);
        printf("after child coroutine sleep\n");
    });
    printf("new task create in coroutine: %llu\n", newTaskId);
    co_await rstudio::logic::cotasks::WaitTaskFinish{newTaskId, 10000};
    rstudio::logic::cotasks::RpcRequest rpcReq{clientProxy, "DoHeartBeat", rstudio::reflection::Args{3}, 5000};
    auto* rpcret = co_await rpcReq;
    if (rpcret->rpcResultType == rstudio::network::RpcResponseResultType::RequestSuc) {
        assert(rpcret->totalRet == 1);
        int retval = rpcret->retValue.to<int>();
        assert(retval == 4);
        printf("rpc coroutine run suc, val = %d!\n", retval);
    } else {
        printf("rpc coroutine run failed! result = %d \n", (int)rpcret->rpcResultType);
    }
    co_await rstudio::logic::cotasks::Sleep(5000);
    printf("step4, after 5s sleep\n");
    co_return rstudio::logic::CoNil;
});
The output demonstrates the sequential execution, child task handling, and RPC result processing.
Comparison and advantages
C++20 coroutines use native language support, eliminating the need for complex macro tricks.
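Local variables live in the compiler-generated coroutine frame and are preserved across suspension automatically, so they no longer have to be moved into a tuple by hand.
Awaitable objects are ordinary typed C++ objects, which makes await points type-safe and easy to extend with custom awaitables.
The scheduler runs both C++17 and C++20 tasks and offers flexible await modes.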
Finally, the article outlines a roadmap that includes further scheduler enhancements, integration with execution frameworks (e.g., libunifex), and continued exploration of structured concurrency in C++.
Tencent Cloud Developer