Why We Built Our Own C++ Coroutine Framework and How It Boosts Development Efficiency
This article explains the motivation behind creating the C++ coroutine framework "owl" for the cross‑platform WeChat client, compares callback, promise, and coroutine approaches with code examples, and details its design choices such as stackful coroutines, single‑thread scheduling, structured concurrency, and performance characteristics.
Background
Because many basic components of the WeChat client are written in C++ for cross‑platform reasons, the traditional asynchronous programming model could no longer keep up with growing business complexity. To improve development efficiency and code quality, we built a C++ coroutine framework called owl that provides a unified programming model for all core components. owl is currently used in the C++ cross‑platform WeChat client kernel (Alita). All business logic in Alita is implemented with coroutines, reducing code size by at least 50 % compared with the old asynchronous model. It powers WeChat for kids' smartwatches and for Linux and Android in‑car systems, and the UI logic of the Linux in‑car version is also fully coroutine‑based.
Why Build Our Own?
Although C++20 supports coroutines and open‑source solutions such as libco and libgo exist, none satisfies our requirements:
We need to support many OSes (Android, iOS, macOS, Windows, Linux, RTOS) and architectures (x86, x86_64, arm, arm64, loongarch64). Existing libraries do not cover this matrix.
When the project started in early 2019, C++20 was not yet finalized; most compilers used in‑house and by partners still targeted C++14, so owl relies only on C++14 features.
Most third‑party solutions are designed for backend services, support only Linux/x86, and expose only low‑level APIs rather than a framework‑level programming model.
Show Me the Code
Below we compare three ways of writing an asynchronous addition of one.
1. Callback
using namespace std::chrono_literals;  // for the 100ms literal

void AsyncAddOne(int value, std::function<void(int)> callback) {
    std::thread t([value, callback = std::move(callback)]{
        std::this_thread::sleep_for(100ms);  // simulate async work
        callback(value + 1);
    });
    t.detach();
}
AsyncAddOne(100, [](int result){ /* … */ });
Callback style suffers from "callback hell": nested continuations obscure control flow, and error handling and object lifecycle management must be threaded through every callback by hand.
2. Promise
// Convert AsyncAddOne to a promise
owl::promise AsyncAddOnePromise(int value) {
    return owl::make_promise([=](auto d){
        AsyncAddOne(value, [=](int result){ d.resolve(result); });
    });
}
AsyncAddOnePromise(100)
    .then([](int r){ return AsyncAddOnePromise(r); })
    .then([](int r){ printf("result %d\n", r); });
Promises eliminate callback hell but become unwieldy for complex control flow (loops, branches, early exits), so we abandoned them.
3. Coroutine
owl::promise2<int> AsyncAddOnePromise2(int value) {
    return owl::make_promise2<int>([=](auto d){
        AsyncAddOne(value, [=](int result){ d.resolve(result); });
    });
}
owl::co_launch([]{
    int v = 100;
    for (int i = 0; i < 3; ++i) {
        v = co_await AsyncAddOnePromise2(v);
    }
    printf("result %d\n", v);
});
Coroutines let us write asynchronous code in a synchronous style using co_await, greatly reducing mental overhead.
Callback‑to‑Coroutine
By first converting a callback‑based API to a promise and then awaiting it, the original callback can be used inside a coroutine with just a few lines:
void AsyncAddOne(int value, std::function<void(int)> callback);
owl::promise2<int> AsyncAddOnePromise2(int value);
auto v = co_await AsyncAddOnePromise2(100);
Framework Layers
Coroutine Design
Stackful vs Stackless
Stackful coroutines have their own call stack, similar to threads.
Stackless coroutines keep state in a state machine; most language‑level coroutines are stackless, but owl uses stackful coroutines.
Independent vs Shared Stack
Independent stack : each coroutine owns a separate stack.
Shared stack : multiple coroutines share one stack, saving memory but requiring careful state saving.
For terminal development we choose independent stacks.
Scheduler
1:N (single‑thread) scheduler: one thread drives many coroutines; no locking is needed.
M:N (multi‑thread) scheduler: multiple threads drive coroutines, requiring locks and careful TLS handling.
owl uses a single‑thread scheduler based on a RunLoop message loop, which allows coroutine code to interact with the UI without extra synchronization.
class executor {
public:
    virtual ~executor() {}
    virtual uint64_t post(std::function<void()> closure) = 0;
    virtual uint64_t post_delayed(unsigned delay, std::function<void()> closure) = 0;
    virtual void cancel(uint64_t id) {}
};
Inter‑coroutine Communication
owl adopts the CSP (Communicating Sequential Processes) model and provides channels for communication; no coroutine‑level lock is offered.
Structured Concurrency
Coroutines are treated as scopes: a parent coroutine must wait for its children and cancellation propagates automatically. Example:
class SimpleActivity {
public:
    SimpleActivity() { scope_.set_exec(GetUiExecutor()); }
    ~SimpleActivity() { scope_.cancel(); }

    void OnButtonClicked() {
        scope_.co_launch([=]{
            auto p1 = owl::co_async([]{ return DownloadImage(...); });
            auto p2 = owl::co_async([]{ return DownloadImage(...); });
            auto img1 = co_await p1;
            auto img2 = co_await p2;
            auto new_image = co_await AsyncCombineImage(img1, img2);
            image_->SetImage(new_image);
        });
    }

private:
    owl::co_scope scope_;
    ImageLabel* image_;
};
Performance
A raw context switch using the Context API costs 20‑30 ns; a full coroutine switch through the single‑thread scheduler costs 0.5‑3 µs; a pthread thread switch costs 2‑8 µs. Although a scheduled coroutine switch is slower than a raw context switch, it is more than sufficient for client‑side workloads.
Conclusion
Since its adoption, owl has significantly improved development efficiency and code quality. It is widely used internally at Tencent, and after further stabilization it will be open‑sourced.
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.