Why We Built Our Own C++ Coroutine Framework and How It Boosts Development Efficiency
This article explains the motivation behind creating the C++ coroutine framework "owl" for the cross‑platform WeChat client, compares callback, promise, and coroutine approaches with code examples, and details its design choices such as stackful coroutines, single‑thread scheduling, structured concurrency, and performance characteristics.
Background
Because many basic components of the WeChat client are written in C++ for cross‑platform reasons, the traditional asynchronous programming model could no longer keep up with growing business complexity. To improve development efficiency and code quality, we built a C++ coroutine framework called owl that provides a unified programming model for all core components. owl is currently used in the C++ cross‑platform WeChat client kernel (Alita). All business logic in Alita is implemented with coroutines, reducing code size by at least 50 % compared with the old asynchronous model. It powers WeChat for kids' smartwatches and for Linux and Android in‑car systems, and the UI logic of the Linux in‑car version is also fully coroutine‑based.
Why Build Our Own?
Although C++20 supports coroutines and open‑source solutions such as libco and libgo exist, none satisfies our requirements:
We need to support many OSes (Android, iOS, macOS, Windows, Linux, RTOS) and architectures (x86, x86_64, arm, arm64, loongarch64). Existing libraries do not cover this matrix.
When the project started in early 2019, C++20 was not yet finalized; most compilers used in‑house and by partners still targeted C++14, so owl relies only on C++14 features.
Most third‑party solutions are designed for backend services, support only Linux/x86, and expose only low‑level APIs rather than a framework‑level programming model.
Show Me the Code
Below we compare three ways of writing an asynchronous addition of one.
1. Callback
using namespace std::chrono_literals;  // for the 100ms literal

void AsyncAddOne(int value, std::function<void(int)> callback) {
    std::thread t([value, callback = std::move(callback)]{
        std::this_thread::sleep_for(100ms);  // simulate async work
        callback(value + 1);
    });
    t.detach();
}
AsyncAddOne(100, [](int result){ /* … */ });
Callback style suffers from "callback hell": nested continuations obscure control flow, and error handling and object lifecycle management must be threaded through every callback by hand.
2. Promise
// Convert AsyncAddOne to a promise
owl::promise AsyncAddOnePromise(int value) {
    return owl::make_promise([=](auto d){
        AsyncAddOne(value, [=](int result){ d.resolve(result); });
    });
}
AsyncAddOnePromise(100)
    .then([](int r){ return AsyncAddOnePromise(r); })
    .then([](int r){ printf("result %d\n", r); });
Promises eliminate callback hell but become unwieldy for complex control flow (loops, branches, early exits), so we abandoned them.
3. Coroutine
owl::promise2<int> AsyncAddOnePromise2(int value) {
    return owl::make_promise2<int>([=](auto d){
        AsyncAddOne(value, [=](int result){ d.resolve(result); });
    });
}
owl::co_launch([]{
    int v = 100;
    for (int i = 0; i < 3; ++i) {
        v = co_await AsyncAddOnePromise2(v);
    }
    printf("result %d\n", v);
});
Coroutines let us write asynchronous code in a synchronous style using co_await, greatly reducing mental overhead.
Callback‑to‑Coroutine
By first converting a callback‑based API to a promise and then awaiting it, the original callback can be used inside a coroutine with just a few lines:
void AsyncAddOne(int value, std::function<void(int)> callback);
owl::promise2<int> AsyncAddOnePromise2(int value);
auto v = co_await AsyncAddOnePromise2(100);
Framework Layers
Coroutine Design
Stackful vs Stackless
Stackful coroutines have their own call stack, similar to threads.
Stackless coroutines keep state in a state machine; most language‑level coroutines are stackless, but owl uses stackful coroutines.
Independent vs Shared Stack
Independent stack : each coroutine owns a separate stack.
Shared stack : multiple coroutines share one stack, saving memory but requiring careful state saving.
For terminal development we choose independent stacks.
Scheduler
1:N (single‑thread) scheduler: one thread drives many coroutines; no locking is needed.
M:N (multi‑thread) scheduler: multiple threads drive coroutines, requiring locks and careful TLS handling.
owl uses a single‑thread scheduler based on a RunLoop message loop, which allows coroutine code to interact with the UI without extra synchronization.
class executor {
public:
    virtual ~executor() {}
    virtual uint64_t post(std::function<void()> closure) = 0;
    virtual uint64_t post_delayed(unsigned delay, std::function<void()> closure) = 0;
    virtual void cancel(uint64_t id) {}
};
Inter‑coroutine Communication
owl adopts the CSP (Communicating Sequential Processes) model and provides channels for communication; no coroutine‑level lock is offered.
Structured Concurrency
Coroutines are treated as scopes: a parent coroutine must wait for its children and cancellation propagates automatically. Example:
class SimpleActivity {
public:
    SimpleActivity() { scope_.set_exec(GetUiExecutor()); }
    ~SimpleActivity() { scope_.cancel(); }

    void OnButtonClicked() {
        scope_.co_launch([=]{
            auto p1 = owl::co_async([]{ return DownloadImage(...); });
            auto p2 = owl::co_async([]{ return DownloadImage(...); });
            auto img1 = co_await p1;
            auto img2 = co_await p2;
            auto new_image = co_await AsyncCombineImage(img1, img2);
            image_->SetImage(new_image);
        });
    }

private:
    owl::co_scope scope_;
    ImageLabel* image_;
};
Performance
A raw context switch using the Context API costs 20‑30 ns; a full coroutine switch through the single‑thread scheduler costs 0.5‑3 µs; a pthread thread switch costs 2‑8 µs. Although a scheduled coroutine switch is slower than a raw context switch, it is more than sufficient for client‑side workloads.
Conclusion
Since its adoption, owl has significantly improved development efficiency and code quality. It is widely used internally at Tencent, and after further stabilization it will be open‑sourced.
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.