How We Turned RocksDB into a High‑Performance Coroutine Engine

By manually modifying a few hundred lines and using an automated script, we transformed the multithreaded RocksDB storage engine into a coroutine‑based version with PhotonLibOS, achieving near‑identical functionality and, in heavy I/O and high‑concurrency scenarios, up to double the throughput compared to the original.

Alibaba Cloud Developer

Dec 14, 2022

Abstract: With a small amount of manual modification plus automatic code conversion, we rewrote a large multithreaded program to use coroutines, doubling performance in certain heavy‑IO, high‑concurrency scenarios.

Background

RocksDB is a widely used embedded persistent KV database that employs a log‑structured storage engine optimized for fast, low‑latency storage devices. Written in C++, it is mature, well‑tested, and provides extensive performance testing tools, making it a core subject for storage and low‑level system engineers.

RocksDB uses a multithreaded model supporting concurrent reads and writes. Compared with threads, coroutines are lighter and more efficient for I/O‑heavy or highly concurrent workloads; thread context switches can take up to 30 µs, while coroutine switches can be as low as a few tens of nanoseconds.

PhotonLibOS (Photon) is an open‑source high‑performance coroutine library and I/O engine from Alibaba Cloud's DADI team. After benchmarking Photon‑based I/O and network programs against fio and Nginx, the team decided to try a coroutine‑based refactoring of RocksDB, the first time Photon was integrated into a large, mature codebase.

Coroutine Transformation

Conclusion: The transformation proceeded smoothly with only about 200 lines of manual changes, followed by a script that automatically converted the rest to coroutine code, allowing successful compilation and execution.

Using RocksDB 6.1.2 (2019) with 3,175 test cases, the Photon coroutine version passed 3,170 cases (99.84%). The five failures involved thread‑specific features or tests that explicitly required a thread environment; they do not affect normal RocksDB operation.

Performance: Using the built‑in db_bench tool across four typical KV read/write workloads, the coroutine version achieved comparable OPS to the original, and in some heavy‑IO, high‑concurrency scenarios it outperformed the thread‑based version.

Photon Library Introduction

1. Concurrency Model

Common concurrency models include multithreading, asynchronous callbacks, stackful coroutines, and stackless coroutines. Photon implements stackful coroutines and names them "thread"; multiple Photon threads run on a single virtual CPU (vcpu), which maps to an OS thread. Each vcpu executes on one core at a time, and migration across cores is invisible to the coroutine.
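To make "stackful coroutines sharing one OS thread" concrete, here is a minimal sketch that uses the POSIX ucontext calls as a stand-in. It is not how Photon implements switching (ucontext is considerably slower than a purpose-built switch), and every name in it (sketch::yield, run_two_tasks, the fixed 64 KiB stacks) is hypothetical. It only shows the model: each task owns a private stack, and a simple loop on a single OS thread resumes them in turn.

```cpp
#include <ucontext.h>

#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

constexpr std::size_t kStackSize = 64 * 1024;

ucontext_t scheduler_ctx;        // saved context of the scheduling loop (the "vcpu")
ucontext_t task_ctx[2];          // one saved context per coroutine
char stacks[2][kStackSize];      // each coroutine runs on its own private stack
bool done[2];
std::vector<std::string> trace;  // records the order in which tasks ran

// Hand control back to the scheduler. The task's locals stay alive on its stack.
void yield(int id) { swapcontext(&task_ctx[id], &scheduler_ctx); }

void task(int id) {
    for (int step = 0; step < 2; ++step) {
        trace.push_back("task" + std::to_string(id) + ":" + std::to_string(step));
        yield(id);  // `step` survives the switch because the stack is preserved
    }
    done[id] = true;
}

// Runs two tasks round-robin on the calling OS thread and returns the trace.
std::vector<std::string> run_two_tasks() {
    trace.clear();
    for (int id = 0; id < 2; ++id) {
        done[id] = false;
        getcontext(&task_ctx[id]);
        task_ctx[id].uc_stack.ss_sp = stacks[id];
        task_ctx[id].uc_stack.ss_size = kStackSize;
        task_ctx[id].uc_link = &scheduler_ctx;  // resume the scheduler when task() returns
        makecontext(&task_ctx[id], reinterpret_cast<void (*)()>(task), 1, id);
    }
    while (!done[0] || !done[1]) {
        for (int id = 0; id < 2; ++id) {
            if (!done[id]) swapcontext(&scheduler_ctx, &task_ctx[id]);
        }
    }
    return trace;
}

}  // namespace sketch
```

Calling sketch::run_two_tasks() yields the interleaving task0:0, task1:0, task0:1, task1:1: both tasks make progress without a second OS thread, and each resumes exactly where it yielded.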

Photon’s design treats coroutines as lightweight threads and aligns its API with POSIX and C++ standards, making it hard for developers to distinguish between a multithreaded program and a coroutine program without explicit hints.

2. Asynchronous Event Engine

Each Photon vcpu contains an asynchronous event engine. Events originate from explicit coroutine yields, vcpu migration or wake‑up, I/O events on file descriptors, timer expirations, etc. Photon supports multiple async engines such as epoll, io_uring, and kqueue; io_uring is recommended on Linux kernels 5.x and above, allowing batch I/O submission and completion with a single system call, reducing syscall overhead.

Unlike epoll, io_uring natively supports asynchronous file I/O, and Photon exposes it through a synchronous‑style API, eliminating the need for libaio registration, callbacks, or memory alignment. This simplification eased the replacement of RocksDB’s synchronous psync (pread/pwrite) I/O calls.

3. Sync, Locks, and Atomics

Photon’s mutex and semaphore implementations follow POSIX designs but are adapted for coroutine contexts. Internally they resemble user‑space futexes, managing wait queues with linked lists. Atomic operations behave the same for threads and coroutines; however, if a variable is only accessed by coroutines on a single vcpu, atomic operations are unnecessary, because those coroutines never run in parallel and switch only when one of them yields or blocks.

4. Transformation Steps

Step 1: Replace all standard C++ threading and synchronization primitives with Photon equivalents. Example:

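bool condition = false;
std::mutex mu;
std::condition_variable cv;

// Worker thread: sleep for a second, then set the flag and wake the waiter.
// (Allocated with new and never joined, to keep the example short.)
new std::thread([&] {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::lock_guard<std::mutex> lock(mu);
    condition = true;
    cv.notify_one();
});

// Waiter: block until the flag is set; the loop guards against spurious wakeups.
std::unique_lock<std::mutex> lock(mu);
while (!condition) {
    cv.wait(lock);
}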

After conversion:

bool condition = false;
photon::std::mutex mu;
photon::std::condition_variable cv;

// Worker coroutine (a Photon thread): same logic, only the namespace changes.
new photon::std::thread([&] {
    photon::std::this_thread::sleep_for(std::chrono::seconds(1));
    photon::std::lock_guard<photon::std::mutex> lock(mu);
    condition = true;
    cv.notify_one();
});

// Waiter: waiting here suspends only this coroutine.
photon::std::unique_lock<photon::std::mutex> lock(mu);
while (!condition) {
    cv.wait(lock);
}
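The article does not show the conversion script itself, so the following is only a hypothetical sketch of what such a rewrite could look like for Step 1. The function name to_photon and the list of rewritten identifiers are assumptions, taken from the before/after example above. A real script would also have to handle headers, using‑declarations, and many more types.

```cpp
#include <regex>
#include <string>

// Hypothetical converter: prefixes selected std:: concurrency names with photon::.
// The identifier list only covers the names used in the example above.
std::string to_photon(const std::string& source) {
    // std::regex has no lookbehind, so the character before "std::" is captured
    // in group 1 and written back. Excluding ':' and word characters keeps
    // already-converted "photon::std::..." and names like "my_std::" untouched.
    static const std::regex pattern(
        R"((^|[^:\w])std::(thread|mutex|condition_variable|lock_guard|unique_lock|this_thread)\b)");
    return std::regex_replace(source, pattern, "$1photon::std::$2");
}
```

For example, to_photon("std::lock_guard<std::mutex> lock(mu);") returns "photon::std::lock_guard<photon::std::mutex> lock(mu);", while std::chrono::seconds is left alone because chrono is not in the list. Running the function on its own output changes nothing, which makes it safe to re-run over a partially converted tree.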

Step 2: Remove thread‑specific calls such as pthread_setname_np and syscalls that adjust thread I/O priority.

Step 3: Replace thread_local variables with photon::thread_local_ptr to provide coroutine‑local storage. Example:

// Original thread_local variable
thread_local Value value = "123";

// Photon replacement
static photon::thread_local_ptr<Value, std::string> value("123");

These changes ensure that the coroutine version retains the original logic while leveraging Photon’s lightweight scheduling.
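Why Step 3 is needed can be seen in a small model. Coroutines on the same vcpu share one OS thread, so a plain thread_local variable is shared by all of them. The sketch below is not the implementation of photon::thread_local_ptr; TaskLocal and current_task are hypothetical names, and the "switch" between tasks is simulated by changing an integer id rather than by a real scheduler.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

namespace sketch {

// Stand-in for "the coroutine currently running on this OS thread" (hypothetical).
thread_local int current_task = 0;

// One value per task, created lazily from a shared initial value.
// Not safe for use from several OS threads at once; this is a single-thread model.
template <typename T>
class TaskLocal {
public:
    explicit TaskLocal(T initial) : initial_(std::move(initial)) {}

    T& get() {
        auto it = slots_.find(current_task);
        if (it == slots_.end()) it = slots_.emplace(current_task, initial_).first;
        return it->second;
    }

private:
    T initial_;
    std::unordered_map<int, T> slots_;
};

thread_local std::string per_thread = "123";  // one copy for every task on this thread
TaskLocal<std::string> per_task("123");       // one copy per task

struct Observed {
    std::string thread_local_value;
    std::string task_local_value;
};

// Task 1 writes both variables, then task 2 reads them on the same OS thread.
Observed write_in_task1_read_in_task2() {
    current_task = 1;
    per_thread = "abc";
    per_task.get() = "abc";

    current_task = 2;  // simulated switch to another coroutine
    return {per_thread, per_task.get()};
}

}  // namespace sketch
```

In this model task 2 observes "abc" in the thread_local variable, because task 1's write leaked across, but still sees the initial "123" in its task‑local slot. That leak is the kind of behavior change that coroutine‑local storage avoids.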

db_bench Single‑Node Performance Test

We forked RocksDB on GitHub and added ~200 lines of Photon‑related changes on a branch based on 6.1.2. Tests were run on a high‑end cloud VM (Linux 6.x kernel, GCC 8) with 10 M keys, a cold load, and a 1‑minute duration; results are reported in OPS (operations per second).

Results show comparable performance for read and synchronous write workloads; however, when synchronous writes are disabled, the coroutine version lags because the workload becomes CPU‑bound and the coroutine overhead is not fully amortized.

Additional performance gaps stem from the lack of targeted optimizations in the coroutine build, such as replacing busy‑wait asm volatile("pause") with coroutine‑aware sleeps or adapting the core_local module for coroutine contexts.
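The busy‑wait point can be illustrated with a small sketch. Under cooperative scheduling, a loop that only executes a pause instruction never gives other coroutines on the same vcpu a chance to run, so if the coroutine that would set the flag shares that vcpu, the spinner burns its time waiting for something that cannot happen yet. The code below is an assumption‑laden illustration, not RocksDB's or Photon's code: spin_until and yielding_relax are hypothetical names, and std::this_thread::yield stands in for whatever yield call the coroutine library provides.

```cpp
#include <atomic>
#include <thread>

namespace sketch {

// Spin until `flag` becomes true, calling `relax()` between checks.
// Returns how many times relax() ran.
template <typename Relax>
unsigned spin_until(const std::atomic<bool>& flag, Relax relax) {
    unsigned spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        relax();  // a bare "pause" here would keep the vcpu; a yield releases it
        ++spins;
    }
    return spins;
}

// Relax step that gives the scheduler a chance to run something else.
// Assumption: a coroutine build would call the coroutine library's yield instead.
inline void yielding_relax() { std::this_thread::yield(); }

}  // namespace sketch
```

Making the relax step a parameter keeps the spin loop itself unchanged between the thread build and the coroutine build; only the policy passed in differs.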

Killer Feature: Coroutine‑Based Network Database

Although single‑node tests show modest gains, the true value of coroutine‑based RocksDB emerges in networked, high‑concurrency scenarios. Traditional epoll loops limit scalability; Photon’s coroutine model can handle millions of concurrent connections with far fewer OS threads.

RocksDB’s built‑in group commit merges multiple requests into a single I/O operation, so higher concurrency yields better throughput. In a benchmark with an RPC server handling 1,000 concurrent clients, the coroutine version achieved roughly twice the OPS of the thread‑pool version while using only eight vcpus.
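The relationship between concurrency and group commit can be sketched with a deterministic model. This is not RocksDB's write path, which is considerably more elaborate; GroupCommitLog and syncs_needed are hypothetical names, and the assumption is simply that every write queued at the moment of a flush shares a single sync.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

class GroupCommitLog {
public:
    // A writer queues its record and waits for the next flush.
    void submit(std::string record) { pending_.push_back(std::move(record)); }

    // The "leader" persists every queued record and pays for one sync in total.
    void flush() {
        if (pending_.empty()) return;
        for (auto& record : pending_) durable_.push_back(std::move(record));
        pending_.clear();
        ++sync_calls_;
    }

    std::size_t sync_calls() const { return sync_calls_; }
    std::size_t durable_records() const { return durable_.size(); }

private:
    std::vector<std::string> pending_;
    std::vector<std::string> durable_;
    std::size_t sync_calls_ = 0;
};

// Models `total_writes` arriving `concurrency` at a time (concurrency must be > 0)
// and returns how many syncs were needed to make all of them durable.
std::size_t syncs_needed(std::size_t total_writes, std::size_t concurrency) {
    GroupCommitLog log;
    for (std::size_t i = 0; i < total_writes; ++i) {
        log.submit("put key" + std::to_string(i));
        if ((i + 1) % concurrency == 0) log.flush();
    }
    log.flush();  // persist any remainder
    return log.sync_calls();
}

}  // namespace sketch
```

In this model 1,000 writes need 1,000 syncs at concurrency 1 but only 10 at concurrency 100. That is the intuition behind the benchmark: a design that keeps many requests in flight cheaply gives group commit more to merge.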

Conclusion

By integrating PhotonLibOS, we converted a large‑scale database into a coroutine‑driven system with minimal code changes, confirming the theoretical advantages of coroutines in heavy I/O and high‑concurrency environments and demonstrating Photon’s maturity as a storage‑acceleration solution.

Further work is needed to fine‑tune CPU‑bound paths, adapt low‑level modules like core_local, and explore deeper optimizations to fully unleash the potential of coroutine‑based RocksDB.

PhotonLibOS source: https://github.com/alibaba/PhotonLibOS
