Optimizing RocksDB Compaction Rate Limiting to Reduce IO Spikes in WTable
This article analyzes RocksDB's compaction rate‑limiting source code and presents practical tuning methods—both fixed and auto‑tuned—to mitigate IO spikes in the distributed KV store WTable, improving real‑time read/write latency and stability.
Background : The 58 storage team’s distributed KV store WTable uses the open‑source RocksDB engine, which employs LSM‑Tree structures and background compaction. Heavy write workloads trigger extensive compaction, consuming high IO and causing IO spikes that increase latency for real‑time operations.
Source Code Analysis : RocksDB implements rate limiting via the RateLimiter class, which has five parameters: rate_bytes_per_sec , refill_period_us , fairness , mode , and auto_tuned . In a normal limiter, tokens are replenished every refill_period_us (default 100 ms), and requests obtain tokens based on availability; excess requests wait in low‑priority and high‑priority queues, with a leader thread handling token refills.
The refill process updates available_bytes_ , distributes the new tokens between the low‑ and high‑priority queues according to the fairness parameter (the low‑priority queue is served first with probability 1/fairness), and wakes the threads whose requests were satisfied. After each refill, a new leader is elected from the remaining waiters.
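The request/refill cycle described above can be sketched in a minimal, single‑threaded form. This is an illustration of the token‑bucket idea only, not the actual RocksDB code: the class and method names are invented for this sketch, and the priority queues, leader election, and thread wakeups are omitted.

```python
class SimpleRateLimiter:
    """Single-threaded sketch of a token-bucket limiter (illustrative only)."""

    def __init__(self, rate_bytes_per_sec, refill_period_us=100_000):
        self.rate_bytes_per_sec = rate_bytes_per_sec
        self.refill_period_us = refill_period_us
        # Tokens granted per refill period (default period: 100 ms).
        self.refill_bytes_per_period = (
            rate_bytes_per_sec * refill_period_us // 1_000_000
        )
        self.available_bytes = self.refill_bytes_per_period
        self.pending = []  # requests that must wait for a future refill

    def request(self, bytes_needed):
        """Grant immediately if tokens suffice, otherwise queue the request."""
        if bytes_needed <= self.available_bytes:
            self.available_bytes -= bytes_needed
            return True
        self.pending.append(bytes_needed)
        return False

    def refill(self):
        """Once per refill period: top up tokens, then serve waiters in order."""
        self.available_bytes = self.refill_bytes_per_period
        still_waiting = []
        for bytes_needed in self.pending:
            if bytes_needed <= self.available_bytes:
                self.available_bytes -= bytes_needed
            else:
                still_waiting.append(bytes_needed)
        self.pending = still_waiting
```

At 250 MB/s with the default 100 ms period, each refill grants 25 MB of tokens; a compaction write larger than the remaining tokens blocks until a later refill, which is exactly how sustained compaction IO gets spread over time.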
Auto‑Tune Rate Limiter : When auto_tuned is enabled, the limiter re‑evaluates the effective rate every 100 refill periods (about every 10 s at the default 100 ms period), based on the proportion of periods in which available_bytes_ was fully drained. The effective rate is constrained to [ rate_bytes_per_sec/20 , rate_bytes_per_sec ]; it is lowered or raised by a factor of 1.05 depending on whether the drained proportion falls below the low watermark (50%) or above the high watermark (90%).
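One auto‑tune step, as described above, can be sketched as a pure function. This is a simplified illustration of the adjustment rule rather than the actual RocksDB implementation; the constant names are invented for the sketch.

```python
LOW_WATERMARK = 0.5       # drained in <50% of periods: limiter is too loose
HIGH_WATERMARK = 0.9      # drained in >90% of periods: limiter is too tight
ADJUST_FACTOR = 1.05      # multiplicative step per tuning round
ALLOWED_RANGE_FACTOR = 20 # lower bound is configured max / 20

def tune(current_rate, configured_max, drained_fraction):
    """Return the new effective rate after one auto-tune round.

    drained_fraction: share of the last 100 refill periods in which
    the token bucket was fully drained.
    """
    if drained_fraction < LOW_WATERMARK:
        new_rate = int(current_rate / ADJUST_FACTOR)   # ease off
    elif drained_fraction > HIGH_WATERMARK:
        new_rate = int(current_rate * ADJUST_FACTOR)   # tighten less
    else:
        new_rate = current_rate
    # Clamp to [configured_max / 20, configured_max].
    low_bound = configured_max // ALLOWED_RANGE_FACTOR
    return max(low_bound, min(configured_max, new_rate))
```

With rate_bytes_per_sec configured at 1000 MB/s, the clamp yields the 50 MB/s floor mentioned in the tuning practice below: the limit can never drop under 1/20 of the configured maximum, and never exceed the maximum itself.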
Limiting Practice – Ordinary Rate Limiting : Initially, WTable used a fixed limiter with rate_bytes_per_sec set to 250 MB/s on a specific cluster, which held disk IO utilization during compaction spikes to around 50%.
Limiting Practice – Auto‑Tune : A fixed limit is not universally applicable: on clusters with heavier write loads it caused write stalls. Enabling auto_tuned with rate_bytes_per_sec set to 1000 MB/s gives the limiter an adjustment range of 50 MB/s (1000/20) to 1000 MB/s, effectively capping IO spikes while letting the rate adapt dynamically, so the same configuration works across diverse clusters.
Other Considerations : Reducing compaction frequency by increasing SST file size can also alleviate IO spikes, though it may increase space amplification. The team is exploring direct SST insertion to further mitigate offline import impact.
References : https://github.com/facebook/rocksdb/wiki
Author : Jiang Shouchao, Senior Storage Engineer at 58 Group, responsible for WTable development and optimization.