Abase: ByteDance’s Large‑Scale Online KV Storage System – Architecture, High Availability, and Key Technologies
This article introduces Abase, ByteDance’s large‑scale online KV storage system. It traces the system’s evolution from a single‑cluster KV service to a multi‑region, multi‑tenant platform, and explains the high‑availability challenges along with the leaderless multi‑write architecture, hybrid logical clocks, configurable quorum settings, and performance optimizations that enable QPS in the hundred‑billion range with sub‑10 ms latency for high‑priority clusters.
Abase is ByteDance’s core online KV storage system, serving over 90% of the company’s KV needs across products such as recommendation, search, advertising, e‑commerce, Douyin, Feishu, and more. Launched in 2016, it grew from a single‑cluster KV service to a multi‑region, multi‑tenant platform handling billions of requests and petabytes of data.
The first generation faced scalability and availability limits, prompting the development of Abase 2.0, which focuses on extreme high availability. The new architecture adopts a leaderless multi‑write design, eliminating the downtime associated with master‑slave failover and reducing the impact of slow nodes.
Key technical components include:
Multi‑write architecture that writes concurrently to multiple replicas, avoiding single‑point‑of‑failure master nodes.
Hybrid logical clocks combined with physical location information to generate globally unique timestamps, enabling a Last‑Write‑Wins conflict resolution strategy.
Support for both multi‑tenant and serverless deployment models, allowing fine‑grained resource isolation and cost reduction.
Configurable quorum and consistency levels (W and R values) so users can trade off durability for latency or throughput.
Region‑aware POD abstraction that distributes replicas across different data‑center rooms, ensuring resilience against localized failures such as power loss or fire.
Integration with existing ecosystems (Redis protocol compatibility, Hive bulk load, RocksDB and a custom hash engine) and a two‑layer engine that merges multi‑version data into a single version for fast point‑lookups.
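The hybrid‑logical‑clock idea above can be sketched in a few lines. This is an illustrative model, not Abase’s actual implementation: timestamps here are `(wall_ms, counter, node_id)` tuples, where `node_id` stands in for the physical location information the article mentions, making every timestamp globally unique so Last‑Write‑Wins can totally order concurrent writes.

```python
import time

class HybridLogicalClock:
    """Minimal HLC sketch (illustrative names, not Abase's API).

    Timestamps are (wall_ms, counter, node_id) tuples. node_id encodes the
    replica's physical location, so two replicas can never emit an identical
    timestamp and Last-Write-Wins yields a deterministic total order.
    """

    def __init__(self, node_id):
        self.node_id = node_id
        self.wall = 0      # highest physical time observed, in ms
        self.counter = 0   # breaks ties within the same millisecond

    def now(self):
        """Timestamp a local event, e.g. an incoming write."""
        physical = int(time.time() * 1000)
        if physical > self.wall:
            self.wall, self.counter = physical, 0
        else:
            self.counter += 1
        return (self.wall, self.counter, self.node_id)

    def observe(self, remote):
        """Advance the clock past a timestamp received from another replica."""
        physical = int(time.time() * 1000)
        new_wall = max(self.wall, remote[0], physical)
        if new_wall == self.wall == remote[0]:
            self.counter = max(self.counter, remote[1]) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == remote[0]:
            self.counter = remote[1] + 1
        else:
            self.counter = 0
        self.wall = new_wall

def last_write_wins(versions):
    """Resolve conflicting (timestamp, value) pairs: highest timestamp wins."""
    return max(versions, key=lambda v: v[0])
```

Because tuples compare lexicographically, ties on physical time fall through to the counter and finally to the node id, which is what lets a leaderless system resolve concurrent writes without coordination.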
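The configurable W and R values can be pictured with a toy quorum simulation (hypothetical helper names, assuming in‑memory dict replicas). The key property is overlap: whenever W + R > N, every read quorum intersects every write quorum, so a read always sees the latest acknowledged write; W = R = 1 is faster but may return stale data.

```python
import random

def quorum_write(replicas, w, key, versioned_value):
    """Ack the client once w of the n replicas accept the write; the
    remaining replicas would catch up asynchronously (omitted here)."""
    for i in random.sample(range(len(replicas)), w):
        replicas[i][key] = versioned_value

def quorum_read(replicas, r, key):
    """Query r replicas and resolve divergent copies by highest version."""
    seen = [replicas[i].get(key) for i in random.sample(range(len(replicas)), r)]
    return max((v for v in seen if v is not None), default=None)
```

With N = 3, setting W = R = 2 trades one extra round-trip per operation for the overlap guarantee; dropping to W = R = 1 minimizes latency at the cost of possibly missing the most recent write.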
Performance metrics show a QPS in the hundred‑billion range, data volume exceeding hundreds of petabytes, and a P99 latency under 50 ms (10 ms for high‑priority clusters). Bottlenecks are addressed through read‑write separation, multi‑version merging, and optional use of persistent memory (PMem) for small‑value workloads.
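The multi‑version merging mentioned above can be sketched as a simple compaction pass (an illustrative model, not the engine’s real code): multi‑version entries accumulated by concurrent writes are collapsed into a single version per key, so point lookups hit one record instead of scanning versions.

```python
def compact(multi_version_log):
    """Collapse (key, timestamp, value) entries into one version per key,
    keeping the value with the highest timestamp (Last-Write-Wins)."""
    latest = {}
    for key, ts, value in multi_version_log:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return {k: v for k, (ts, v) in latest.items()}
```

In a real two‑layer engine this merge happens in the background, so the read path pays only a single point‑lookup against the compacted layer.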
The Q&A section confirms the system’s scalability, discusses future plans such as native PMem support, and highlights the flexibility of the architecture to accommodate diverse workload requirements.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.