Design and Optimization Practices of 58.com Distributed KV Storage System (WTable)
The article details the three‑stage evolution of 58.com’s distributed key‑value storage system WTable, covering its initial architecture, performance and operational optimizations, and the introduction of NewSQL‑style strong consistency and Raft‑based replication to improve scalability and reliability.
Background – After several years of development, 58.com has built four distributed storage products (Key‑Value, Key‑List, Object Storage, Shared File System). The earliest and most widely used is the KV store WTable, which has undergone three major development phases since its first release in 2015.
WTable Phase 1 – Designed to replace MySQL+Cache, WTable targeted easy scaling, high availability, and performance. It uses ETCD as the configuration center and for leader election, RocksDB as the storage engine, and a proxy layer that routes requests to groups of storage nodes (typically one master and two replicas). Data is sharded into 2048 fixed slots; each slot maps to a group, and new groups can be added to expand capacity. Multi‑level disaster recovery includes health checks, blacklisting of faulty proxies, and automatic master election via ETCD.
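The core of this routing layer is a fixed key→slot→group mapping that the proxies cache. A minimal sketch of that lookup, assuming CRC32 as the key hash and a plain in‑memory routing table (the article specifies neither; in WTable the slot→group mapping lives in ETCD):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const numSlots = 2048

// slotOf maps a key to one of the 2048 fixed slots.
// CRC32 is a hypothetical choice; the article does not name the hash function.
func slotOf(key []byte) int {
	return int(crc32.ChecksumIEEE(key)) % numSlots
}

// router caches the slot -> storage-group mapping, as a WTable proxy would.
type router struct {
	slotToGroup [numSlots]int
}

func (r *router) groupFor(key []byte) int {
	return r.slotToGroup[slotOf(key)]
}

func main() {
	var r router
	// Illustration: two groups, slots split evenly between them.
	// Adding a third group only requires reassigning some slots here.
	for s := 0; s < numSlots; s++ {
		r.slotToGroup[s] = s % 2
	}
	key := []byte("user:42")
	fmt.Printf("key %q -> slot %d -> group %d\n", key, slotOf(key), r.groupFor(key))
}
```

Because the slot count is fixed, expansion never re‑hashes keys; it only reassigns slots to groups and migrates the affected data.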
WTable Phase 2 – Focus shifted to performance tuning and operational efficiency. RocksDB was upgraded from 3.11 to 5.15 and its parameters tuned, yielding significant read/write latency improvements. Compaction throttling (AutoTune) was applied to limit I/O spikes. Checkpoint support replaced the costly full KV iteration previously used when adding replica nodes; since RocksDB checkpoints are built from hard links, one can be created in a few hundred milliseconds. Data expansion was re‑engineered to migrate slots in bulk rather than one at a time, and bulk deletions were accelerated using DeleteFilesInRange and CompactRange with a CompactionFilter.
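Both the bulk slot migration and the range‑based deletion rely on every key in a slot occupying one contiguous key range, so a whole slot can be copied or dropped with a single range operation. A sketch of such an encoding, assuming a 2‑byte big‑endian slot prefix (an illustrative layout, not WTable's documented one):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeKey prefixes a user key with its 2-byte big-endian slot ID, so all
// keys of a slot sort together in RocksDB. (Hypothetical layout; the article
// does not document WTable's actual key encoding.)
func encodeKey(slot uint16, userKey []byte) []byte {
	k := make([]byte, 2+len(userKey))
	binary.BigEndian.PutUint16(k, slot)
	copy(k[2:], userKey)
	return k
}

// slotRange returns [begin, end) covering every key in the slot — the bounds
// one would pass to a range delete (e.g. RocksDB's DeleteFilesInRange) to
// drop a migrated slot in one shot instead of key-by-key iteration.
// Valid for slot < 2047 here; with 2048 slots, slot+1 never overflows uint16.
func slotRange(slot uint16) (begin, end []byte) {
	begin = make([]byte, 2)
	binary.BigEndian.PutUint16(begin, slot)
	end = make([]byte, 2)
	binary.BigEndian.PutUint16(end, slot+1)
	return
}

func main() {
	k := encodeKey(7, []byte("user:42"))
	begin, end := slotRange(7)
	fmt.Printf("key=%x range=[%x, %x)\n", k, begin, end)
}
```

DeleteFilesInRange is cheap because it unlinks whole SST files that fall entirely inside the range; the residual keys at the range edges are then cleaned up by CompactRange or a CompactionFilter, which matches the two‑step deletion the article describes.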
WTable Phase 3 – Introduced NewSQL concepts to provide strong consistency. Each slot is now replicated with Raft (initially via a MultiRaft implementation, later the open‑source DragonBoat library), which eliminates ETCD‑based leader election. Expansion simply adds observer nodes that sync data before being promoted to followers. The architecture retains 2048 slots, each managed by its own Raft group, with proxies caching routing information updated via ETCD. Future plans include distributed transactions and full SQL support, effectively delivering a lightweight NewSQL platform.
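The expansion flow above reduces to a small state transition: a new node joins a slot's Raft group as a non‑voting observer, replays the log, and becomes a voting follower once it has caught up. A minimal sketch of that promotion rule; the role names follow Raft convention, but the lag threshold and field names are illustrative, not WTable's or DragonBoat's actual API:

```go
package main

import "fmt"

// Role of a node within one slot's Raft group.
type Role int

const (
	Observer Role = iota // receives the log but has no vote
	Follower             // full voting member
	Leader
)

type member struct {
	id      string
	role    Role
	applied uint64 // highest log index applied locally
}

// promoteIfCaughtUp promotes an observer to a voting follower once its
// applied log index is within maxLag of the leader's. The threshold of 16
// entries is a hypothetical value chosen for illustration.
func promoteIfCaughtUp(leaderApplied uint64, m *member) {
	const maxLag = 16
	if m.role == Observer && leaderApplied-m.applied <= maxLag {
		m.role = Follower
	}
}

func main() {
	n := &member{id: "new-node", role: Observer, applied: 90}
	promoteIfCaughtUp(100, n) // lag of 10 is within the threshold
	fmt.Println(n.role == Follower) // prints "true"
}
```

Keeping the new node as a non‑voting observer during sync means the catch‑up traffic never affects quorum size or write availability, which is what makes this expansion path safer than adding a voting member directly.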
Conclusion – The iterative design of WTable demonstrates a balanced approach to addressing business requirements, operational constraints, and resource limits, evolving from basic disaster recovery and performance tuning to sophisticated consistency guarantees and user‑experience enhancements.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.