How Baidu’s PegaDB Redefines Redis with Low‑Cost, High‑Capacity KV Storage
This article summarizes Liu Donghui’s presentation at DTCC2022, detailing Baidu Intelligent Cloud’s Redis‑compatible, high‑capacity, low‑cost PegaDB, covering its design goals, architecture, KV storage engine choices, cluster scaling, replication enhancements, performance optimizations, multi‑region active‑active support, and future roadmap.
Overview
Redis is widely used within Baidu for high‑traffic and acceleration scenarios, but its in‑memory nature leads to high cost. To address this, Baidu developed a Redis‑compatible, large‑capacity, low‑cost product called PegaDB (Baidu Intelligent Cloud Redis Capacity Edition). In simple KV workloads, PegaDB achieves about 70% of Redis performance while reducing per‑GB cost by over 80%.
Key Features of PegaDB
Full compatibility with Redis, enabling seamless migration.
Horizontal scalability with petabyte‑level storage per cluster.
SSD‑based architecture, cutting per‑GB cost by >80% compared to Redis.
Millisecond‑level online data processing.
Multi‑region active‑active architecture providing cross‑region disaster recovery.
Enterprise‑grade capabilities such as tunable consistency, hot‑cold separation, and native JSON data model.
Typical Application Scenarios
PegaDB targets large‑scale data storage where Redis costs are prohibitive, KV workloads that exceed the capacity limits of open‑source solutions, and hot‑cold separation scenarios that would otherwise require a complex Cache + DB architecture. It is already deployed in core Baidu services such as Fengchao, Feed, Shoubei, Map, and Dumi.
Design Background
The motivation for PegaDB stemmed from Redis’s high memory cost and limited cluster capacity (at most 4 TB per cluster on public cloud). Baidu needed a solution that offered large capacity, low cost, and Redis compatibility, along with the usual distributed‑storage properties of high performance, high availability, and scalability.
Industry Solutions Overview
Three main categories of Redis‑compatible KV databases exist:
Disk‑based designs like Pika and Kvrocks, built on RocksDB but lacking mature clustering and multi‑active support.
Disk‑based designs such as Meitu Titan and Tedis, built on TiKV, with limited Redis compatibility and similar scalability issues.
Redis On Flash, a hybrid memory‑disk approach that separates hot and cold data but suffers from limited generality and poor performance for large‑value workloads.
Design Choice
Considering development resources and time‑to‑market, Baidu chose to base PegaDB on the open‑source Kvrocks project, contributing back to the community and extending it for Baidu’s needs.
Kvrocks Overview
Kvrocks, developed by Meitu, is a distributed KV database that uses RocksDB as its storage engine and is fully compatible with the Redis protocol. It addresses Redis’s memory cost and capacity limits.
Limitations of Kvrocks for Baidu
No horizontal scaling – cannot handle tens or hundreds of terabytes.
Performance degradation for large‑value and hot‑cold separation workloads.
Asynchronous replication limits consistency and multi‑region disaster recovery.
Lacks support for Redis 4.0+ commands, transactions, Lua scripts, and multi‑DB features.
Cluster Architecture
PegaDB adopts a Redis‑Cluster‑style slot allocation and a centralized MetaServer to manage cluster metadata, ensuring compatibility with Redis‑Cluster SDKs while avoiding a heavy proxy layer.
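Because PegaDB remains compatible with Redis‑Cluster SDKs, clients have to map keys to slots exactly as Redis Cluster does. The Python sketch below reproduces the documented Redis Cluster rule (CRC16/XMODEM of the key, or of its non‑empty `{hash tag}`, modulo 16384). It assumes PegaDB keeps the same 16384‑slot space, which SDK compatibility implies but the talk does not state; `crc16_xmodem` and `key_slot` are illustrative names.

```python
SLOT_COUNT = 16384  # Redis Cluster slot space; assumed to be the same in PegaDB


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to a slot, honouring Redis-style {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % SLOT_COUNT


print(key_slot(b"foo"))  # 12182, same as CLUSTER KEYSLOT foo in Redis
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))  # True
```

Hash tags let related keys share a slot, so multi‑key commands on them stay on one node; during scaling, whole slots rather than individual keys move between nodes.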
Scaling (Expansion & Shrinkage) Design
Data is distributed across a fixed number of slots. Migration follows a two‑phase approach: full data migration using RocksDB snapshots and incremental migration leveraging the engine’s WAL. Slot‑level concurrent migration and Delete‑Range optimizations improve efficiency, with brief millisecond‑level write pauses during topology changes.
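The two‑phase migration can be illustrated with a toy, single‑process model: copy a slot from a point‑in‑time snapshot, replay newer WAL records for that slot, then pause writes briefly to drain the tail and hand over ownership. Everything here (`Node`, `replay`, `migrate_slot`, and the "even keys" slot) is hypothetical; a dict stands in for a RocksDB snapshot, a list for the WAL, and a key loop for Delete‑Range.

```python
from dataclasses import dataclass, field

# Toy model of two-phase slot migration. All names are hypothetical; dicts and
# lists stand in for RocksDB snapshots, the WAL, and Delete-Range.


@dataclass
class Node:
    data: dict = field(default_factory=dict)  # key -> value
    wal: list = field(default_factory=list)   # (seq, key, value or None for delete)
    seq: int = 0
    write_paused: bool = False

    def write(self, key, value):
        if self.write_paused:
            raise RuntimeError("writes are briefly paused for a topology change")
        self.seq += 1
        self.wal.append((self.seq, key, value))
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value

    def snapshot(self):
        # A consistent copy plus the sequence number it reflects.
        return dict(self.data), self.seq


def replay(src, dst, in_slot, after_seq):
    """Apply WAL records newer than after_seq that belong to the slot."""
    last = after_seq
    for seq, key, value in src.wal:
        if seq <= after_seq:
            continue
        if in_slot(key):
            if value is None:
                dst.data.pop(key, None)
            else:
                dst.data[key] = value
        last = seq
    return last


def migrate_slot(src, dst, in_slot, concurrent_writes=()):
    # Phase 1: full copy from a point-in-time snapshot.
    snap, applied = src.snapshot()
    for key, value in snap.items():
        if in_slot(key):
            dst.data[key] = value
    # Clients keep writing to the source while the copy runs.
    for key, value in concurrent_writes:
        src.write(key, value)
    # Phase 2: incremental catch-up from the WAL, writes still allowed.
    applied = replay(src, dst, in_slot, applied)
    # Short write pause: drain the WAL tail, then the target owns the slot.
    src.write_paused = True
    applied = replay(src, dst, in_slot, applied)
    # Stand-in for Delete-Range: drop the migrated keys from the source.
    for key in [k for k in src.data if in_slot(k)]:
        del src.data[key]
    src.write_paused = False  # resume writes for slots the source still owns
    return applied


def even_slot(key):
    # Pretend that keys with an even numeric suffix form the migrating slot.
    return int(key[1:]) % 2 == 0


src, dst = Node(), Node()
for i in range(10):
    src.write(f"k{i}", i)
migrate_slot(src, dst, even_slot, concurrent_writes=[("k2", 200), ("k4", None), ("k11", 11)])
print(sorted(dst.data.items()))  # [('k0', 0), ('k2', 200), ('k6', 6), ('k8', 8)]
```

The snapshot phase moves the bulk of the data without blocking writers; only the final WAL drain needs the short pause.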
Master‑Slave Replication Optimizations
PegaDB attaches a Replication ID and a monotonically increasing Sequence ID to each write. A replica can resynchronize incrementally only if its Replication ID matches the master’s and its Sequence ID is not ahead of the master’s, which enables partial resync after failover or restart instead of a full data copy. A half‑sync replication mode provides stronger consistency, with a configurable number of synchronous replicas and timeout handling.
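The partial‑resync rule can be expressed as a small decision function. This is a sketch under assumptions: `ReplState`, `choose_resync`, and the `oldest_wal_seq` retention check (a replica can only catch up if the records it lacks are still in the master’s WAL) are illustrative additions, not PegaDB’s code.

```python
from dataclasses import dataclass


@dataclass
class ReplState:
    replication_id: str  # identifies one lineage of writes
    sequence_id: int     # sequence of the last write applied


def choose_resync(master: ReplState, replica: ReplState, oldest_wal_seq: int) -> str:
    """Return 'partial' if the replica can catch up from the WAL, else 'full'.

    oldest_wal_seq is the first sequence still retained in the master's WAL;
    this retention check is an assumption added for the sketch.
    """
    if replica.replication_id != master.replication_id:
        return "full"  # different history: the replica's data cannot be reused
    if replica.sequence_id > master.sequence_id:
        return "full"  # replica is ahead, e.g. it holds writes the master never saw
    if replica.sequence_id + 1 < oldest_wal_seq:
        return "full"  # the records it is missing were already purged
    return "partial"   # stream WAL records from sequence_id + 1 onward


m = ReplState("r-1", 1000)
print(choose_resync(m, ReplState("r-1", 990), oldest_wal_seq=500))  # partial
print(choose_resync(m, ReplState("r-0", 990), oldest_wal_seq=500))  # full
print(choose_resync(m, ReplState("r-1", 100), oldest_wal_seq=500))  # full
```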
Performance Optimizations
Optimizations span three layers:
Storage engine: WiscKey‑style key‑value separation (first via TianDB, later migrated to BlobDB), rate‑limited compaction, partitioned indexes, and extensive RocksDB tuning (e.g., global filters, hash indexes, and no compression on L0/L1).
Write path: pipelined writes, sync_file_range, and pre‑reading during GC.
Cache layer: a fine‑grained hot‑key cache that outperforms RocksDB’s block cache and row cache.
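WiscKey‑style separation keeps large values out of the LSM tree so that compaction moves only small pointers. The toy store below shows the idea; `SeparatedStore` and its `min_blob_size` threshold are hypothetical, and real engines such as TianDB or RocksDB BlobDB add on‑disk blob files, crash recovery, and garbage collection.

```python
class SeparatedStore:
    """Toy WiscKey-style store: small values inline, large ones in a value log.

    min_blob_size and all names are illustrative assumptions, not a real API.
    """

    def __init__(self, min_blob_size: int = 64):
        self.min_blob_size = min_blob_size
        self.lsm = {}                 # key -> ("inline", value) or ("blob", offset, length)
        self.value_log = bytearray()  # append-only log holding large values

    def put(self, key: bytes, value: bytes) -> None:
        if len(value) < self.min_blob_size:
            self.lsm[key] = ("inline", value)
        else:
            offset = len(self.value_log)
            self.value_log += value
            # Only a small pointer enters the LSM tree, so compaction rewrites
            # far fewer bytes than it would for the full value.
            self.lsm[key] = ("blob", offset, len(value))

    def get(self, key: bytes):
        entry = self.lsm.get(key)
        if entry is None:
            return None
        if entry[0] == "inline":
            return entry[1]
        _, offset, length = entry
        return bytes(self.value_log[offset:offset + length])

    def garbage_bytes(self) -> int:
        """Bytes in the value log no longer referenced (what GC would reclaim)."""
        live = sum(e[2] for e in self.lsm.values() if e[0] == "blob")
        return len(self.value_log) - live


store = SeparatedStore(min_blob_size=16)
store.put(b"small", b"v1")
store.put(b"big", b"x" * 100)
store.put(b"big", b"y" * 100)  # the overwrite leaves 100 stale bytes in the log
print(store.get(b"big")[:3], store.garbage_bytes())  # b'yyy' 100
```

The stale bytes counted by `garbage_bytes` are what GC must reclaim, which is why read efficiency during GC matters for this design.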
JSON Data Model
PegaDB natively supports a RedisJSON‑compatible JSON data model, so JSON documents can be stored and manipulated directly instead of being serialized into strings. It supports JSONPath queries and updates and stores documents in a compact encoding.
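RedisJSON addresses parts of a document with JSONPath, for example `JSON.SET doc $.user.age 30` and `JSON.GET doc $.user`. To show what path‑based access does, here is a minimal evaluator for a small JSONPath subset (`$`, `.field`, `[index]`). It is a hypothetical illustration: it skips wildcards, filters, and recursive descent, and says nothing about how PegaDB encodes documents.

```python
import json
import re

# Matches one ".name" or "[index]" segment after the leading "$".
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path: str):
    """Turn '$.a[2].b' into ['a', 2, 'b']."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    tokens, pos = [], 1
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        tokens.append(m.group(1) if m.group(1) is not None else int(m.group(2)))
        pos = m.end()
    return tokens


def json_get(doc, path: str):
    node = doc
    for token in parse_path(path):
        node = node[token]
    return node


def json_set(doc, path: str, value):
    """Set the value at path in place and return the (possibly new) root."""
    tokens = parse_path(path)
    if not tokens:
        return value  # "$" replaces the whole document
    parent = doc
    for token in tokens[:-1]:
        parent = parent[token]
    parent[tokens[-1]] = value
    return doc


doc = json.loads('{"user": {"name": "ann", "tags": ["a", "b"]}}')
json_set(doc, "$.user.tags[1]", "c")
json_set(doc, "$.user.age", 30)
print(json_get(doc, "$.user.tags"), json_get(doc, "$.user.age"))  # ['a', 'c'] 30
```

Updating one field this way touches only the addressed node, whereas storing the document as a plain string would force a read‑modify‑write of the whole value.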
Command Enhancements
PegaDB also adds commands beyond standard Redis, including ZSET aggregation operations whose results can be filtered and range queries over HASH fields.
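The source does not give the syntax of these extended commands, so the sketch below only models their semantics under assumptions: `zunion_filtered` mimics a `ZUNION ... AGGREGATE` that drops members outside a score range before returning, and `hash_range` returns HASH fields between two bounds in lexicographic order. Both function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

AGGREGATES = {"SUM": sum, "MIN": min, "MAX": max}


def zunion_filtered(zsets, aggregate="SUM", min_score=float("-inf"), max_score=float("inf")):
    """Union sorted sets like ZUNION ... AGGREGATE, keeping only scores in range."""
    combined = {}
    for zset in zsets:
        for member, score in zset.items():
            combined.setdefault(member, []).append(score)
    agg = AGGREGATES[aggregate]
    result = [(m, agg(scores)) for m, scores in combined.items()]
    # Filtering server-side avoids shipping members the client would discard.
    result = [(m, s) for m, s in result if min_score <= s <= max_score]
    # Ascending by score, ties broken by member, as Redis orders sorted sets.
    return sorted(result, key=lambda item: (item[1], item[0]))


def hash_range(hash_fields, start, end):
    """Return (field, value) pairs with start <= field <= end, in field order.

    On an LSM engine, hash fields are typically stored as sorted sub-keys, so
    this becomes a seek plus a short scan; a sorted list stands in for that.
    """
    names = sorted(hash_fields)
    lo, hi = bisect_left(names, start), bisect_right(names, end)
    return [(name, hash_fields[name]) for name in names[lo:hi]]


a = {"x": 1.0, "y": 5.0}
b = {"x": 2.0, "z": 10.0}
print(zunion_filtered([a, b], "SUM", min_score=2, max_score=9))  # [('x', 3.0), ('y', 5.0)]
h = {"2022-01": 5, "2022-02": 7, "2022-03": 9, "2023-01": 1}
print(hash_range(h, "2022-02", "2022-12"))  # [('2022-02', 7), ('2022-03', 9)]
```

Doing the filtering and range selection on the server means the client receives only the members or fields it asked for, rather than the whole set or hash.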
Open‑Source Community Collaboration
Since its inception, PegaDB has actively contributed to the Kvrocks community, delivering PRs for replication, transaction, storage engine, and clustering improvements, and helping Kvrocks enter the Apache Incubator.
Future Roadmap
Planned enhancements: leveraging cloud infrastructure for serverless offerings, expanding Redis‑Module‑style data models, providing connectors for big‑data ecosystems, and further performance gains via io_uring and thread‑model optimizations.
Speaker
Liu Donghui – Senior R&D Engineer at Baidu Intelligent Cloud, technical lead of the Redis kernel team and core member of the Kvrocks project.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
