
How Baidu’s PegaDB Redefines Redis with Low‑Cost, High‑Capacity Storage

This article details Baidu Cloud's PegaDB—a Redis‑compatible, high‑capacity, low‑cost distributed KV store—covering its design choices, architecture, performance and replication optimizations, multi‑region active‑active support, native JSON model, community contributions, and future roadmap.

Baidu Intelligent Cloud Tech Hub

PegaDB Overview

PegaDB is a fully Redis‑compatible, high‑capacity, low‑cost distributed key‑value database developed by Baidu Cloud to address the high memory cost and limited capacity of traditional Redis deployments. It delivers about 70% of Redis's performance at less than 20% of Redis's cost per GB.

Key Features

Complete Redis protocol compatibility for seamless migration.

Horizontal scaling to petabyte‑level storage using SSDs.

Cost reduction of over 80% per GB compared with in‑memory Redis.

Millisecond‑level online data processing.

Active‑active multi‑region architecture with disaster‑recovery capabilities.

Enterprise‑grade features such as tunable consistency, hot/cold data separation, and native JSON support.

Typical Use Cases

PegaDB targets three kinds of workload: large‑scale data scenarios where Redis storage costs are prohibitive, cases where existing open‑source KV databases cannot meet performance or functionality requirements, and hot/cold data separation patterns that complicate traditional Cache + DB architectures.

PegaDB is already deployed in core Baidu services such as Fengchao, Feed, Shoubei, Map, and Dumi.

Design and Implementation

Background

Redis’s in‑memory nature leads to high storage costs and a per‑cluster capacity ceiling of about 4 TB, which cannot satisfy Baidu’s massive data needs.

Industry Solutions

Three main categories of Redis‑compatible KV solutions exist: disk‑based systems like Pika/Kvrocks, TiKV‑based systems like Meitu Titan/Tedis, and hybrid approaches like Redis On Flash. Each suffers from scalability, compatibility, or performance limitations.

Design Choice

Baidu selected Kvrocks as the upstream project for further development because of its simple codebase and its close fit with Baidu's requirements.

Kvrocks Introduction

Kvrocks is a distributed KV store built on RocksDB that fully implements the Redis protocol, aiming to solve Redis’s memory cost and capacity constraints.

Kvrocks architecture
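Because Kvrocks keeps every Redis data type as ordinary key‑value pairs in RocksDB, a composite type such as a hash must be split across several RocksDB keys. The Python sketch below illustrates one common way to do that: a metadata entry per hash holding a version number, plus one key per field, so HGETALL becomes a prefix scan and deleting a whole hash only drops the metadata entry. The key layout, the `OrderedKV` stand‑in for RocksDB, and the class names are hypothetical simplifications, not Kvrocks' actual on‑disk encoding.

```python
class OrderedKV:
    """Toy stand-in for RocksDB: byte-string keys with ordered prefix scans."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

    def scan_prefix(self, prefix: bytes):
        # RocksDB iterators visit keys in sorted order; emulate that here.
        for k in sorted(self._data):
            if k.startswith(prefix):
                yield k, self._data[k]


class HashOnKV:
    """Hypothetical layout: one metadata entry per hash, one subkey per field."""

    def __init__(self, kv: OrderedKV):
        self.kv = kv
        self._next_version = 1

    @staticmethod
    def _meta_key(key: bytes) -> bytes:
        return b"m" + key

    @staticmethod
    def _sub_prefix(key: bytes, version: int) -> bytes:
        # Length-prefixing the user key keeps "ab"+"c" distinct from "a"+"bc".
        return b"s" + len(key).to_bytes(4, "big") + key + version.to_bytes(8, "big")

    def _version(self, key: bytes, create: bool):
        meta = self.kv.get(self._meta_key(key))
        if meta is not None:
            return int.from_bytes(meta, "big")
        if not create:
            return None
        version = self._next_version
        self._next_version += 1
        self.kv.put(self._meta_key(key), version.to_bytes(8, "big"))
        return version

    def hset(self, key: bytes, field: bytes, value: bytes) -> None:
        version = self._version(key, create=True)
        self.kv.put(self._sub_prefix(key, version) + field, value)

    def hgetall(self, key: bytes) -> dict:
        version = self._version(key, create=False)
        if version is None:
            return {}
        prefix = self._sub_prefix(key, version)
        return {k[len(prefix):]: v for k, v in self.kv.scan_prefix(prefix)}

    def delete(self, key: bytes) -> None:
        # Dropping only the metadata hides every field at once; stale subkeys
        # can be reclaimed later in the background (e.g. during compaction).
        self.kv.delete(self._meta_key(key))
```

The version number is what makes whole‑key deletion cheap: a hash recreated after DEL gets a new version, so its prefix scan never sees the old, not‑yet‑reclaimed fields.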

Cluster Design

PegaDB adopts a Redis‑Cluster‑like slot allocation strategy with a centralized MetaServer managing cluster metadata, enabling elastic scaling and supporting both fixed‑slot and dynamic topology changes.

Cluster topology
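Redis Cluster maps each key to one of 16,384 slots using CRC16(key) mod 16384, hashing only the `{...}` hash tag when a key has one. Assuming PegaDB's Redis‑Cluster‑like scheme follows the same rule, the sketch below computes a key's slot and then looks up the owning node in a slot table of the kind a MetaServer would distribute. The three‑node topology and the node names are hypothetical.

```python
import bisect

NUM_SLOTS = 16384

def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """If the key has a non-empty {...} tag (first '{' to the next '}'), hash only it."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS

# Hypothetical slot table a MetaServer might publish: (first slot of range, node).
TOPOLOGY = [(0, "node-a"), (5461, "node-b"), (10923, "node-c")]

def owner(slot: int, topology=TOPOLOGY) -> str:
    starts = [first for first, _ in topology]
    return topology[bisect.bisect_right(starts, slot) - 1][1]
```

Keys that share a hash tag always land in the same slot, and therefore on the same node, which keeps multi‑key commands local. Rebalancing in this model amounts to moving slot ranges between entries of the table.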

Scaling and Rebalancing

Data migration uses a RocksDB snapshot for the full copy and the WAL for the incremental copy, moving slots between nodes while keeping service disruption low: writes pause only briefly, at the millisecond level.

Scaling workflow
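The snapshot‑plus‑WAL migration can be simulated in a few lines. In this sketch a Python dict and list stand in for RocksDB data, snapshots, and the WAL, and `SourceNode`, `full_copy`, `catch_up`, and `cut_over` are hypothetical names. A real implementation would repeat `catch_up` until the remaining lag is small, then call `cut_over` and update slot ownership in the MetaServer.

```python
from dataclasses import dataclass, field

@dataclass
class SourceNode:
    data: dict = field(default_factory=dict)   # current key -> value
    wal: list = field(default_factory=list)    # (seq, key, value), seq increasing
    seq: int = 0
    paused: bool = False

    def write(self, key, value):
        if self.paused:
            raise RuntimeError("writes paused for slot cut-over; retry")
        self.seq += 1
        self.data[key] = value
        self.wal.append((self.seq, key, value))

    def snapshot(self):
        # Stand-in for a RocksDB snapshot: a point-in-time copy and its sequence.
        return dict(self.data), self.seq

def full_copy(src, dst, in_slot):
    """Phase 1: copy the slot's keys from a snapshot; return the snapshot seq."""
    snap, seq = src.snapshot()
    dst.update({k: v for k, v in snap.items() if in_slot(k)})
    return seq

def catch_up(src, dst, in_slot, since):
    """Phase 2: replay newer WAL entries for the slot; return (last_seq, applied)."""
    last, applied = since, 0
    for seq, key, value in src.wal:
        if seq <= since:
            continue
        last = seq
        if in_slot(key):
            dst[key] = value
            applied += 1
    return last, applied

def cut_over(src, dst, in_slot, since):
    """Phase 3: pause writes, drain the small remaining tail, hand off."""
    # Toy simplification: this pauses the whole node, not only the migrating slot.
    src.paused = True
    last, _ = catch_up(src, dst, in_slot, since)
    return last  # the caller would now reassign the slot in the MetaServer
```

Only `cut_over` blocks writes, and it replays just the entries that arrived since the last catch‑up round, which is why the pause can stay short.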

Replication Optimizations

PegaDB introduces a Replication ID and a monotonically increasing Sequence ID, both stored in the WAL. Together they enable partial resynchronization after failover, and PegaDB also supports semi‑synchronous replication with a configurable number of synchronous replicas.

Replication diagram
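One plausible way a Replication ID and Sequence ID make partial resynchronization possible is sketched below, assuming each WAL entry records the Replication ID under which it was written. A reconnecting replica reports its last applied sequence and Replication ID; if the master's WAL still holds that exact entry, the master streams only newer entries, otherwise it falls back to a full sync. The sketch also shows the semi‑synchronous acknowledgement rule. The function names and the WAL representation are hypothetical.

```python
def resync_mode(master_wal: dict, replica_seq: int, replica_repl_id: str) -> str:
    """Decide between partial and full resync.

    master_wal maps sequence number -> Replication ID of that entry (assumed layout).
    """
    entry_repl_id = master_wal.get(replica_seq)
    if entry_repl_id is None:
        # The entry was purged from the WAL, or the replica is ahead of the master.
        return "full"
    if entry_repl_id != replica_repl_id:
        # Same sequence number but a different history: the logs diverged.
        return "full"
    return "partial"  # stream entries with seq > replica_seq

def can_ack_client(write_seq: int, replica_acked_seqs: list, sync_replicas: int) -> bool:
    """Semi-sync: reply to the client once `sync_replicas` replicas have write_seq."""
    acked = sum(1 for s in replica_acked_seqs if s >= write_seq)
    return acked >= sync_replicas
```

In the test scenario, the old master wrote up to sequence 6 under one Replication ID and the promoted replica continued from 7 under a new one; a replica that stopped at 6 can resume incrementally, while one holding a different entry 7 must resync fully.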

Performance Tuning

Storage‑engine optimizations include rate‑limited compaction, partitioned indexes, multi‑CF block caches, RocksDB's enable_pipelined_write option, and GC pre‑read. A hot‑key cache lets a single node serve million‑level hot‑key access while reducing the overhead of keeping a separate cache consistent with the database.

Performance optimizations
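A minimal sketch of an in‑node hot‑key cache, assuming a simple promotion rule: a key enters an LRU cache once it has been read `hot_threshold` times, and writes update the cached copy in the same code path. The class, the threshold, and the promotion policy are illustrative assumptions, not PegaDB's actual design. Because the cache lives in the same process as the storage engine, the write path keeps it consistent, work that an application‑managed Cache + DB setup would otherwise do itself.

```python
from collections import Counter, OrderedDict

class HotKeyCache:
    """Hypothetical in-node hot-key cache in front of the storage engine."""

    def __init__(self, storage: dict, capacity: int = 1024, hot_threshold: int = 3):
        self.storage = storage            # dict standing in for RocksDB
        self.capacity = capacity
        self.hot_threshold = hot_threshold
        self.reads = Counter()            # read counts (a real system would decay them)
        self.cache = OrderedDict()        # LRU order: least recently used first

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        value = self.storage.get(key)
        self.reads[key] += 1
        if value is not None and self.reads[key] >= self.hot_threshold:
            self.cache[key] = value       # promote a key that has become hot
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used key
        return value

    def set(self, key, value):
        self.storage[key] = value
        # The write path updates the cached copy directly, so reads never
        # see a stale value and no external invalidation is needed.
        if key in self.cache:
            self.cache[key] = value
            self.cache.move_to_end(key)
```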

Active‑Active Multi‑Region Architecture

SyncAgent components co‑located with PegaDB instances replicate data across regions using ShardID to prevent loops and OpID for resumable transfers; conflicts are resolved with a simple Last‑Write‑Wins policy.

Multi‑region architecture
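The SyncAgent rules can be sketched under a few stated assumptions: every replicated operation carries the ShardID where it originated, a per‑origin OpID that increases monotonically, and a timestamp used for Last‑Write‑Wins. The agent drops operations that originated locally (breaking A→B→A loops), skips OpIDs at or below its checkpoint (so a transfer can resume after a disconnect without applying anything twice), and breaks timestamp ties by ShardID so every region converges on the same value. The class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    origin_shard: str   # ShardID where the write first happened
    op_id: int          # assumed to increase monotonically per origin shard
    key: str
    value: str
    ts: int             # write timestamp used for Last-Write-Wins

class SyncAgentSketch:
    def __init__(self, local_shard: str):
        self.local_shard = local_shard
        self.data = {}        # key -> (ts, origin_shard, value)
        self.checkpoint = {}  # origin shard -> last applied OpID (resume point)

    def apply_remote(self, op: Op) -> bool:
        if op.origin_shard == self.local_shard:
            return False  # our own write echoed back: drop it to break the loop
        if op.op_id <= self.checkpoint.get(op.origin_shard, 0):
            return False  # already applied before a reconnect: skip the duplicate
        current = self.data.get(op.key)
        # Last-Write-Wins; equal timestamps are broken by ShardID.
        if current is None or (op.ts, op.origin_shard) > current[:2]:
            self.data[op.key] = (op.ts, op.origin_shard, op.value)
        self.checkpoint[op.origin_shard] = op.op_id
        return True

    def resume_from(self, origin_shard: str) -> int:
        """OpID to request next from a given origin after reconnecting."""
        return self.checkpoint.get(origin_shard, 0) + 1
```

An operation that loses the Last‑Write‑Wins comparison still advances the checkpoint: it was received and resolved, so it should not be requested again after a reconnect.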

PJSON Data Model

PegaDB natively supports a JSON data model compatible with RedisJSON, offering JSONPath queries, atomic operations on all JSON value types, and compact encoding that benefits hot‑key caching.
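RedisJSON's `JSON.NUMINCRBY` increments a number addressed by a JSONPath inside a single command, so clients avoid a read‑modify‑write round trip. The sketch below mimics that behavior for a small JSONPath subset (`$`, `.name`, `[index]`) using Python's json module, and writes the document back in compact form. It illustrates the command's semantics only; it is not PegaDB's PJSON encoding, the helper names are hypothetical, and real RedisJSON returns results in its own reply format.

```python
import json
import re

# Supported subset: $ followed by .name and [index] steps.
_STEP = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")

def parse_path(path: str) -> list:
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    steps, pos = [], 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        steps.append(m.group(1) if m.group(1) is not None else int(m.group(2)))
        pos = m.end()
    return steps

def numincrby(doc_text: str, path: str, delta):
    """JSON.NUMINCRBY-style update; returns (new_doc_text, new_value)."""
    steps = parse_path(path)
    if not steps:
        raise ValueError("cannot increment the root")
    doc = json.loads(doc_text)
    node = doc
    for step in steps[:-1]:
        node = node[step]
    last = steps[-1]
    target = node[last]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(target, bool) or not isinstance(target, (int, float)):
        raise TypeError("target is not a number")
    node[last] = target + delta
    return json.dumps(doc, separators=(",", ":")), node[last]
```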

ZSET & HASH Enhancements

Additional commands provide aggregation and result filtering for ZSETs and range operations for HASHes.

Open‑Source Collaboration

The Baidu team actively contributes to the Kvrocks project, delivering PRs for replication, transactions, the storage engine, and clustering, and helped Kvrocks enter the Apache Incubator.

Future Roadmap

Release a serverless offering to improve elasticity.

Integrate more Redis modules for richer data models.

Provide connectors for seamless big‑data ecosystem integration.

Continue performance enhancements via io_uring and thread‑model optimizations.
