
Meituan's Trillion‑Level KV Storage Architecture and Practices: Squirrel (In‑Memory) and Cellar (Persistent)

Meituan’s KV platform evolved from Memcached, Redis, and Tair to two in‑house systems: the high‑throughput, in‑memory Squirrel and the durable Cellar. Both partition keys into 16,384 slots. The article covers routing tables, Kubernetes‑based scaling, intelligent migration, cross‑region replication, Raft‑based strong consistency, multi‑queue latency isolation, and hotspot handling, and closes with planned optimizations such as Redis Gossip tuning, kernel‑bypass I/O, and FPGA acceleration.

Meituan Technology Team

Meituan's KV storage is a critical online storage service that handles trillions of requests per day. This article summarizes the evolution, architecture, and operational practices of Meituan's KV systems, as originally presented at QCon 2019.

Evolution of Meituan KV Storage – The first generation used client‑side consistent hashing across many Memcached instances; it lost data whenever a node failed and was hard to scale. The second generation adopted Redis master‑slave clusters with Sentinel for failover, but data could still be lost during rebalancing. In 2014 Meituan introduced Alibaba's open‑source Tair to improve routing and data migration, yet ran into problems such as the lack of distributed arbitration and a limited set of data structures. Consequently, Meituan built two self‑developed systems: the in‑memory, high‑throughput KV system **Squirrel** and the persistent, high‑capacity KV system **Cellar**.

Squirrel Architecture and Practices – Both Squirrel and Cellar pre‑partition the key space into 16,384 slots. A slot ID is derived from a hash of the key and mapped to a storage node via a routing table stored in ZooKeeper. High availability is achieved through a cluster scheduler that manages scaling, failover, and topology updates. A failed node is detected and removed within about 5 s, and a replacement container is provisioned automatically through Kubernetes. Cross‑region replication copies data from a Beijing master cluster to a Shanghai slave cluster through a dedicated sync service that converts Redis replication streams into write commands for the remote cluster.
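The key-to-slot-to-node lookup described above can be sketched as follows. This is a minimal illustration, not Meituan's implementation: the article only says "a hash of the key", so the use of CRC32 here is an assumption, and the contiguous slot layout and the names `slot_for_key`, `build_routing_table`, and `node_for_key` are hypothetical.

```python
import zlib

NUM_SLOTS = 16384  # slot count stated in the article


def slot_for_key(key: bytes) -> int:
    """Map a key to a slot ID. CRC32 is an assumed hash function."""
    return zlib.crc32(key) % NUM_SLOTS


def build_routing_table(nodes):
    """Assign contiguous slot ranges to nodes (hypothetical layout).

    The result is a list indexed by slot ID, standing in for the
    routing table that the article says is kept in ZooKeeper.
    """
    return [nodes[slot * len(nodes) // NUM_SLOTS] for slot in range(NUM_SLOTS)]


def node_for_key(key: bytes, table) -> str:
    """Resolve the storage node responsible for a key."""
    return table[slot_for_key(key)]


table = build_routing_table(["node-a", "node-b", "node-c"])
owner = node_for_key(b"user:42", table)
```

Because the slot count is fixed, scaling only changes which node a slot maps to; the key-to-slot step never changes, so clients need nothing more than a fresh routing table.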

Squirrel also implements intelligent migration: it selects migration tasks based on proximity, runs migrations concurrently across multiple nodes, and throttles migration speed dynamically from real‑time success‑rate and load metrics, in a manner similar to TCP slow start. An asynchronous MIGRATE command keeps large values from blocking the main thread; a write that arrives for a key while that key is being migrated is rejected with an error.

For persistence, Squirrel redesigns Redis's RDB/AOF workflow: writes first go to an in‑memory backlog, and an asynchronous thread then flushes the changes to disk. During low‑traffic periods a full RDB snapshot is taken and the backlog is cleared. Recovery tries the in‑memory backlog first, then the on‑disk backlog, and falls back to a full snapshot only if neither covers the missing data, which reduces full‑copy overhead.
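The tiered recovery order can be expressed as a simple decision on replication offsets. This is a sketch, assuming each backlog covers a contiguous offset range that ends at the master's current offset; the function name, parameters, and offset model are hypothetical and not taken from Squirrel.

```python
def choose_resync_source(replica_offset: int,
                         mem_backlog_start: int,
                         disk_backlog_start: int,
                         master_offset: int) -> str:
    """Pick the cheapest source able to bring a replica up to date.

    Assumes disk_backlog_start <= mem_backlog_start <= master_offset.
    """
    if replica_offset > master_offset:
        # The replica is ahead of the master: histories diverged.
        return "full_snapshot"
    if replica_offset >= mem_backlog_start:
        return "memory_backlog"   # cheapest: replay from RAM
    if replica_offset >= disk_backlog_start:
        return "disk_backlog"     # replay from the flushed backlog
    return "full_snapshot"        # data too old, copy everything


source = choose_resync_source(replica_offset=600,
                              mem_backlog_start=900,
                              disk_backlog_start=500,
                              master_offset=1000)
```

In this example the replica has fallen out of the in-memory window but is still covered by the on-disk backlog, so a full copy is avoided.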

Cellar Architecture and Practices – Cellar replaces Tair's central node with an OB service for routing‑table queries and uses ZooKeeper for distributed arbitration. Node failover employs a Handoff mechanism that temporarily redirects traffic to a healthy replica and logs writes for later replay, enabling sub‑second removal and seamless reintegration after recovery or upgrade.
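The Handoff idea can be sketched with two in-process objects. This is an illustrative toy, assuming a single primary and a single replica with synchronous copying; `Node`, `HandoffRouter`, and all method names are hypothetical, and real replication, failure detection, and log persistence are left out.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.handoff_log = []  # writes accepted on behalf of a down peer


class HandoffRouter:
    """Routes writes and applies a handoff-style log on failure (toy model)."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.primary_up = True

    def write(self, key, value):
        if self.primary_up:
            self.primary.data[key] = value
            self.replica.data[key] = value  # replication simplified
        else:
            # Redirect to the healthy replica and remember the write.
            self.replica.data[key] = value
            self.replica.handoff_log.append((key, value))

    def mark_down(self):
        self.primary_up = False

    def recover(self):
        # Replay only the writes missed during the outage, in order.
        for key, value in self.replica.handoff_log:
            self.primary.data[key] = value
        self.replica.handoff_log.clear()
        self.primary_up = True


router = HandoffRouter(Node("A"), Node("B"))
router.write("k1", "v1")
router.mark_down()
router.write("k2", "v2")   # served by B, logged for A
router.recover()           # A catches up from the log
```

Replaying only the logged writes is what makes short outages and rolling upgrades cheap: the recovering node does not need a full copy of its data.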

Cross‑region replication mirrors Squirrel's approach, establishing a bidirectional sync chain between Beijing and Shanghai clusters while minimizing bandwidth usage. Strong consistency is achieved with Raft: each slot forms a Raft group with three replicas, and a Multi‑Raft design consolidates logs and replication threads to avoid performance degradation. The central scheduler also balances Raft leaders across nodes.
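Leader balancing across Raft groups can be illustrated with a greedy assignment. This is a sketch of one possible policy, not the scheduler's actual algorithm, which the article does not describe; `balance_leaders` and its input shape are hypothetical.

```python
from collections import Counter


def balance_leaders(groups):
    """Choose a leader per slot so that leaders spread evenly over nodes.

    groups maps a slot ID to the list of nodes holding its replicas.
    Greedy rule (an assumption): give each slot to the replica node
    that currently leads the fewest groups, breaking ties by name.
    """
    load = Counter()
    leaders = {}
    for slot in sorted(groups):
        leader = min(groups[slot], key=lambda node: (load[node], node))
        leaders[slot] = leader
        load[leader] += 1
    return leaders


groups = {slot: ["a", "b", "c"] for slot in range(9)}
leaders = balance_leaders(groups)
per_node = Counter(leaders.values())
```

Since a Raft leader handles the writes for its group, spreading leaders evenly keeps any single node from carrying a disproportionate share of write traffic.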

Cellar’s intelligent migration follows a three‑state model (idle, snapshot, transfer) and adjusts migration speed based on the target node’s pressure metrics. After migration, the source node proxies client requests to the new node and informs the client to refresh its routing table.
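The post-migration proxying step can be sketched as follows. This is a toy model under stated assumptions: the "refresh your routing table" hint is represented by a boolean flag, and `StorageNode`, `Client`, and `fetch_routing` are hypothetical names rather than Cellar's API.

```python
class StorageNode:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.migrated_to = {}  # slot -> node that now owns the slot

    def get(self, slot, key):
        """Return (value, routing_is_stale)."""
        owner = self.migrated_to.get(slot)
        if owner is not None:
            # Slot has moved: proxy the read and flag stale routing.
            value, _ = owner.get(slot, key)
            return value, True
        return self.data.get(key), False


class Client:
    def __init__(self, routing, fetch_routing):
        self.routing = dict(routing)        # slot -> node
        self.fetch_routing = fetch_routing  # callable returning a fresh table
        self.refreshes = 0

    def get(self, slot, key):
        value, stale = self.routing[slot].get(slot, key)
        if stale:
            # The request already succeeded; refresh for later requests.
            self.routing = dict(self.fetch_routing())
            self.refreshes += 1
        return value


old, new = StorageNode("old"), StorageNode("new")
new.data["k"] = "v"
old.migrated_to[7] = new
client = Client({7: old}, fetch_routing=lambda: {7: new})
first = client.get(7, "k")    # proxied through the old node
second = client.get(7, "k")   # goes straight to the new node
```

The request that hits the old owner still succeeds, so clients see no errors during the window in which their routing table is out of date.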

Cellar separates fast and slow requests into four dedicated queues (read‑fast, read‑slow, write‑fast, write‑slow) based on request characteristics, so that slow requests no longer hold up fast ones. This reduces tail latency, cutting 99.9th‑percentile latency by 86%.
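The four-queue split can be sketched as a classifier in front of per-queue workers. This is a minimal illustration: the article does not say which request characteristics are used, so the key-count and payload-size thresholds below are assumptions, as are the names `Request`, `queue_name`, and `dispatch`.

```python
from collections import deque
from dataclasses import dataclass

SLOW_KEY_COUNT = 16          # assumed threshold for batch size
SLOW_VALUE_BYTES = 64 * 1024  # assumed threshold for payload size


@dataclass
class Request:
    op: str           # "read" or "write"
    num_keys: int     # keys touched by the request
    value_bytes: int  # total payload size


def queue_name(req: Request) -> str:
    """Classify a request into one of the four queues."""
    kind = "read" if req.op == "read" else "write"
    slow = req.num_keys > SLOW_KEY_COUNT or req.value_bytes > SLOW_VALUE_BYTES
    return f"{kind}-{'slow' if slow else 'fast'}"


queues = {name: deque() for name in
          ("read-fast", "read-slow", "write-fast", "write-slow")}


def dispatch(req: Request) -> str:
    """Enqueue the request on its queue; each queue has its own workers."""
    name = queue_name(req)
    queues[name].append(req)
    return name


dispatch(Request("read", num_keys=1, value_bytes=100))
dispatch(Request("read", num_keys=500, value_bytes=100))
dispatch(Request("write", num_keys=1, value_bytes=1_000_000))
```

Because each queue is drained by its own threads, a burst of large batch reads fills only the read-slow queue and leaves small point reads unaffected.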

Hotspot‑key handling adds a hotspot region manager to the central node, replicating hot keys to dedicated hotspot nodes and informing clients to cache hotspot locations, thereby isolating hot traffic and preserving overall performance.
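Hotspot detection and routing can be sketched with a per-window counter. This is an illustration only: the article does not describe how hot keys are identified, so the fixed-threshold counting, the windowing, and the names `HotspotDetector` and `route` are assumptions.

```python
from collections import Counter


class HotspotDetector:
    """Counts accesses per key within a window (hypothetical policy)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts = Counter()
        self.hot = set()

    def record(self, key: str) -> None:
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.hot.add(key)

    def end_window(self) -> set:
        """Return the hot keys of this window and start a new one."""
        hot = set(self.hot)
        self.counts.clear()
        self.hot.clear()
        return hot


def route(key: str, hot_keys: set, normal_node: str, hotspot_node: str) -> str:
    """Client-side choice: hot keys go to the dedicated hotspot node."""
    return hotspot_node if key in hot_keys else normal_node


detector = HotspotDetector(threshold=3)
for key in ["sale:1", "user:9", "sale:1", "sale:1"]:
    detector.record(key)
hot_keys = detector.end_window()
target = route("sale:1", hot_keys, "node-a", "hot-node-1")
```

Once clients cache the set of hot keys, traffic for those keys goes to the hotspot nodes and the regular node serving that slot is shielded from the spike.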

Future Plans and Industry Trends – Meituan aims to optimize the Redis Gossip protocol for TB‑scale clusters, extend Raft replication to metadata services, unify SDKs for Squirrel and Cellar, and explore kernel‑bypass I/O (DPDK, SPDK), RDMA‑enabled NICs, 3D XPoint storage, and FPGA‑accelerated data processing to further improve latency and throughput.

The article concludes with author information: Qi Zebo, Senior Technical Expert at Meituan‑Dianping.
