Evolution of Ximalaya KV Storage and XCache Architecture
Ximalaya’s KV storage evolved from a simple Redis master‑slave setup to client‑side sharding, then to Codis clustering for elastic scaling, and on to a disk‑based store built on Pika with cold‑hot data separation. Along the way it added KV‑blob separation, fast‑slow command pools, seconds‑scale expansion, an ehash data type with field‑level expiry, large‑key circuit breaking, and multi‑active data‑center replication; the roadmap now targets cloud‑native deployment, richer features, and AI‑driven operations.
Ximalaya KV Storage Development History
KV storage is a core component of Ximalaya, handling billions of requests daily. This document introduces the evolution of Ximalaya's KV storage, covering its history, self‑developed + community cache system, and future plans.
Redis Master‑Slave Mode
Initially, Ximalaya used a simple Redis master‑slave architecture with VIP‑based failover. While easy to deploy, single‑node constraints capped throughput (under roughly 100,000 QPS) and data size (under roughly 10 GB).
Sharding Mode
To overcome single‑node limits, a client‑side sharding scheme was introduced, hashing keys to different Redis nodes. This improved QPS but lacked elastic scaling and required client code changes when adding or removing nodes.
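The client-side scheme above can be sketched as follows. This is a minimal illustration, not Ximalaya's actual client code: the node list and hash choice are assumptions, and the modulo mapping is exactly what makes adding or removing nodes painful, since it remaps most keys.

```python
import hashlib

# Illustrative node list; a real deployment would load this from config.
NODES = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]

def node_for(key: str) -> str:
    # Stable hash so every client routes the same key to the same node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

Because the mapping depends on `len(NODES)`, changing the node count reshuffles almost every key, which is why this design lacks elastic scaling.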
Cluster Mode Selection (2016)
Three popular solutions were evaluated:
Redis Cluster: the official, decentralized solution; easy to deploy, but tightly coupled with the server and hard to upgrade.
Twemproxy: Twitter’s proxy, but unfriendly to smooth scaling.
Codis Redis: an open‑source proxy compatible with the Twemproxy protocol, with better performance, smooth scaling, and visual management.
Codis Redis was chosen as the cluster solution.
Codis Redis Architecture
Codis uses a proxy layer that hashes keys (CRC32) to route commands to backend Redis shards. ZooKeeper provides service discovery; new proxies register automatically, and Jodis clients listen for changes. Codis also includes a web UI (Codis‑fe) and Sentinel management.
Elastic Scaling in Codis
Codis divides the keyspace into 1024 slots. When expanding from 2 to 4 shards, slots are migrated gradually (e.g., slots 256‑511 move from group‑1 to group‑3) and the proxy’s slot map is updated, achieving smooth scaling.
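The routing and slot migration described above can be sketched as follows. The CRC32‑modulo‑1024 routing and the 2‑to‑4 shard example follow the text; the dict‑based slot map and group names are illustrative stand‑ins for the proxy's real routing table.

```python
import zlib

SLOTS = 1024

def slot_of(key: bytes) -> int:
    # Codis-style routing: CRC32 of the key modulo 1024 slots.
    return zlib.crc32(key) % SLOTS

# Initial 2-group layout: slots 0-511 on group-1, 512-1023 on group-2.
slot_map = {s: "group-1" if s < 512 else "group-2" for s in range(SLOTS)}

# Expansion to 4 groups: half of each group's slots move to a new group,
# and proxies pick up the updated map as migration completes.
for s in range(256, 512):
    slot_map[s] = "group-3"
for s in range(768, 1024):
    slot_map[s] = "group-4"
```

In the real system the slots are migrated one at a time and the map is updated incrementally, so clients never see a gap in coverage.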
Limitations of Codis Redis
Issues include memory consumption (all data in RAM), high operational cost for many instances, long restart times, and costly master‑slave failover.
Codis Pika
Pika (360’s open‑source Redis‑compatible KV store) stores data on disk using RocksDB, breaking the memory limit of Redis. It supports most Redis commands but suffers from disk‑read latency spikes and high IO during compaction.
Cold‑Hot Data Separation
Because Pika stores everything on disk, a layer of Redis is added to cache hot data in memory while keeping cold data on disk, dramatically reducing response latency.
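The hot/cold split is essentially a read‑through cache, sketched below with dicts standing in for the Redis (memory) and Pika (disk) tiers; a real deployment would use the actual client libraries and an eviction policy on the hot tier.

```python
# Dict-based stand-ins for the two tiers.
hot = {}             # memory tier (Redis role)
cold = {"k": "v"}    # disk tier (Pika role)

def get(key):
    if key in hot:            # hot hit: served from memory
        return hot[key]
    value = cold.get(key)     # miss: fall back to disk
    if value is not None:
        hot[key] = value      # promote to the hot tier for later reads
    return value
```

After the first disk read of a key, subsequent reads are served from memory, which is what cuts the response latency for hot data.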
KV Separation Storage
RocksDB’s LSM‑tree causes write amplification. By separating large values into a dedicated blob file and keeping only keys and indexes in SST files, compaction writes are reduced, SSD wear is lowered, and overall storage size shrinks.
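The idea can be sketched as follows: large values append to a blob log, and the LSM tree keeps only a small pointer per key, so compaction rewrites pointers rather than full values. The in‑memory structures here are illustrative stand‑ins for the blob file and SST index.

```python
blob = bytearray()   # stand-in for the on-disk blob file
index = {}           # key -> (offset, length); this is what lives in SSTs

def put(key: str, value: bytes):
    offset = len(blob)
    blob.extend(value)            # value goes to the blob log once
    index[key] = (offset, len(value))  # LSM stores only the small pointer

def get(key: str) -> bytes:
    offset, length = index[key]
    return bytes(blob[offset:offset + length])
```

Since compaction now moves only (key, pointer) pairs, write amplification and SSD wear drop roughly in proportion to the value size.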
Fast‑Slow Command Separation
A dual thread‑pool model isolates fast commands (e.g., GET/SET) from slow commands (e.g., HGETALL). Each pool runs independently, preventing slow operations from blocking fast ones.
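A minimal sketch of the dual‑pool dispatch, assuming an illustrative fast‑command set and pool sizes (the real classification and sizing are configuration‑driven):

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent pools so slow commands cannot exhaust the threads
# serving fast ones. Command sets and worker counts are illustrative.
FAST = {"GET", "SET"}
fast_pool = ThreadPoolExecutor(max_workers=8)
slow_pool = ThreadPoolExecutor(max_workers=2)

def dispatch(command: str, handler, *args):
    # Route each request to the pool matching its command class.
    pool = fast_pool if command in FAST else slow_pool
    return pool.submit(handler, *args)
```

Even if every slow‑pool thread is stuck on an HGETALL over a huge hash, GET/SET latency is unaffected because the fast pool's threads are never borrowed.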
Seconds‑Scale Cluster Expansion
When expanding from 2 to 4 shards, slave instances are reassigned to new groups and the proxy’s slot routing is updated without moving any data, achieving near‑instant scaling.
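This works because each slave already holds a full copy of its group's data, so promotion plus a slot‑map update is all that's needed. A sketch, with illustrative instance and group names:

```python
# Each group has a master, a slave with a full data copy, and a slot range.
groups = {
    "group-1": {"master": "pika-1a", "slave": "pika-1b", "slots": range(0, 512)},
    "group-2": {"master": "pika-2a", "slave": "pika-2b", "slots": range(512, 1024)},
}

def expand():
    # Promote group-1's slave as master of a new group-3 and split the
    # slot range; no key is copied, only metadata changes.
    groups["group-3"] = {"master": groups["group-1"].pop("slave"),
                         "slave": None, "slots": range(256, 512)}
    groups["group-1"]["slots"] = range(0, 256)
```

After the split, each side deletes the slots it no longer owns in the background, reclaiming space without blocking traffic.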
EHash Data Type
To allow field‑level expiration in hash structures, an “ehash” type adds a timestamp to each field. Expired fields are removed during reads and the deletion is logged to the binlog for replication.
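The lazy‑expiry behaviour can be sketched as follows. Function names and the in‑memory store are illustrative; the real implementation persists fields in RocksDB and, as noted above, also writes the deletion to the binlog so replicas converge.

```python
import time

store = {}  # key -> {field: (value, expire_at or None)}

def ehset(key, field, value, ttl=None):
    # ttl in seconds; None means the field never expires.
    expire_at = time.time() + ttl if ttl else None
    store.setdefault(key, {})[field] = (value, expire_at)

def ehget(key, field):
    entry = store.get(key, {}).get(field)
    if entry is None:
        return None
    value, expire_at = entry
    if expire_at is not None and time.time() >= expire_at:
        del store[key][field]   # lazy deletion on read
        return None
    return value
```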
Large‑Key Detection and Circuit Breaking
The proxy monitors request sizes (e.g., string values > monitor_max_value_len, list length > monitor_max_batchsize, range > 1000, HGETALL) and classifies them as large‑key requests. Configurable circuit‑break strategies can blacklist offending keys, block specific commands, or reject a configurable percentage of requests.
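A sketch of the classification and rejection logic, reusing the config names from the text; the concrete threshold values and the random‑rejection strategy shown here are illustrative assumptions:

```python
import random

monitor_max_value_len = 1 << 20   # bytes, for string values (assumed value)
monitor_max_batchsize = 1000      # elements, for list/range commands (assumed)
reject_ratio = 0.5                # fraction of large-key requests to reject

def is_large(command: str, value_len: int = 0, batch: int = 0) -> bool:
    if command in ("GET", "SET") and value_len > monitor_max_value_len:
        return True
    if command == "HGETALL":      # whole-hash reads are always flagged
        return True
    return batch > monitor_max_batchsize

def allow(command: str, **kw) -> bool:
    # Circuit-break policy: let normal requests through, reject a
    # configurable share of large-key requests.
    if not is_large(command, **kw):
        return True
    return random.random() >= reject_ratio
```

The other strategies mentioned above (blacklisting keys, blocking commands outright) plug into the same decision point with different policies.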
Multi‑Active Data Centers (Intra‑City Active‑Active)
To avoid single‑datacenter bottlenecks and improve fault tolerance, a dual‑read architecture is deployed across two data centers. In a failure, read traffic switches to the backup site while writes remain in the primary site. Future work aims for dual‑write active‑active mode.
Future Development Plans
Key directions for XCache include:
Feature enhancements such as Lua scripting, transactions, strong consistency, multi‑tenant support, bulk loading, multi‑get optimization, hot‑key detection, and static data analysis.
Cloud‑native transformation to automate deployment, scaling, and management of clusters.
Intelligent operations leveraging data‑driven SRE practices, machine learning, and AI (e.g., ChatGPT) to automate decision‑making and task orchestration.
Ximalaya Technology Team
The official account of Ximalaya's technology team, sharing distilled technical experience and insights.