
XPipe: Multi‑Data‑Center Redis Replication, High Availability and Disaster‑Recovery Switching

This article introduces XPipe, a framework designed by Ctrip to enable Redis multi‑data‑center replication, ensure high availability through keeper and MetaServer components, and provide reliable disaster‑recovery (DR) switching while maintaining low latency and data consistency.

Ctrip Technology

Author: Meng Wenchao, Senior Manager of Framework R&D at Ctrip Technology Center. He joined Ctrip in 2016 and leads the Redis multi‑data‑center project XPipe; previously he led the communication team at Dianping.

Redis is extensively used within Ctrip, handling about 2 million QPS read/write operations, with roughly 100,000 QPS writes, and many services treat Redis as an in‑memory database.

To improve availability and performance, Ctrip requires multi‑data‑center Redis deployment, prompting the creation of XPipe.

XPipe addresses three core challenges: data replication with consistency, high availability of both XPipe and Redis, and disaster‑recovery (DR) switching when a data center fails.

For data replication, client‑side double‑write was examined first, but it produces inconsistency whenever a write succeeds in one data center and fails in the other. A proxy server acting as a single client avoids that inconsistency, yet it adds complexity and a potential single point of failure. XPipe therefore adopts a pseudo‑slave, the "keeper", which speaks the Redis replication protocol so that the master pushes its replication stream to it; the keeper buffers the stream on disk and can compress or encrypt the traffic sent between data centers.
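To make the pseudo‑slave idea concrete, the sketch below shows the replication handshake a keeper‑like component would send after connecting to a master: the standard `PING` / `REPLCONF` / `PSYNC ? -1` sequence, encoded in Redis's RESP wire format, plus a parser for the master's `+FULLRESYNC` reply. This is a minimal illustration of the public Redis replication protocol, not XPipe's actual implementation; the helper names are my own.

```python
def resp_command(*args):
    """Encode a command as a RESP array, the wire format Redis expects."""
    out = [b"*%d\r\n" % len(args)]
    for a in args:
        b = a if isinstance(a, bytes) else str(a).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(b), b))
    return b"".join(out)

def parse_fullresync(reply):
    """Parse the master's '+FULLRESYNC <replid> <offset>' reply to PSYNC."""
    line = reply.decode().strip()
    if not line.startswith("+FULLRESYNC"):
        raise ValueError("unexpected PSYNC reply: " + line)
    _, replid, offset = line.split()
    return replid, int(offset)

# Handshake a pseudo-slave sends after connecting to the master; after the
# FULLRESYNC reply it receives an RDB snapshot followed by the live
# replication stream, which a keeper would buffer on disk.
handshake = [
    resp_command("PING"),
    resp_command("REPLCONF", "listening-port", 6380),
    resp_command("PSYNC", "?", -1),  # '? -1' requests a full sync
]
```

Because the master sees an ordinary slave on the other end, no change to the master is needed for the keeper to receive and persist the replication log.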

The keeper design enables reliable cross‑DC log transmission, supports custom protocols for compression and encryption, and mitigates data loss during network outages.

High availability is achieved by deploying each keeper as a master‑backup pair; a MetaServer monitors keeper status and promotes the backup when the active keeper fails, with load balanced across MetaServer nodes. Redis Sentinel is used for Redis‑level failover, but XPipe also implements its own psync2.0 protocol on top of Redis 3.0.7 so that promoting a new master does not force every slave into a full resynchronization pause.
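The MetaServer's failover decision for one keeper pair can be sketched as a simple heartbeat check: keep the current active keeper if it is still reporting, otherwise promote a live backup. The function, its role labels, and the 5‑second timeout are illustrative assumptions, not XPipe's actual interface.

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # assumed: seconds of silence before a keeper is considered dead

def choose_active_keeper(pair, now=None):
    """pair maps keeper address -> (role, last_heartbeat_timestamp).
    Returns the address that should be active: the current active keeper
    if it is alive, otherwise a live backup; None if neither responds."""
    now = time.time() if now is None else now
    alive = {addr for addr, (_, hb) in pair.items()
             if now - hb < HEARTBEAT_TIMEOUT}
    for addr, (role, _) in pair.items():
        if role == "active" and addr in alive:
            return addr  # healthy active keeper: no switch needed
    for addr, (role, _) in pair.items():
        if role == "backup" and addr in alive:
            return addr  # active keeper lost: promote the backup
    return None
```

Running this check periodically per pair, and sharding the pairs across MetaServer nodes, gives the monitoring-plus-promotion behavior the article describes.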

DR switching follows a four‑step process similar to a two‑phase commit: (1) verify switch feasibility, (2) forbid writes on the old master, (3) promote the new master, and (4) synchronize other data centers to the new master. Rollback and retry mechanisms are provided for manual DBA intervention.
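The four steps above, with rollback on failure, can be sketched as a two‑phase‑commit‑style driver. The `cluster` object, its method names, and the rollback order are hypothetical placeholders for illustration only; XPipe's real switch logic is more involved.

```python
def dr_switch(cluster):
    """Run the four DR-switch steps in order; on any failure, roll back the
    completed steps in reverse and leave the rest to manual DBA intervention."""
    done = []
    steps = [
        ("check",   cluster.check_switchable),    # 1. verify the switch is feasible
        ("freeze",  cluster.forbid_writes),       # 2. forbid writes on the old master
        ("promote", cluster.promote_new_master),  # 3. promote the new master
        ("resync",  cluster.sync_other_dcs),      # 4. point other DCs at the new master
    ]
    for name, step in steps:
        try:
            step()
            done.append(name)
        except Exception:
            for finished in reversed(done):
                cluster.rollback(finished)
            return ("rolled_back", done)
    return ("switched", done)
```

Freezing writes before promotion (step 2) is what bounds the inconsistency window: once the old master rejects writes, promoting the new master cannot lose acknowledged data.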

The overall architecture consists of a Console for meta‑information management, keepers for log buffering and cross‑DC transfer, and MetaServers for keeper coordination.

Testing shows that inserting a keeper adds only about 0.1 ms of latency (direct master‑to‑slave replication: 0.2 ms; with a keeper in the path: 0.3 ms). In production, two data centers with two keeper layers exhibit an average replication latency of 0.8 ms (99.9th percentile: 2 ms), well within acceptable limits.

In summary, XPipe solves Redis multi‑data‑center data synchronization and DR switching, while the enhanced Redis version with psync2.0 greatly improves cluster stability.

All components are open‑source: XPipe ( https://github.com/ctripcorp/x-pipe ) and XRedis (enhanced Redis 3.0.7, https://github.com/ctripcorp/redis ).

Tags: backend, high availability, Redis, disaster recovery, replication, multi‑data center, XPipe