Evolution of Ctrip's Database High‑Availability and Disaster‑Recovery Architecture (1999‑2018)

This article chronicles Ctrip's database high‑availability and disaster‑recovery evolution—from simple SQL Server mirroring in the early years, through SAN‑based clustering and AlwaysOn, to the adoption of MySQL, Redis, MHA, and a one‑click DR automation tool—highlighting architectural decisions, challenges, and operational lessons learned.

Ctrip Technology

Nov 8, 2018

Author Bio: Gao Deguang, senior database manager at Ctrip Technology Assurance Center, responsible for database operations, high‑availability (HA) and disaster‑recovery (DR) for SQL Server, MySQL, and Redis.

Website stability is critical; prolonged outages cause revenue loss and customer churn. Database HA/DR is a key component of Ctrip's overall high‑availability strategy.

1.0 Era (1999‑2008) – The company primarily used SQL Server. The architecture was simple: database mirroring, with multiple primary databases sharing a single secondary (mirror) server. Failover was manual: operators either restarted the failed primary or switched to the mirror by hand. This kept costs low, but recovery was slow and the setup fell short of true HA.

2.0 Era (2008‑2012) – Rapid business growth led Ctrip to adopt SAN shared storage, replication‑based distribution for read/write separation, and Failover Cluster for HA. DR still relied on mirroring. This architecture brought automatic failover (≈2 minutes) and a read‑only replica used for BI and backup verification.

New complexities emerged: the replication chains became tangled, and the architecture depended heavily on SAN. This prompted a shift to Microsoft AlwaysOn Availability Groups (introduced in SQL Server 2012), with SSDs replacing SAN.

3.0 Era (2012‑2014) – AlwaysOn matured, supporting up to eight readable replicas with low latency. This eliminated the need for separate read‑only databases and reduced backup load on the primary.
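As an illustration of how an application can use readable replicas, the sketch below builds ODBC connection strings that declare read or write intent against an availability group listener. It assumes read‑only routing is configured on the availability group; the listener name, database name, port, driver version, and the helper `build_connection_string` are hypothetical and are not taken from Ctrip's setup.

```python
def build_connection_string(listener: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC connection string for an AlwaysOn listener.

    ApplicationIntent=ReadOnly asks the listener to route the session to a
    readable secondary (only if read-only routing is configured server-side).
    """
    intent = "ReadOnly" if read_only else "ReadWrite"
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",  # assumed driver version
        f"Server=tcp:{listener},1433",             # assumed default port
        f"Database={database}",
        "Trusted_Connection=yes",
        f"ApplicationIntent={intent}",
        "MultiSubnetFailover=Yes",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical listener and database names, for illustration only.
    print(build_connection_string("ag-listener.example.internal", "orders", read_only=True))
    print(build_connection_string("ag-listener.example.internal", "orders"))
```

Reporting and backup‑verification jobs would use the read‑only form, while transactional code keeps the default read/write intent and stays on the primary.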

4.0 Era (2014‑2018) – Recognizing the closed‑source nature of AlwaysOn, Ctrip gradually introduced open‑source MySQL and Redis.

MySQL HA/DR was built with MHA (Master High Availability). Failover is performed by switching a domain name or virtual IP, combined with dynamic data‑source routing to mitigate split‑brain risks.

Redis HA/DR relied on the in‑house CRedis middleware and Sentinel, with multi‑group sharding and cross‑IDC replication via XPipe. On top of these, a one‑click DR automation tool covers the failure of a single cluster, a whole business line, or an entire IDC.
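The MySQL failover idea described above can be sketched as follows. This is a simplified illustration, not MHA's actual code: it picks the healthy replica that has applied the most changes and then repoints a routing table so applications stop writing to the old master. `Replica`, `DataSourceRouter`, `choose_new_master`, and `fail_over` are hypothetical names, and the integer `applied_position` stands in for a real replication coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    host: str
    applied_position: int  # simplified stand-in for a replication position
    healthy: bool = True


def choose_new_master(replicas):
    """Return the healthy replica that has applied the most changes."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=lambda r: r.applied_position)


class DataSourceRouter:
    """Hypothetical routing table: logical database name -> writable host."""

    def __init__(self):
        self._writers = {}
        self._versions = {}

    def set_writer(self, db_name, host):
        # Bump a version on every change so clients can detect stale routes.
        self._writers[db_name] = host
        self._versions[db_name] = self._versions.get(db_name, 0) + 1

    def writer_for(self, db_name):
        return self._writers[db_name]

    def version(self, db_name):
        return self._versions[db_name]


def fail_over(db_name, replicas, router):
    """Promote the most up-to-date replica and repoint the data source."""
    new_master = choose_new_master(replicas)
    router.set_writer(db_name, new_master.host)
    return new_master


if __name__ == "__main__":
    router = DataSourceRouter()
    router.set_writer("orders", "db-old")
    replicas = [
        Replica("db-a", 120),
        Replica("db-b", 150, healthy=False),  # most advanced, but unreachable
        Replica("db-c", 140),
    ]
    promoted = fail_over("orders", replicas, router)
    print(promoted.host, router.writer_for("orders"))
```

Keeping a single authoritative route per database is one way to reduce split‑brain exposure: even if the old master comes back, clients that resolve the writer through the router no longer send writes to it.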

The DR tool builds switch plans automatically from metadata, generates work orders for either forced switches or rehearsal switches, and supports concurrent batch operations. The tool is itself highly available: it remains usable as long as one IDC is up.
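A minimal sketch of metadata‑driven switch planning is shown below, assuming a flat list of cluster records. The three scopes mirror the ones named in the article (single cluster, business line, IDC); the record fields, IDC names, and the functions `build_switch_plan`, `execute_plan`, and `rehearsal_switch` are hypothetical and do not describe the real tool's interface.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    business_line: str
    primary_idc: str
    standby_idc: str


@dataclass(frozen=True)
class SwitchStep:
    cluster: str
    from_idc: str
    to_idc: str


def build_switch_plan(clusters, scope, target):
    """Select affected clusters from metadata and emit one step per cluster."""
    selectors = {
        "cluster": lambda c: c.name == target,
        "business_line": lambda c: c.business_line == target,
        "idc": lambda c: c.primary_idc == target,
    }
    if scope not in selectors:
        raise ValueError(f"unknown scope: {scope}")
    matches = selectors[scope]
    return [SwitchStep(c.name, c.primary_idc, c.standby_idc)
            for c in clusters if matches(c)]


def execute_plan(plan, switch_fn, max_workers=4):
    """Run switch_fn on every step concurrently; return results by cluster."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(switch_fn, plan))  # map keeps input order
    return {step.cluster: result for step, result in zip(plan, results)}


def rehearsal_switch(step):
    """Rehearsal mode: describe the switch without performing it."""
    return f"rehearsal: {step.cluster} {step.from_idc} -> {step.to_idc}"


if __name__ == "__main__":
    metadata = [
        Cluster("hotel-db", "hotel", "IDC-A", "IDC-B"),
        Cluster("flight-db", "flight", "IDC-A", "IDC-B"),
        Cluster("pay-db", "payment", "IDC-B", "IDC-A"),
    ]
    plan = build_switch_plan(metadata, "idc", "IDC-A")
    for cluster, outcome in execute_plan(plan, rehearsal_switch).items():
        print(cluster, "=>", outcome)
```

Passing the switch action as a function keeps planning separate from execution, so the same plan can drive a rehearsal or, with a different `switch_fn`, a forced switch.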

Overall, Ctrip's database layer evolved from simple, manually managed mirroring to a sophisticated, automated, multi‑technology ecosystem that dramatically improved stability, availability, and operational efficiency.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
