High Availability Architecture for SQL Server, MySQL, and Redis at Ctrip
This article explains Ctrip's high‑availability designs for SQL Server, MySQL, and Redis. It covers multi‑replica strategies, the evolution from legacy mirroring to AlwaysOn, multi‑node MHA management, and Sentinel‑based Redis failover, with an emphasis on tolerating faults across data centers.
To achieve high availability, Ctrip uses a three‑replica model for its databases: a primary, a synchronous secondary, and an asynchronous disaster‑recovery replica. The model keeps the service running when a single node fails and allows rapid failover to another region when a whole data center is lost.
SQL Server high availability evolved from manual mirroring, to SAN‑based automatic failover, and finally to the AlwaysOn architecture (SQL Server 2012 and later). In AlwaysOn, the primary replica serves read/write traffic, a synchronous secondary keeps an up‑to‑date copy of the data, and an asynchronous replica runs in a remote data center. Automatic failover to the synchronous secondary completes within a minute. Failover to the asynchronous replica requires manual intervention because it may lose data.
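The failover rule above can be sketched as a small decision function. This is an illustrative model, not SQL Server's API. The `Replica` type, the `choose_failover_target` function, and the `operator_approved` flag are hypothetical names that stand in for the cluster's health checks and a DBA's manual decision.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SYNC = "synchronous"
    ASYNC = "asynchronous"


@dataclass
class Replica:
    name: str          # hypothetical replica name
    data_center: str   # hypothetical data center label
    mode: Mode
    healthy: bool = True


def choose_failover_target(secondaries, operator_approved=False):
    """Return (replica, automatic), or (None, False) if no safe target exists.

    A healthy synchronous secondary holds every committed transaction, so it
    is promoted automatically. An asynchronous DR replica may lag behind, so
    it is chosen only after an operator explicitly accepts possible data loss.
    """
    for r in secondaries:
        if r.healthy and r.mode is Mode.SYNC:
            return r, True
    if operator_approved:
        for r in secondaries:
            if r.healthy and r.mode is Mode.ASYNC:
                return r, False  # manual failover
    return None, False
```

If the synchronous secondary is unhealthy, the function returns `(None, False)` until someone passes `operator_approved=True`. This mirrors the rule that failover to the remote replica needs human sign‑off.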
Failure detection in the AlwaysOn setup relies on Windows Server Failover Clustering. Recommended practices include using an odd number of cluster nodes (>9), extending the cluster's regroup timeouts, and placing the quorum file share witness in a third data center. ForceQuorum serves as a fallback for bringing the cluster back online if quorum is lost.
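A third‑data‑center witness helps because failover‑cluster quorum is a strict‑majority vote. The sketch below is a simplified vote count that ignores features such as dynamic quorum. It assumes a hypothetical layout of two nodes in each of two data centers plus one file share witness. `has_quorum` is an illustrative helper, not a Windows API.

```python
def has_quorum(voters: dict[str, str], failed_dcs: set[str]) -> bool:
    """Strict-majority check: surviving votes must exceed half of all votes.

    `voters` maps each voter (cluster node or file share witness) to the
    data center it lives in. Voters in a failed data center are lost.
    """
    total = len(voters)
    surviving = sum(1 for dc in voters.values() if dc not in failed_dcs)
    return 2 * surviving > total


# Hypothetical layout: 2 nodes in DC1, 2 nodes in DC2, witness in DC3.
voters = {
    "node1": "DC1", "node2": "DC1",
    "node3": "DC2", "node4": "DC2",
    "witness": "DC3",
}

print(has_quorum(voters, {"DC1"}))          # 3 of 5 votes survive
print(has_quorum(voters, {"DC1", "DC3"}))   # 2 of 5 votes survive
```

Losing DC1 still leaves 3 of 5 votes, so the cluster stays up. The same four nodes without the witness would keep only 2 of 4 votes and stop. When quorum is genuinely lost, the surviving nodes stay down until an administrator forces quorum.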
MySQL High Availability uses traditional binlog‑based replication and has progressed through three stages: (1) MHA management with virtual IP failover, (2) direct IP connections with QConfig‑driven IP updates, and (3) a multi‑MHA architecture where five MHA managers across three data centers collaboratively detect failures, reach consensus, and update the configuration center to switch IPs.
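The multi‑MHA stage can be sketched as a majority vote followed by a configuration update. This is a simplified model, not MHA's or QConfig's actual interface. The manager names, the `fail_over` function, and the plain dict standing in for the configuration center are all hypothetical. Choosing the most caught‑up slave by replication coordinates is also a simplification of what MHA does.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    ip: str
    master_log_file: str      # mirrors SHOW SLAVE STATUS Master_Log_File
    read_master_log_pos: int  # mirrors SHOW SLAVE STATUS Read_Master_Log_Pos


def majority_says_down(votes: dict[str, bool], total_managers: int) -> bool:
    # Managers that cannot report (e.g. in a partitioned data center) are
    # simply absent from `votes`. The majority is counted against the full
    # configured membership, so a minority partition can never act alone.
    down = sum(1 for is_down in votes.values() if is_down)
    return 2 * down > total_managers


def fail_over(votes, slaves, config_center, cluster, total_managers=5):
    """Promote the most caught-up slave and publish its IP, or do nothing."""
    if not majority_says_down(votes, total_managers):
        return None
    # Zero-padded binlog names with a shared prefix sort correctly as strings.
    new_master = max(
        slaves, key=lambda s: (s.master_log_file, s.read_master_log_pos)
    )
    # Publishing the new IP to the configuration center redirects clients.
    config_center[cluster] = new_master.ip
    return new_master.ip
```

With five managers, three "down" votes are required. A data center hosting at most two managers therefore cannot trigger a switch on its own, even if it is isolated from the master.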
Redis high availability follows a primary‑replica model with Sentinel monitoring. Five Sentinel instances watch each Redis node. When enough Sentinels agree that a master is objectively down, they elect a leader Sentinel, which promotes a replica to be the new master. Configuration metadata is stored in ConfigDB and served through Config Service, while a Keeper layer buffers cross‑region replication.
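The Sentinel decision steps can be modelled in a few lines. This is a simplified sketch of Redis Sentinel's documented behaviour, not its code. A master becomes objectively down (ODOWN) once at least `quorum` Sentinels report it subjectively down. The failover leader needs votes from a majority of all Sentinels. The promoted replica is chosen by lowest `replica-priority` (0 means never promote), then the largest replication offset, then the lexicographically smallest run ID. The quorum value of 3 and the run IDs in the usage are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReplicaInfo:
    run_id: str
    replica_priority: int  # lower is preferred; 0 means never promote
    repl_offset: int       # higher means more replication data received


def is_odown(sdown_reports: int, quorum: int) -> bool:
    # ODOWN: at least `quorum` Sentinels independently see the master as down.
    return sdown_reports >= quorum


def can_lead_failover(votes: int, total_sentinels: int, quorum: int) -> bool:
    # The leader needs a majority of all Sentinels, and at least `quorum`.
    return 2 * votes > total_sentinels and votes >= quorum


def pick_new_master(replicas):
    """Select the replica to promote, or None if none is eligible."""
    candidates = [r for r in replicas if r.replica_priority != 0]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (r.replica_priority, -r.repl_offset, r.run_id),
    )
```

With five Sentinels and an assumed quorum of 3, `is_odown(3, 3)` and `can_lead_failover(3, 5, 3)` both hold. Two Sentinels on one side of a partition can neither mark the master ODOWN nor lead a failover.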
Overall, Ctrip's approach combines multiple replicas, automated failover mechanisms, and coordinated configuration services to ensure resilient database services across SQL Server, MySQL, and Redis deployments.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.