How Ctrip Migrated a Decade‑Old Hotel Order System from SQL Server to MySQL
Ctrip’s hotel order platform evolved from a single SQL Server instance to a multi‑IDC, sharded architecture, and this article details the middleware‑driven migration, unified caching, data‑chain redesign, consistency safeguards, and operational practices that enabled seamless, high‑performance order processing at millions of transactions per day.
Since the first hotel order was recorded in 1999, Ctrip’s order storage has progressed from a single SQL Server database to a multi‑IDC, sharded solution, but the legacy design eventually hit capacity, performance, and cost limits. To support daily tens of millions of orders with high availability, low latency, and controllable costs, the team built a unified middleware layer that abstracts data access, introduces a global hot‑cache, and enables a transparent migration from SQL Server to MySQL.
Architecture Overview
The new middleware sits between applications and the databases, providing a unified data‑chain service. It decouples applications from direct DB connections, allowing the underlying storage to shift from SQL Server to MySQL without code changes. A diagram (Fig 1.1) illustrates the layered architecture: application → middleware → cache → DB (SQL Server/MySQL).
Key Application Scenarios
Sub‑second order synchronization : New orders are written to the DB via middleware, which simultaneously builds a cache entry. After the DB commit, an order message is pushed to downstream systems, an Elasticsearch index is created for search, and the data is archived to the data warehouse (T+1).
Automatic order dispatch & workbench : After a guest places an order, the middleware quickly forwards order details to merchants for confirmation and to internal workbenches for manual handling.
Online query & analytics : High‑QPS online queries hit the hot cache, while long‑running analytics use a Hive data‑warehouse pipeline.
System Upgrade Practices
The migration required hot‑migration with zero‑downtime. Middleware abstracts the data source, enabling dual‑write (SQL Server + MySQL) during the transition. The team implemented fine‑grained traffic control at the CRUD level, supporting whole‑DB, shard‑level, table‑level, or even individual operation routing.
1. Distributed Order Cache
Before caching, each service connected directly to the DB, causing duplicated queries and a single‑point‑of‑failure risk. The new cache stores hot order data in Redis; on a cache miss, the DB is queried and the result is written back to Redis. To reduce Redis key count, a domain‑based hash structure stores all related tables for an order under a single key (order‑id → {domain: value}), compressing values with protostuff to save space and network traffic.
2. Lossless Migration Technique
Dual‑write is performed with SQL Server as the primary writer and MySQL as the secondary. A synchronization job continuously compares recent changes and repairs inconsistencies, writing corrections to Elasticsearch for audit. The middleware’s configuration system can route reads/writes per table, allowing gradual migration from dual‑write to MySQL‑only reads and writes.
3. Data‑Warehouse Integration
The team follows a five‑layer ODS‑DIM‑EDW‑CDM‑ADM model. During migration, the ODS layer’s source is switched from SQL Server to MySQL. Consistency checks involve online jobs that compare row hashes and offline scripts that generate per‑table comparison jobs, handling date‑precision differences and XML‑type incompatibilities.
Core Challenges & Solutions
Fine‑grained traffic monitoring : Every CRUD operation is logged to Elasticsearch and visualized in Kibana/Grafana. DBTrace shows which DB executed each SQL, ensuring each migration step only proceeds after the previous one is verified.
Dual‑write consistency : Inconsistencies arise from missed writes, network glitches, or GC pauses. The team uses a job that treats SQL Server as the source of truth, repairs MySQL, and records diffs in ES. A fail‑over mechanism repeatedly retries until consistency is achieved.
Cache consistency : The team adopted a "write DB then delete cache" strategy with delayed double‑delete for sensitive tables. Optimistic locks and CUD markers prevent stale reads; queries bypass the cache when a lock or marker is present.
Database feature differences : Auto‑increment keys are synchronized by writing the generated SQL Server ID back to MySQL. Date‑time fields are truncated to seconds for comparison. XML fields unsupported in MySQL 5.7 are compressed and stored as VARCHAR, with length checks before insertion.
Alerting & Reliability Practices
Real‑time alerts monitor data‑drift between the two databases, order volume per shard, and cache health. A unified alert platform (Sitemon) duplicates alert rules for both data sources, enabling gray‑scale validation. The team also conducts regular "Wandering Earth" disaster‑recovery drills, simulating Redis failures, DB outages, and network partitions, and has added automatic middleware degradation based on observed latency spikes.
Future Plans
Develop a manual cache control console for NOC operators to isolate faulty Redis shards.
Implement automatic middleware degradation when cache‑related metrics exceed thresholds.
Integrate the middleware with a Service Mesh to simplify cloud migration and improve observability.
Overall, the migration showcases a comprehensive approach to modernizing a legacy order system, balancing performance, cost, and reliability while ensuring data integrity throughout the transition.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
