
Ctrip Order Database Architecture Optimization and Sharding Case Study

This article details Ctrip's comprehensive redesign of its airline ticket order database, covering the performance bottlenecks that motivated the work, vertical and hot‑cold data splitting, shard‑key selection, multi‑level caching, cross‑shard query optimization, dual‑write mechanisms, fault‑tolerance strategies, project planning, and the resulting gains in scalability and cost.

Ctrip Technology

Background – Rapid growth of airline ticket orders caused CPU saturation, disk space exhaustion, and limited scaling headroom in the existing monolithic order database, prompting an architectural overhaul.

Initial Optimizations – Traditional techniques such as index tuning, read/write separation, and reducing high‑frequency queries were applied. These brought short‑term stability but exposed new bottlenecks as data volume grew.

Vertical Splitting – The order database was divided by business domain (order management, ticketing, refunds, etc.), isolating workloads, improving reliability, and reducing the impact of non‑core changes.

Hot‑Cold Data Separation – Completed orders are migrated to a read‑only cold database; a restore process moves data back to the hot database when modifications are required. This reduces load on the primary store while preserving query capability.
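The hot/cold routing rule above can be sketched as a simple decision: completed orders older than a retention window belong in the read‑only cold store, and a modification request first restores the row to the hot store. This is a minimal illustration; the class name, method names, and the 90‑day window are assumptions, not Ctrip's actual implementation.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of hot/cold store routing for an order row.
public final class HotColdRouter {
    // Assumed retention window; the article does not state the actual value.
    static final Duration HOT_RETENTION = Duration.ofDays(90);

    enum Store { HOT, COLD }

    // In-flight orders always stay hot; completed orders age out to cold.
    static Store storeFor(boolean completed, Instant completedAt, Instant now) {
        if (!completed) return Store.HOT;
        return completedAt.plus(HOT_RETENTION).isBefore(now) ? Store.COLD : Store.HOT;
    }
}
```

A write against a `COLD` order would first run the restore process mentioned above, then proceed against the hot database.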

Sharding Design – Starting in 2019, the team planned a sharding project. The primary order ID was chosen as the shard key because it is immutable, covers roughly 90% of queries, and avoids cross‑shard hotspots. Data is horizontally split into 64 shards across 16 physical machines.

Multi‑Level Caching – To accelerate shard‑key lookup, an index table mapping related order IDs to the primary (shard‑key) order ID is cached at three levels: client local memory, a Redis distributed cache, and a server‑side Guava cache. Memory‑efficient structures (e.g., Map<Long, short[]>) reduce per‑index overhead by ~93%.
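The memory‑compact structure mentioned above can be sketched as follows: rather than boxing each shard number into an object, a key maps to a primitive short[] (a shard ID fits comfortably in 16 bits, since there are only 64 shards). The class and method names are illustrative assumptions, not Ctrip's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the memory-compact index table.
public final class CompactShardIndex {
    // Primitive short[] values avoid per-element object headers and boxing.
    private final Map<Long, short[]> index = new HashMap<>();

    void put(long key, short[] shards) {
        index.put(key, shards);
    }

    // Candidate shards for a key; empty array when the key is unknown.
    short[] shardsFor(long key) {
        short[] shards = index.get(key);
        return shards != null ? shards : new short[0];
    }
}
```

In a layered deployment, a lookup would try the client's local map first, fall back to Redis, then to the server‑side Guava cache, and only then hit the database.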

Cross‑Shard Query Optimization – Index tables for user ID and other frequent non‑shard fields limit the number of shards queried. A mirror database aggregates hot data for complex queries, avoiding full‑shard scans.
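The shard‑pruning idea above can be sketched as: consult the user‑ID index first, and only fall back to a full‑shard scan when the index misses. Names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of limiting a cross-shard query via an index table.
public final class CrossShardQuery {
    // userId -> shards holding that user's orders (maintained elsewhere)
    private final Map<Long, short[]> userIndex;

    CrossShardQuery(Map<Long, short[]> userIndex) {
        this.userIndex = userIndex;
    }

    List<Integer> shardsToQuery(long userId, int totalShards) {
        short[] hit = userIndex.get(userId);
        List<Integer> shards = new ArrayList<>();
        if (hit == null) {
            // Index miss: degrade to a full-shard scan.
            for (int i = 0; i < totalShards; i++) shards.add(i);
        } else {
            for (short s : hit) shards.add((int) s);
        }
        return shards;
    }
}
```

Queries too complex even for this pruning go to the mirror database of aggregated hot data instead.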

Dual‑Write Component – The DAL‑Extension component supports asynchronous and synchronous dual‑write from SQLServer to MySQL, with configurable exception handling modes (AC, SC, ST). It also provides dual‑read capability for gray‑release (phased) migration.
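A synchronous dual‑write can be sketched as: write the primary (SQLServer) first, then the secondary (MySQL), with a configurable policy for secondary failures. This is loosely modeled on the article's exception‑handling modes; the enum names and interfaces here are illustrative, not DAL‑Extension's actual API.

```java
// Hypothetical sketch of a synchronous dual-write with a failure policy.
public final class DualWriter {
    enum FailurePolicy { IGNORE_SECONDARY_ERROR, FAIL_WHOLE_WRITE }

    interface Writer { void write(String record); }

    static boolean write(String record, Writer primary, Writer secondary,
                         FailurePolicy policy) {
        primary.write(record); // a primary failure always aborts the write
        try {
            secondary.write(record);
        } catch (RuntimeException e) {
            if (policy == FailurePolicy.FAIL_WHOLE_WRITE) throw e;
            // Otherwise swallow: a later reconciliation pass repairs drift.
        }
        return true;
    }
}
```

Dual‑read works symmetrically during gray‑release: reads are served from one store while results from the other are compared for consistency.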

Fault Handling – Strategies include returning partial results when some shards fail, shard isolation via a "continueOnError" flag, and dynamic re‑hashing to reroute traffic away from failed shards.
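The partial‑results strategy can be sketched as a fan‑out loop: each shard is queried in turn, and with continueOnError set, a failing shard is skipped so the caller still receives results from the healthy shards. The flag name comes from the article; everything else is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of fan-out with per-shard fault isolation.
public final class ShardFanOut {
    static List<String> queryAll(List<Integer> shards,
                                 IntFunction<String> queryShard,
                                 boolean continueOnError) {
        List<String> results = new ArrayList<>();
        for (int shard : shards) {
            try {
                results.add(queryShard.apply(shard));
            } catch (RuntimeException e) {
                if (!continueOnError) throw e; // fail fast when isolation is off
                // continueOnError: skip the failed shard, keep partial results
            }
        }
        return results;
    }
}
```

Dynamic re‑hashing would complement this by removing the failed shard from the routing table entirely until it recovers.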

Project Planning & Execution – The project was divided into six phases: API read‑closure, dual‑read/write development, data consistency verification, performance testing, migration of reads to MySQL, and final cut‑over. Milestones, task boards, and regular reviews ensured risk mitigation.

Outcomes – After completion, the system achieved 64‑shard horizontal scaling, CPU utilization dropped from ~40% to 3–5%, capacity now supports more than five years of order data, and overall infrastructure cost fell by moving from 128‑core servers to 40‑core configurations.

Key Lessons – Clear project scope, early data access closure, minimizing exceptions, reducing dependencies, and incremental delivery were critical to success.

Tags: dual write · performance optimization · fault tolerance · database sharding · horizontal scaling · multi‑level caching · hot‑cold data separation
Written by Ctrip Technology, the official Ctrip Technology account.