How Ctrip Evolved Its Architecture: Lessons from 5+ Iterations
This article chronicles Ctrip's multi‑year architectural evolution—covering operations, framework, application layers, publishing system, configuration management, SOA, and the large‑scale UserProfile project—highlighting the motivations, challenges, and solutions that shaped its high‑availability, high‑performance platform.
Overview of Ctrip Architecture
Ctrip’s architecture has evolved continuously, with many products going through more than five major iterations. Each iteration fixed shortcomings of the previous version while inevitably introducing new challenges, which makes the journey worth studying for engineers.
The architecture consists of three main components: Operations, Framework, and Application.
1. Operations
Ctrip’s operations provide a robust foundation for high availability and stability, featuring four key highlights:
1.1 Cluster Management Strategy
Web clusters use SLB to control traffic; based on health‑check results, instances are automatically added to or removed from the cluster. The publishing and scaling processes are transparent to developers.
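The health-check behaviour described above can be sketched in a few lines. This is a minimal illustration, not Ctrip's SLB implementation: the class name `HealthCheckedPool`, the `probe` callback, and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HealthCheckedPool:
    """Toy load-balancer pool driven by health checks (hypothetical model)."""
    probe: Callable[[str], bool]   # returns True when an instance is healthy
    fail_threshold: int = 3        # consecutive failures before removal (assumed value)
    rise_threshold: int = 2        # consecutive successes before (re)adding (assumed value)
    _state: Dict[str, dict] = field(default_factory=dict)

    def register(self, instance: str) -> None:
        # New instances stay out of rotation until they pass enough checks.
        self._state[instance] = {"in_rotation": False, "ok": 0, "bad": 0}

    def run_checks(self) -> None:
        # One health-check round over every registered instance.
        for instance, s in self._state.items():
            if self.probe(instance):
                s["ok"] += 1
                s["bad"] = 0
                if not s["in_rotation"] and s["ok"] >= self.rise_threshold:
                    s["in_rotation"] = True
            else:
                s["bad"] += 1
                s["ok"] = 0
                if s["in_rotation"] and s["bad"] >= self.fail_threshold:
                    s["in_rotation"] = False

    def active(self) -> List[str]:
        # Instances currently eligible to receive traffic.
        return sorted(i for i, s in self._state.items() if s["in_rotation"])
```

Requiring several consecutive results before changing membership keeps a single slow probe from flapping an instance in and out of the cluster.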
1.2 FullDR Mechanism
Web, DB, and Redis clusters all implement a long‑term FullDR mechanism that automatically takes over when an entire IDC fails, with regular drills to assess impact on orders.
1.3 DBA Strategy
Data safety is paramount. Ctrip combines master‑slave (M‑S) replication with FullDR to ensure high availability, and has undertaken a migration from MSSQL to MySQL despite the cost, keeping the migration transparent to users. The storage stack includes MSSQL, MySQL, Redis, Hive, and Elasticsearch, providing high availability and eventual consistency.
1.4 NOC Mechanism
The NOC monitors all applications 24/7, displaying order volume trends and alerting developers when anomalies occur.
2. Framework
The framework underpins the applications and has itself evolved through several generations.
2.1 SOA & Gateway
The SOA & Gateway platform serves as the service governance layer, with a long history that will be detailed later.
2.2 Publishing System
Ctrip’s publishing system integrates features such as braking (halting a release in progress), rollback, version switching, shared DLL packaging, and POM checks. It was hardened after a severe production incident.
2.3 Message Queue
Drawing on experience with existing tools (Storm, MSMQ, ActiveMQ, RabbitMQ), Ctrip created a custom message queue with ordered partitions, asynchronous compensation, and lifecycle tracking.
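To illustrate what "ordered partitions" usually means, here is a minimal in-memory sketch. It assumes the common approach of hashing a message key to a partition so that all messages with the same key keep their publish order; the source does not describe how Ctrip's queue does this, and `PartitionedQueue` and its methods are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class PartitionedQueue:
    """In-memory stand-in for a partitioned queue (illustrative only)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike the built-in hash() on str.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def publish(self, key: str, payload: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        partition = self.partition_for(key)
        self._partitions[partition].append((key, payload))
        return partition

    def consume(self, partition: int) -> List[Tuple[str, str]]:
        # Drain one partition in the order its messages were published.
        messages, self._partitions[partition] = self._partitions[partition], []
        return messages
```

Ordering is guaranteed only within a partition; messages for different keys may be consumed in any relative order, which is the trade-off that lets partitions be processed in parallel.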
2.4 Configuration Management
Configuration management emphasizes convenience, efficiency, and performance, reflecting broader industry trends.
3. Application Layer
Common techniques across applications include PreLoading & LayerLoading, sharding, circuit breaking, rate limiting, and degradation, which significantly improve website and app stability.
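Rate limiting and degradation often work together: when a caller exceeds its budget, the service returns a cheaper fallback instead of failing. The sketch below uses a token bucket, which is one common rate-limiting algorithm and an assumption here; the source does not say which algorithm Ctrip uses. `TokenBucket` and `handle` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Token-bucket rate limiter with an injectable clock (illustrative)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._clock = clock
        self._tokens = capacity       # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


def handle(bucket: TokenBucket, primary: Callable[[], T],
           fallback: Callable[[], T]) -> T:
    # Degradation: serve the fallback when the request is over the limit.
    return primary() if bucket.allow() else fallback()
```

In practice the fallback might be a cached or simplified response, so the page still renders while the protected dependency recovers.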
4. Architectural Evolution
4.1 Publishing System Evolution
The publishing system has passed through four eras:
ITSM – a C/S tool that isolated development from publishing but required sequential releases.
CITSM – a B/S implementation enabling collaborative releases and version management.
CRoller (ROP) – introduced All‑In‑One configuration and automatic loading; the system was powerful but overly permissive, which led to a major outage.
Tars (CD) – the current generation adds remote backup, ensuring resilience even if local disks are wiped.
4.2 Configuration Management Evolution
Four generations of configuration systems have been deployed:
First generation wrapped web.config with a simple web UI.
Second generation embedded configuration changes into the publishing process, eliminating site restarts.
Third generation fetched configuration from a service at startup and supported live updates, limited to on/off switches.
Fourth generation supports JSON values, improves the change‑listening mechanism, and is open source.
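The later generations share one idea: the application reads configuration from a client object, and the client notifies listeners when values change, so no restart is needed. The sketch below illustrates that idea with JSON snapshots. It is not the API of Ctrip's configuration system; `ConfigClient`, `on_change`, and `apply_update` are hypothetical names, and how updates reach the client (polling, push) is left out.

```python
import json
from typing import Any, Callable, Dict, List

# A listener receives (key, old_value, new_value).
Listener = Callable[[str, Any, Any], None]


class ConfigClient:
    """Holds a JSON configuration snapshot and notifies listeners on change."""

    def __init__(self, initial_json: str) -> None:
        # Stands in for the snapshot fetched from a config service at startup.
        self._values: Dict[str, Any] = json.loads(initial_json)
        self._listeners: List[Listener] = []

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def on_change(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def apply_update(self, new_json: str) -> List[str]:
        # Swap in the new snapshot, then notify listeners of each changed key.
        new_values = json.loads(new_json)
        changed = sorted(
            key for key in set(self._values) | set(new_values)
            if self._values.get(key) != new_values.get(key)
        )
        old_values, self._values = self._values, new_values
        for key in changed:
            for listener in self._listeners:
                listener(key, old_values.get(key), new_values.get(key))
        return changed
```

Because values are arbitrary JSON, a listener can react to structured settings (for example a nested retry policy) rather than only to boolean switches.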
4.3 SOA Evolution
SOA progressed from an ESB‑based bus (first generation) that became a bottleneck, to direct service connections (second generation), and finally to a feature‑rich platform (third generation) offering circuit breaking, rate limiting, dynamic routing, and gateway integration for mobile and H5 services.
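Circuit breaking, one of the third-generation features, can be sketched as a small state machine: after repeated failures the breaker opens and calls fail fast; after a cool-down, one trial call decides whether it closes again. This is a generic illustration under assumed thresholds, not the implementation in Ctrip's SOA platform; `CircuitBreaker` and `CircuitOpenError` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_timeout = reset_timeout          # seconds, assumed value
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the circuit is open protects both the caller (no threads stuck on timeouts) and the struggling downstream service (no extra load while it recovers).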
5. UserProfile Project
The UserProfile project exemplifies Ctrip’s architectural strengths in big‑data processing.
5.1 Composition
It comprises six core functions: registration, collection, computation, storage, query, and monitoring. Data sources include personal information, travel history, contacts, user behavior, and order data.
5.2 Architecture
Data is ingested via batch and streaming pipelines; the streaming path uses Kafka, Storm, and Ctrip’s proprietary Hermes platform. Processed data is stored in Hive, MySQL, and Redis, all protected by FullDR + M‑S designs.
Currently, the UserProfile repository holds over 10 billion records, with average service response times around 10 ms (including ~4 ms network latency). High availability is ensured through circuit breaking, rate limiting, degradation, and sharding.
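With billions of records, sharding spreads profiles across several stores so that each lookup touches only one of them. The sketch below assumes simple hash-modulo routing on a user ID; the source does not state UserProfile's actual shard key or routing scheme, and `shard_for` and `ShardedStore` are hypothetical names.

```python
import zlib
from typing import Dict, List, Optional


def shard_for(user_id: str, num_shards: int) -> int:
    # Stable hash so every process routes a given user to the same shard.
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


class ShardedStore:
    """In-memory stand-in for a set of sharded profile stores (illustrative)."""

    def __init__(self, num_shards: int) -> None:
        self._shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def put(self, user_id: str, profile: dict) -> None:
        self._shards[shard_for(user_id, len(self._shards))][user_id] = profile

    def get(self, user_id: str) -> Optional[dict]:
        # A lookup touches exactly one shard.
        return self._shards[shard_for(user_id, len(self._shards))].get(user_id)

    def shard_sizes(self) -> List[int]:
        return [len(shard) for shard in self._shards]
```

Hash-modulo routing is easy to reason about but forces most keys to move when the shard count changes; schemes such as consistent hashing reduce that cost at the price of extra complexity.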
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
