How Ctrip Scales Its Architecture: Ops, Release, and Big Data Insights
This article outlines Ctrip's evolving architecture, covering its operational backbone, framework components, release system, configuration management, SOA evolution, and the large-scale UserProfile big-data platform. It offers practical insights from a senior developer on how the company achieves high availability and scalability.
Introduction: Ctrip’s architecture has undergone continuous evolution, with many products experiencing more than five major updates. Each iteration addressed previous pain points while introducing new challenges, offering valuable lessons for engineers.
1. Architecture Composition
The architecture consists of three parts: Operations, Framework, and Applications.
1.1 Operations
Operations provide the foundation for high availability and stability, featuring four key capabilities:
Cluster Management Strategy: Web clusters sit behind an SLB (server load balancer) that controls traffic and automatically adds or removes nodes based on health checks.
FullDR Mechanism: Web, DB, and Redis clusters have a full disaster-recovery system that is exercised regularly.
DBA Strategy: Data safety comes first, using M-S (master-slave) replication, FullDR, and a migration from MSSQL to MySQL that is transparent to users.
NOC Mechanism: A 24/7 NOC monitors orders and application health and alerts developers when anomalies occur.
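The health-check behavior described for the SLB can be sketched roughly as follows. This is a minimal illustration, not Ctrip's implementation: the `NodePool` class, the `probe` callback, and the `fail_threshold` default are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NodePool:
    """Hypothetical load-balancer pool driven by health checks."""

    probe: Callable[[str], bool]      # returns True when the node is healthy
    fail_threshold: int = 3           # assumed: consecutive failures before removal
    failures: Dict[str, int] = field(default_factory=dict)

    def add(self, node: str) -> None:
        self.failures[node] = 0

    def run_checks(self) -> None:
        # Probe every registered node once and update its failure counter.
        for node in self.failures:
            if self.probe(node):
                self.failures[node] = 0   # a success resets the counter
            else:
                self.failures[node] += 1

    def active_nodes(self) -> List[str]:
        # Only nodes below the failure threshold receive traffic.
        return [n for n, f in self.failures.items() if f < self.fail_threshold]
```

A node that keeps failing drops out of `active_nodes()` and returns automatically after its next successful probe, which mirrors the "add or remove based on health checks" behavior.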
1.2 Framework
The framework underpins applications and has evolved through several components:
SOA & Gateway: A long-standing service governance platform.
Release System: Integrates features such as brake (halting a release in progress), rollback, version switching, shared DLL packaging, and POM checks.
Message Queue: A custom solution that combines the strengths of open-source queues, offering ordered partitions, asynchronous compensation, and message lifecycle tracking.
Configuration Management: Has evolved over several generations toward convenience, efficiency, and high performance.
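The "ordered partitions" idea mentioned for the message queue can be illustrated with a small in-memory sketch. It assumes the common approach of hashing a message key to pick a partition, so that all messages for one key stay in order; the source does not say how Ctrip's queue assigns partitions, and `PartitionedQueue` is a hypothetical name.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class PartitionedQueue:
    """Hypothetical in-memory queue: per-key ordering via key-hashed partitions."""

    def __init__(self, partitions: int = 4) -> None:
        self.partitions = partitions
        self.logs: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike the built-in hash().
        return zlib.crc32(key.encode("utf-8")) % self.partitions

    def publish(self, key: str, payload: str) -> int:
        partition = self.partition_for(key)
        self.logs[partition].append((key, payload))  # append-only keeps order
        return partition

    def consume(self, partition: int) -> List[Tuple[str, str]]:
        return list(self.logs[partition])
```

Ordering is guaranteed only within a partition, so events that must be processed in sequence (for example, all events of one order) should share a key.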
1.3 Applications
Typical large‑scale applications employ techniques such as PreLoading, LayerLoading, Sharding, Circuit Breaker, Rate Limiting, and Degradation to improve stability and user experience.
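As one concrete instance of these techniques, a circuit breaker with a degradation fallback might look like the sketch below. The thresholds, the `clock` parameter, and the class itself are assumptions for illustration, not details from the source.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()                   # open: degrade immediately
            self.opened_at = None                   # half-open: allow one trial call
            self.failures = self.max_failures - 1   # one more failure re-opens
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                           # success closes the breaker
        return result
```

While the breaker is open, the failing dependency is not called at all, which gives it time to recover while callers receive the degraded response.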
2. Architecture Evolution
2.1 Release System
The release system has passed through four generations:
ITSM
CITSM
CRoller (ROP)
Tars (CD)
The first generation, ITSM, was a client/server (C/S) tool that separated development from release. CITSM moved to a browser/server (B/S) model and added configuration integration, version control, and rollback, and the system later gained All-In-One features. The third generation, CRoller, caused a major production incident and was replaced by Tars, which added remote backup for greater resilience.
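The brake and rollback features can be pictured as a batch-by-batch rollout that stops and reverts on the first unhealthy batch. The sketch below is only an illustration under that assumption; `rolling_release` and its parameters are hypothetical and do not describe how Tars works internally.

```python
from typing import Callable, Dict


def rolling_release(nodes: Dict[str, str], new_version: str, batch_size: int,
                    healthy: Callable[[str, str], bool]) -> bool:
    """Upgrade nodes in batches; brake and roll back if a batch is unhealthy.

    `nodes` maps node name -> deployed version and is updated in place.
    Returns True when the whole fleet runs `new_version`.
    """
    previous = dict(nodes)                 # snapshot used for rollback
    names = list(nodes)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            nodes[name] = new_version      # deploy to this batch only
        if not all(healthy(name, new_version) for name in batch):
            nodes.update(previous)         # brake: stop here and restore versions
            return False
    return True
```

Releasing in small batches limits how many nodes a bad build can reach before the brake is applied.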
2.2 Configuration Management
Configuration management also evolved through four stages:
First generation: simple web.config wrapper with a web UI.
Second generation: integrated config changes into releases, eliminating site restarts.
Third generation: service‑based config loading with a binary on/off switch.
Fourth generation: added JSON support and improved listeners, and was open-sourced.
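To make the combination of JSON configuration and change listeners concrete, here is a minimal sketch of a store that reloads a JSON document and notifies subscribers about changed keys, so no restart is needed. `ConfigStore` and its methods are hypothetical names, not the API of Ctrip's configuration system.

```python
import json
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any, Any], None]   # (key, old_value, new_value)


class ConfigStore:
    """Hypothetical JSON-backed config store with change listeners."""

    def __init__(self, raw_json: str = "{}") -> None:
        self._values: Dict[str, Any] = json.loads(raw_json)
        self._listeners: List[Listener] = []

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def reload(self, raw_json: str) -> None:
        # Swap in the new document, then notify listeners of each changed key.
        new_values = json.loads(raw_json)
        old_values, self._values = self._values, new_values
        for key in sorted(set(old_values) | set(new_values)):
            old, new = old_values.get(key), new_values.get(key)
            if old != new:
                for listener in self._listeners:
                    listener(key, old, new)
```

An application would subscribe once at startup and react to each callback, for example by resizing a connection pool when its limit changes.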
2.3 SOA
SOA at Ctrip progressed from a centralized ESB bus (first generation) to direct service connections (second generation), then added features like circuit breaking, rate limiting, and dynamic routing (third generation). The gateway later replaced MobileService, adding anti‑scraping and authentication.
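Rate limiting of the kind added in the third generation is often implemented with a token bucket. The sketch below shows that general technique; the source does not state which algorithm Ctrip's SOA framework uses, and the `TokenBucket` class and its parameters are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: `rate` tokens per second, burst of `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # over the limit: reject or degrade
```

A service would typically keep one bucket per caller or per endpoint and reject, queue, or degrade requests when `allow()` returns False.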
3. UserProfile Project
3.1 Composition
UserProfile is a core big‑data component consisting of six functions: registration, collection, computation, storage, query, and monitoring. Data sources include personal info, travel history, contacts, behavior, and orders.
3.2 Architecture
Collected data flows through batch and streaming pipelines that converge in the UserProfile repository. Real-time processing uses Kafka + Storm and Ctrip's proprietary Hermes platform. The repository stores over 10 billion records across Hive, MySQL, and Redis, all protected by FullDR and M-S (master-slave) designs.
Despite this scale, the average service response time stays around 10 ms, of which about 4 ms is network latency. This is credited to circuit breaking, rate limiting, degradation, and sharding, which also keep the service highly available.
21CTO
