
Inside Ctrip’s Evolving Architecture: Ops, Frameworks, and Big Data Insights

This article explores Ctrip’s continuously evolving architecture, detailing its three-layer composition of operations, frameworks, and applications, and examines real-world case studies of its release system, configuration management, SOA, and a massive User Profile big‑data project, highlighting key innovations and lessons learned.

ITFLY8 Architecture Home

Ctrip's architecture has undergone long‑term evolution and iteration, with many products experiencing more than five major updates. Each iteration addressed the shortcomings of its predecessor while inevitably introducing new challenges.

Architecture Composition

The overall architecture consists of three parts: Operations, Framework, and Applications.

01 Operations

Operations underpins Ctrip's high availability and stability. Four practices stand out:

Cluster Management Strategy : Web clusters use SLB to control traffic, automatically pulling servers in or out based on health‑check results. Deployment and scaling are transparent to developers.

FullDR Mechanism : Web, DB, and Redis clusters implement a long‑term FullDR mechanism that automatically recovers from an entire IDC failure, with regular drills to assess impact on orders.

DBA Strategy : Data safety comes first. Ctrip combines M‑S (master‑slave) replication with FullDR to ensure high availability, handles the migration from MSSQL to MySQL transparently to users, and keeps data consistent across multiple storage technologies (MSSQL, MySQL, Redis, Hive, ES).

NOC Mechanism : A 24/7 NOC monitors all applications via order dashboards and anomaly charts, alerting responsible developers whenever issues arise.
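
The health-check behavior described under Cluster Management Strategy can be illustrated with a minimal sketch. It is not Ctrip's SLB implementation: the `ServerPool` class, the `fail_threshold` parameter, and the consecutive-failure rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServerPool:
    """Hypothetical load-balancer pool that tracks which servers receive traffic."""
    health_check: Callable[[str], bool]   # returns True when the server responds correctly
    fail_threshold: int = 3               # assumed: consecutive failures before pulling out
    servers: Dict[str, int] = field(default_factory=dict)  # address -> consecutive failures

    def register(self, address: str) -> None:
        self.servers[address] = 0

    def run_checks(self) -> None:
        """Probe every server once and update its consecutive-failure count."""
        for address in self.servers:
            if self.health_check(address):
                self.servers[address] = 0       # healthy: eligible for traffic again
            else:
                self.servers[address] += 1

    def in_rotation(self) -> List[str]:
        """Servers that currently receive traffic."""
        return [a for a, fails in self.servers.items() if fails < self.fail_threshold]
```

Because the pool decides membership from probe results alone, application code never needs to know which servers are in rotation, which is one way to read "transparent to developers".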

02 Framework

The framework is the foundation of applications and has evolved through several stages:

SOA & Gateway : A long‑standing service governance platform.

Release System : Integrates features such as brake, rollback, version switching, shared DLL packaging, and POM checks. It survived a severe production failure and was rebuilt across four generations (ITSM, CITSM, CRoller/ROP, Tars/CD).

Message Queue : Ctrip developed its own queue, drawing on the strengths of existing tools (Storm, MSMQ, ActiveMQ, RabbitMQ) and adding features such as ordered partitions, asynchronous compensation, and lifecycle tracking.

Configuration Management : Evolved to provide convenient, high‑performance, and high‑availability configuration handling across the organization.
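
To make the "ordered partitions" feature of the message queue concrete, here is a small in-memory sketch. It assumes the common approach of routing by message key so that all messages for one key land in the same partition and keep their publish order. The source does not say how Ctrip's queue does this, and `PartitionedQueue` and its methods are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Message = Tuple[str, str]  # (key, payload)


class PartitionedQueue:
    """Hypothetical in-memory queue with per-partition ordering."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: DefaultDict[int, List[Message]] = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def publish(self, key: str, payload: str) -> int:
        """Append to the partition chosen by key and return that partition's index."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, payload))
        return partition

    def consume(self, partition: int) -> List[Message]:
        """Drain one partition in publish order."""
        messages, self.partitions[partition] = self.partitions[partition], []
        return messages
```

Ordering is only guaranteed within a partition, so events that must be processed in sequence (for example, all events for one order) need to share a key.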

03 Applications

Common techniques used across applications include PreLoading & LayerLoading, Sharding, circuit breaking, rate limiting, and degradation, which together greatly improve website and app stability.
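
As an illustration of how circuit breaking and degradation fit together, the sketch below wraps a call in a simple breaker that falls back to a degraded response. The thresholds, the half-open rule, and the `CircuitBreaker` class itself are assumptions, not details taken from Ctrip's applications.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a cool-down."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after      # seconds to stay open before a trial call
        self.clock = clock                  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        """Run primary; on failure or while open, return the degraded fallback."""
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A caller might pass a live price lookup as `primary` and a cached or default value as `fallback`, so a failing dependency degrades the page rather than breaking it.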

Architecture Evolution

Release System

The release system progressed through four eras:

ITSM – a C/S tool that isolated developers from deployment.

CITSM – a B/S version enabling collaborative releases and version control.

CRoller (ROP) – added All‑In‑One, Config Gen, auto‑loading, but became overly powerful and caused a major outage.

Tars (CD) – introduced remote backup, eliminating the single‑point failure of local backups.
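
The brake and rollback features mentioned for the release system can be sketched as a batched rollout that stops at the first unhealthy batch and restores the previous versions. This is a simplified illustration under assumed names (`rolling_release`, `is_healthy`, `batch_size`). The in-memory snapshot stands in for a real backup and says nothing about how Tars stores its remote backups.

```python
from typing import Callable, Dict, List


def rolling_release(servers: List[str], new_version: str,
                    deployed: Dict[str, str],
                    is_healthy: Callable[[str], bool],
                    batch_size: int = 2) -> bool:
    """Deploy in batches; on a failed check, brake and roll back.

    `deployed` maps server -> current version and is updated in place.
    Returns True if every batch passed its health check.
    """
    previous = dict(deployed)  # snapshot used for rollback
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            deployed[server] = new_version
        if not all(is_healthy(s) for s in batch):
            # Brake: stop the rollout. Rollback: restore every server touched so far.
            for server in servers[:start + batch_size]:
                if server in previous:
                    deployed[server] = previous[server]
                else:
                    deployed.pop(server, None)
            return False
    return True
```

Releasing in small batches limits how many servers run a bad build before the brake triggers.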

Configuration Management

Four generations of configuration systems:

Simple web.config wrapper with a web UI.

Integrated config changes into releases, eliminating site restarts.

Service‑based config loading with a listener for updates (binary on/off).

JSON‑supported, optimized listening, open‑sourced.
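
The listener model of the later generations might look roughly like the following sketch: a client holds a JSON configuration document and notifies registered listeners when an update changes a key, so no site restart is needed. `ConfigClient`, `apply_update`, and the listener signature are hypothetical and are not the API of Ctrip's open-sourced system.

```python
import json
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any, Any], None]  # (key, old_value, new_value)


class ConfigClient:
    """Hypothetical config client that pushes changes to listeners."""

    def __init__(self, initial_json: str) -> None:
        self._values: Dict[str, Any] = json.loads(initial_json)
        self._listeners: List[Listener] = []

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def apply_update(self, new_json: str) -> None:
        """Swap in a new config document and notify listeners of changed keys only."""
        new_values: Dict[str, Any] = json.loads(new_json)
        old_values, self._values = self._values, new_values
        for key in sorted(old_values.keys() | new_values.keys()):
            old, new = old_values.get(key), new_values.get(key)
            if old != new:
                for listener in self._listeners:
                    listener(key, old, new)
```

In a real deployment `apply_update` would be driven by the configuration service (for example through polling or a push channel); here it is called directly to keep the sketch self-contained.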

SOA

SOA evolved through several stages:

First generation introduced a governance platform and ESB bus for address resolution.

Second generation removed the ESB bottleneck, enabling direct service calls after fetching URLs from the governance platform.

Third generation added features such as circuit breaking, rate limiting, dynamic routing, extensive monitoring, and tracing.

With the rise of H5 and mobile, the Gateway replaced Mobile Service, adding anti‑scraping and authentication.
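
The second-generation pattern, fetching service URLs from the governance platform and then calling instances directly, can be sketched as below. The registry, the round-robin choice, and the example URLs are assumptions for illustration only. The source does not describe Ctrip's client-side balancing.

```python
import itertools
from typing import Callable, Dict, Iterator, List


class ServiceRegistry:
    """Hypothetical stand-in for the governance platform's address lookup."""

    def __init__(self) -> None:
        self._urls: Dict[str, List[str]] = {}
        self.lookups = 0  # counts how often clients hit the registry

    def register(self, service: str, url: str) -> None:
        self._urls.setdefault(service, []).append(url)

    def resolve(self, service: str) -> List[str]:
        self.lookups += 1
        return list(self._urls[service])


class DirectClient:
    """Resolves a service once, caches its URLs, then calls instances directly."""

    def __init__(self, registry: ServiceRegistry,
                 send: Callable[[str, str], str]) -> None:
        self._registry = registry
        self._send = send  # (url, request) -> response; the transport is injected
        self._cursors: Dict[str, Iterator[str]] = {}

    def call(self, service: str, request: str) -> str:
        if service not in self._cursors:
            urls = self._registry.resolve(service)
            self._cursors[service] = itertools.cycle(urls)  # simple round-robin
        return self._send(next(self._cursors[service]), request)
```

Because the registry is consulted only on the first call, request traffic no longer flows through a central bus, which is the bottleneck the second generation removed.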

User Profile Project

The "User Profile" project is a core big‑data component comprising six functions: registration, collection, computation, storage, query, and monitoring.

Data sources include personal information, travel history, contacts, user behavior, and order details. Collection uses both batch and streaming pipelines; the streaming path relies on Kafka, Storm, and Ctrip’s proprietary Hermes platform.

All collected data is stored in a User Profile repository exceeding 100 billion records, backed by Hive, MySQL, and Redis with FullDR + M‑S designs.

Despite this massive scale, average service response time remains around 10 ms (including ~4 ms network latency), achieved through circuit breaking, rate limiting, degradation, and sharding mechanisms that ensure high availability.
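
One way sharding keeps lookups fast at this scale is to route each user ID to a fixed shard, so a query touches only one small slice of the data. The sketch below assumes hash-based routing over in-memory dictionaries. The actual shard key, shard count, and storage layout of the User Profile service are not given in the source, and `ShardedProfileStore` is a hypothetical name.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ShardedProfileStore:
    """Hypothetical profile store that routes each user to one shard by hash."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate database or cache instance.
        self.shards: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(num_shards)]

    def shard_index(self, user_id: str) -> int:
        # A stable hash means the same user always maps to the same shard.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, user_id: str, profile: Dict[str, Any]) -> None:
        self.shards[self.shard_index(user_id)][user_id] = profile

    def get(self, user_id: str) -> Optional[Dict[str, Any]]:
        return self.shards[self.shard_index(user_id)].get(user_id)
```

Simple modulo routing remaps most keys when the shard count changes, so a production system would typically plan shard counts up front or use a scheme such as consistent hashing.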

Source: http://www.uml.org.cn/zjjs/201711031.asp

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
