Ctrip Architecture Refactoring: From Croller to TARS – A Deployment and Configuration Management Case Study
This article reviews Ctrip's two-year architecture transformation. It describes the limitations of the legacy Croller release system, the design of a new group-based configuration model, the introduction of Layer 7 (application-layer) load balancing and the TARS deployment platform, and the implementation of a unified configuration management system (CMS) to improve operational efficiency and reliability.
As China's largest OTA (online travel agency), Ctrip faced rapid business growth and increasingly agile delivery requirements. By the end of 2014 it ran more than 5,000 applications and received more than 3,000 release requests per week. The existing Croller release system, based on a train-style deployment model, became a bottleneck, causing delays and failures when applications shared pools and carriages.
The team identified three core problems: weak isolation of ASP.NET applications on shared IIS pools, domain‑level load balancing that prevented application‑level health checks, and inconsistent governance data that hindered monitoring and troubleshooting.
To address these issues, the team introduced a "Group" concept and designed a comprehensive data model (App, Server, Pool, Group, Route) managed by a centralized CMS, so that every tool works from consistent and accurate configuration data. They also deployed a Layer 7 load balancer (SLB) to isolate access points per application, and built a next-generation release system called TARS, which supports application-level deployments.
The solution’s implementation consists of three main parts:
(1) Introducing the Group concept with a full data structure (App, Server, Pool, Group, Route) provided via CMS APIs.
(2) Adding Layer 7 load balancing to achieve application-level isolation and health monitoring.
(3) Designing and building TARS to replace Croller and enable granular, reliable releases.
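To make the second point concrete, the following is a minimal sketch of application-level routing with per-group health state. It is not Ctrip's SLB implementation: the class names (Member, AppGroup, Layer7Router), the path-prefix routing rule, and the round-robin selection are all assumptions chosen for illustration. The point it shows is that when each application has its own access point, an unhealthy member of one application can be taken out of rotation without affecting any other application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Member:
    """One backend instance (hypothetical type; addresses are made up)."""
    address: str
    healthy: bool = True


@dataclass
class AppGroup:
    """Members serving a single application, with their own health state."""
    app_id: str
    members: list = field(default_factory=list)
    _next: int = 0  # round-robin cursor (assumed selection policy)

    def pick(self) -> Optional[Member]:
        # Only healthy members of THIS application are candidates.
        healthy = [m for m in self.members if m.healthy]
        if not healthy:
            return None
        member = healthy[self._next % len(healthy)]
        self._next += 1
        return member


class Layer7Router:
    """Routes a request path to an application group (assumed prefix rule)."""

    def __init__(self) -> None:
        self._routes = {}  # path prefix -> AppGroup

    def add_route(self, prefix: str, group: AppGroup) -> None:
        self._routes[prefix] = group

    def route(self, path: str) -> Optional[Member]:
        # The longest matching prefix wins; no match means no backend.
        matches = [p for p in self._routes if path.startswith(p)]
        if not matches:
            return None
        return self._routes[max(matches, key=len)].pick()


if __name__ == "__main__":
    hotel = AppGroup("hotel", [Member("10.0.0.1"), Member("10.0.0.2")])
    flight = AppGroup("flight", [Member("10.0.1.1")])
    router = Layer7Router()
    router.add_route("/hotel", hotel)
    router.add_route("/flight", flight)

    # A failed health check on one hotel member does not touch flight.
    hotel.members[0].healthy = False
    print(router.route("/hotel/search").address)
    print(router.route("/flight/list").address)
```

With domain-level balancing, by contrast, the balancer would only see the shared domain and could not tell which application behind it had failed.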
The unified configuration system (CMS) was built to provide accurate, compliant, and traceable data, supporting fast queries, relationship management, and change history tracking. Key features include:
Accurate and compliant data with continuous governance via a rule engine.
Efficient relationship queries across organizations, products, applications, clusters, servers, and domains.
Change‑tracking that propagates through object relationships.
High availability and scalability, handling tens of thousands of queries per minute.
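The change-tracking feature can be illustrated with a small sketch. This is an assumption about one way such propagation could work, not a description of the CMS internals: the ChangeLog class, the "child depends-on parent" relation, and the object keys such as "server:web01" are hypothetical. A change recorded on one object is appended to the history of every object reachable through its relationships, so that looking up an application also surfaces changes made to its servers.

```python
from collections import defaultdict, deque


class ChangeLog:
    """Records changes and propagates them along object relationships."""

    def __init__(self) -> None:
        self._parents = defaultdict(set)   # object -> objects related above it
        self._history = defaultdict(list)  # object -> list of change entries

    def relate(self, child: str, parent: str) -> None:
        """Declare that changes to `child` are relevant to `parent`."""
        self._parents[child].add(parent)

    def record(self, obj: str, description: str) -> None:
        """Store a change on `obj` and on everything reachable from it."""
        entry = f"{obj}: {description}"
        seen = {obj}
        queue = deque([obj])
        # Breadth-first walk; `seen` guards against cycles and duplicates.
        while queue:
            current = queue.popleft()
            self._history[current].append(entry)
            for parent in self._parents[current]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)

    def history(self, obj: str) -> list:
        return list(self._history[obj])


if __name__ == "__main__":
    log = ChangeLog()
    log.relate("server:web01", "pool:pool-a")
    log.relate("pool:pool-a", "app:100001")
    log.record("server:web01", "memory upgraded")
    print(log.history("app:100001"))
```

A production system would also need timestamps, operators, and persistent storage; those are omitted here to keep the propagation step visible.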
App: Represents an application (web, service, job, etc.)
Server: Represents a server
Pool: A set of servers deploying the same application
Group: A collection of instances of the same application
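The four definitions above can be sketched as plain data types. This is a hedged illustration only: the field names (app_id, kind, hostname, name), the Instance helper type, and the group_from_pool function are assumptions, not the CMS schema. Route is left out because the source names it but does not define it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class App:
    """An application (web, service, job, etc.)."""
    app_id: str  # hypothetical identifier
    kind: str    # e.g. "web", "service", "job"


@dataclass(frozen=True)
class Server:
    """A server."""
    hostname: str  # hypothetical field


@dataclass
class Pool:
    """A set of servers deploying the same application."""
    name: str
    app: App
    servers: list = field(default_factory=list)


@dataclass(frozen=True)
class Instance:
    """One deployment of an app on one server (assumed helper type)."""
    app: App
    server: Server


@dataclass
class Group:
    """A collection of instances of the same application."""
    name: str
    app: App
    instances: list = field(default_factory=list)

    def add(self, server: Server) -> None:
        # Every instance in a group belongs to the group's own app.
        self.instances.append(Instance(self.app, server))


def group_from_pool(pool: Pool, name: str) -> Group:
    """Build a group with one instance per server in the pool (illustrative)."""
    group = Group(name=name, app=pool.app)
    for server in pool.servers:
        group.add(server)
    return group


if __name__ == "__main__":
    app = App("100001", "web")
    pool = Pool("pool-a", app, [Server("web01"), Server("web02")])
    group = group_from_pool(pool, "group-a")
    print([i.server.hostname for i in group.instances])
```

Keeping Group tied to exactly one App is what lets a release tool or load balancer address a single application, even when several applications share the same servers.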
Through CMS, all tools in Ctrip’s ecosystem can retrieve consistent configuration information, enabling coordinated deployment, monitoring, and resource management across heterogeneous technologies.
The article concludes that a well‑designed configuration management platform, combined with isolated load balancing and a modern release system, significantly improves deployment efficiency, reliability, and operational visibility for large‑scale internet services.
Ctrip Technology
