
Ctrip's Fourth‑Generation Architecture: Elastic Routing (SLB) and the TARS Release System

This article reviews Ctrip's two‑year architecture transformation. It describes how the company replaced hardware load balancers with a software load balancer (SLB), modeled each application as a Group, batched configuration updates, shared health checks across SLB nodes, added log‑based monitoring, and built the TARS release platform on top to make deployments faster and more reliable.

Ctrip Technology

Ctrip, the largest online travel agency (OTA) in China, embarked on a two‑year architecture overhaul involving all business lines. The legacy model deployed multiple applications on a single machine, which caused tight coupling between applications and high operational complexity. It also relied on hardware load balancers (LB) that were expensive to scale, hard to model, and required manual intervention.

To achieve fine‑grained, application‑level operations, Ctrip built a software load balancer (SLB) based on Nginx. The core idea is to model each application as a Group: every SLB operation is expressed as an action on a Group, and Groups are isolated from one another.
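The Group model can be pictured with a small sketch. The article does not show Ctrip's actual data model or generated configuration, so the field names (`name`, `path`, `servers`) and the `backend_` naming scheme below are assumptions made for illustration. The sketch only shows how one Group could map to one self‑contained Nginx `upstream` plus a `location` block.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    ip: str
    port: int
    weight: int = 1


@dataclass
class Group:
    """One application as seen by the SLB (hypothetical fields)."""
    name: str
    path: str  # URL prefix routed to this application
    servers: List[Server] = field(default_factory=list)


def render_group(group: Group) -> str:
    """Render one Group as an Nginx upstream plus a location block.

    The result is a fragment: in a real nginx.conf the upstream belongs in
    the http context and the location inside a server block.
    """
    members = "\n".join(
        f"    server {s.ip}:{s.port} weight={s.weight};" for s in group.servers
    )
    upstream = f"upstream backend_{group.name} {{\n{members}\n}}"
    location = (
        f"location {group.path} {{\n"
        f"    proxy_pass http://backend_{group.name};\n"
        f"}}"
    )
    return upstream + "\n\n" + location


if __name__ == "__main__":
    hotel = Group("hotel", "/hotel", [Server("10.0.0.1", 8080),
                                      Server("10.0.0.2", 8080, weight=2)])
    print(render_group(hotel))
```

Because each Group renders to its own fragment, changing one application's servers does not touch the text generated for any other Group, which is the isolation property described above.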

Key design goals for SLB include high concurrency, real‑time rule updates, and flexible, fine‑grained routing. Reloading Nginx once per Group update would mean thousands of reloads, so SLB collects the Group updates that arrive within a short window, merges them, and applies them with a single Nginx reload. Many updates take effect in one reload.
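A minimal sketch of this batching idea follows. It is not Ctrip's implementation: the class name `UpdateBatcher`, the one‑second window, and the "last update per Group wins" merge rule are all assumptions. The `reload_fn` callback stands in for whatever writes the merged configuration and reloads Nginx.

```python
import time
from typing import Callable, Dict, Optional


class UpdateBatcher:
    """Collect Group updates for a short window, then reload once."""

    def __init__(self,
                 reload_fn: Callable[[Dict[str, str]], None],
                 window_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._reload_fn = reload_fn        # hypothetical: writes config + reloads
        self._window = window_seconds
        self._clock = clock
        self._pending: Dict[str, str] = {}
        self._window_start: Optional[float] = None

    def submit(self, group_name: str, config: str) -> None:
        """Queue an update; the first queued update opens the window."""
        if not self._pending:
            self._window_start = self._clock()
        # Assumed merge rule: a later update for the same Group replaces
        # the earlier one, so each Group appears once in the merged batch.
        self._pending[group_name] = config

    def poll(self) -> bool:
        """Flush if the window has elapsed. Returns True if a reload ran."""
        if not self._pending or self._window_start is None:
            return False
        if self._clock() - self._window_start < self._window:
            return False
        merged, self._pending = self._pending, {}
        self._window_start = None
        self._reload_fn(merged)            # exactly one reload per batch
        return True
```

A caller would invoke `poll()` from a timer or event loop. Injecting the clock keeps the sketch testable without sleeping; a production version would also need locking if `submit` is called from several threads.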

Several roles can change a server's state, which can lead to conflicts. SLB resolves them by giving each role its own state for every server: only the role that set a state can clear it, and a server is considered healthy only when every role marks it as valid.
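The rule can be sketched as one flag per role, combined with a logical AND. The role names used here (`ops`, `release`, `health_check`) are hypothetical; the article does not enumerate the roles.

```python
from typing import Dict, Iterable


class ServerStatus:
    """Per-role up/down flags for one server."""

    def __init__(self, roles: Iterable[str]):
        # Every role starts by considering the server valid.
        self._up: Dict[str, bool] = {role: True for role in roles}

    def mark_down(self, role: str) -> None:
        self._set(role, False)

    def mark_up(self, role: str) -> None:
        self._set(role, True)

    def _set(self, role: str, up: bool) -> None:
        if role not in self._up:
            raise KeyError(f"unknown role: {role}")
        # A role can only write its own flag, so it can never clear a
        # state that another role has set.
        self._up[role] = up

    @property
    def healthy(self) -> bool:
        """The server receives traffic only if all roles agree it is up."""
        return all(self._up.values())


if __name__ == "__main__":
    status = ServerStatus(["ops", "release", "health_check"])
    status.mark_down("release")       # release system pulls the server out
    status.mark_down("health_check")  # health check fails during restart
    status.mark_up("health_check")    # health check recovers
    print(status.healthy)             # still False: release has not pulled it in
```

In this sketch a recovering health check cannot put a server back into rotation while a release is still in progress, which is the kind of conflict the per‑role state is meant to prevent.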

SLB also implements shared health‑check services, reducing bandwidth and CPU load compared with per‑node checks, and processes access logs in real time to generate multi‑dimensional monitoring data.
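To see why sharing helps, consider a rough sketch in which one checker probes each backend once per round and every SLB node reads the shared result. With N SLB nodes and M backends, per‑node checking costs N × M probes per round, while the shared approach costs M. The class names and the in‑memory snapshot are assumptions; the article does not describe how Ctrip distributes results.

```python
from typing import Callable, Dict, Iterable, Set


class SharedHealthChecker:
    """Probes each backend once per round on behalf of all SLB nodes."""

    def __init__(self, probe: Callable[[str], bool]):
        self._probe = probe              # hypothetical probe, e.g. an HTTP GET
        self._results: Dict[str, bool] = {}
        self.probe_count = 0

    def run_round(self, backends: Iterable[str]) -> None:
        for backend in backends:
            self.probe_count += 1
            self._results[backend] = self._probe(backend)

    def snapshot(self) -> Dict[str, bool]:
        """Copy of the latest results, safe to hand to any node."""
        return dict(self._results)


class SlbNode:
    """An SLB instance that consumes shared results instead of probing."""

    def __init__(self, name: str):
        self.name = name
        self.healthy_backends: Set[str] = set()

    def sync(self, checker: SharedHealthChecker) -> None:
        self.healthy_backends = {
            backend for backend, ok in checker.snapshot().items() if ok
        }


if __name__ == "__main__":
    backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
    checker = SharedHealthChecker(lambda b: b != "10.0.0.2:8080")
    nodes = [SlbNode(f"slb-{i}") for i in range(4)]
    checker.run_round(backends)
    for node in nodes:
        node.sync(checker)
    print(checker.probe_count, "probes instead of", len(nodes) * len(backends))
```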

Building on SLB, Ctrip created the TARS release system, which supports gray releases, simple configuration, and rapid deployment. TARS defines a release unit called a "group" (aligned with SLB groups) and combines rolling and canary strategies to enable safe, incremental rollouts.

Key features of TARS include:

Configurable maximum concurrent pull‑out ratio, batch wait times, and timeout settings.

State‑machine‑driven UI that limits operators to two actionable buttons, reducing human error.

Fast rollback via local version retention and CDN‑like storage for release packages.
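The pull‑out ratio and the rolling plus canary combination can be illustrated with a batch planner. This is a sketch under assumptions, not TARS itself: the function name `plan_batches`, the default single canary server, and the rounding rule are invented for illustration, and batch wait times and timeouts are left out.

```python
import math
from typing import List, Sequence


def plan_batches(servers: Sequence[str],
                 max_pullout_ratio: float,
                 canary_count: int = 1) -> List[List[str]]:
    """Split a Group's servers into a canary batch plus rolling batches.

    No batch pulls out more than max_pullout_ratio of the Group at once
    (but always at least one server, so the release can make progress).
    """
    if not 0 < max_pullout_ratio <= 1:
        raise ValueError("max_pullout_ratio must be in (0, 1]")
    servers = list(servers)
    if not servers:
        return []
    batch_size = max(1, math.floor(len(servers) * max_pullout_ratio))
    # The canary batch must respect the same limit as every other batch.
    canary_count = max(0, min(canary_count, batch_size))
    canary, rest = servers[:canary_count], servers[canary_count:]
    batches = [canary] if canary else []
    for start in range(0, len(rest), batch_size):
        batches.append(rest[start:start + batch_size])
    return batches


if __name__ == "__main__":
    group = [f"server-{i}" for i in range(10)]
    for number, batch in enumerate(plan_batches(group, 0.25), start=1):
        print(number, batch)
```

With 10 servers and a ratio of 0.25, each batch holds at most 2 servers, so the plan is one canary server followed by rolling batches of 2, 2, 2, 2, and 1.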

The combined CMS + SLB + TARS system yielded measurable improvements: the number of releases per week grew fourfold, average release time dropped from 13 minutes to 3 minutes, and release‑related incidents fell by more than 50%.

Future benefits include automated capacity management, cross‑IDC disaster recovery cloning, and self‑service migration of technology stacks (e.g., .NET to Java) using SLB‑driven traffic shifting.
