Practical Experience in Microservice Governance at Ctrip: Challenges, Strategies, and Results

This article shares Ctrip's practical experience in microservice governance, detailing the background, common pitfalls such as excessive service granularity and cyclic dependencies, and presenting concrete goals, principles, and strategies that led to significant improvements in stability, performance, and development efficiency.

Ctrip Technology

Author : Hong Liang, senior technical expert at Ctrip, focuses on system performance, stability, capacity, and transaction quality, with extensive experience in architecture evolution and high concurrency.

Background : As microservice architectures spread through large internet companies, the number of services grows and their call relationships become increasingly complex, exposing room for optimization in stability, throughput, and resource utilization.

Purpose : Writing from a developer's perspective, the article discusses problems encountered in microservice architectures, uses a ticket activity reservation query engine as a case study, and shares practical governance experience.

Microservice History : The concept was introduced in 2005 and was recognized as an architectural style in 2011. In this style, a large application is composed of multiple services, each focusing on a single task. Microservices can be seen as an implementation of SOA without an ESB, and they differ from traditional SOA in several ways.

Microservice Pitfalls :

Excessive service granularity leads to complex call graphs and performance loss.

Repeated calls cause redundant processing.

Cyclic dependencies can trigger cascading failures.

Long call chains increase latency and make troubleshooting difficult.
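
Cyclic dependencies like those listed above can be found mechanically once the call relationships are available as a graph. The following is a minimal sketch, not Ctrip's tooling: it assumes the call graph has already been exported as a mapping from each service to the services it calls, and the service names are hypothetical. It uses a depth-first search and reports the first cycle it encounters.

```python
from typing import Dict, List, Optional

# Node states for the depth-first search.
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycle(calls: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cyclic call path (start node repeated at the end), or None."""
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for callee in calls.get(node, []):
            state = color.get(callee, WHITE)
            if state == GRAY:
                # The callee is already on the current path: a cycle.
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for service in list(calls):
        if color.get(service, WHITE) == WHITE:
            found = visit(service)
            if found:
                return found
    return None


# Hypothetical call graph: product-service and price-service call each other.
call_graph = {
    "search-api": ["product-service"],
    "product-service": ["price-service"],
    "price-service": ["product-service"],
}
cycle = find_cycle(call_graph)
print(cycle)
```

A check like this could run against call-graph data collected by tracing, so that a new cyclic link is flagged before it causes cascading timeouts.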

Governance Goals :

Stability: isolate failures and improve system reliability.

Delivery: enable independent iteration, scaling, and rapid release.

Reuse: share common functionality across services.

Governance Principles :

Avoid cross‑team code maintenance.

Match service granularity to team size (≤3 services per developer).

Enforce a layered architecture with no reverse dependencies.

Limit vertical depth to five layers within a domain.
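
The layering principles above lend themselves to an automated check. The sketch below is illustrative only: it assumes each service has been assigned a layer number (1 is the entry layer, larger numbers are deeper), and the service names, the `check_layering` helper, and the decision to tolerate same‑layer calls are all assumptions rather than details from the article. It flags any call from a deeper layer back to a shallower one and any service placed beyond the five‑layer limit.

```python
from typing import Dict, List, Tuple

MAX_DEPTH = 5  # from the principle: at most five layers within a domain


def check_layering(
    layers: Dict[str, int], calls: List[Tuple[str, str]]
) -> List[str]:
    """Return human-readable descriptions of layering violations."""
    problems: List[str] = []
    for service, layer in layers.items():
        if not 1 <= layer <= MAX_DEPTH:
            problems.append(
                f"{service}: layer {layer} is outside 1..{MAX_DEPTH}"
            )
    for caller, callee in calls:
        # A reverse dependency is a deeper layer calling a shallower one.
        # Same-layer calls are tolerated here (an assumption of this sketch).
        if layers[callee] < layers[caller]:
            problems.append(
                f"{caller} -> {callee}: reverse dependency "
                f"(layer {layers[caller]} calls layer {layers[callee]})"
            )
    return problems


# Hypothetical domain with one reverse dependency.
service_layers = {
    "gateway": 1,
    "query-engine": 2,
    "product-service": 3,
    "cache-service": 4,
}
service_calls = [
    ("gateway", "query-engine"),
    ("query-engine", "product-service"),
    ("product-service", "cache-service"),
    ("cache-service", "query-engine"),  # deeper layer calling upward
]
layering_problems = check_layering(service_layers, service_calls)
for problem in layering_problems:
    print(problem)
```

Because every allowed call moves strictly downward or stays within a layer, a graph that passes this check cannot contain a cycle that spans layers.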

Governance Strategies :

Eliminate cyclic dependencies by defining clear layers and placing each service in one of them.

Shorten call chains through domain‑driven decomposition and by reducing pass‑through data.

Consolidate duplicate functionality into shared base services (e.g., unified caching, translation).

Traffic management: merge repeated calls into a single aggregated call; combine small interfaces to reduce call volume; isolate core and non‑core traffic (e.g., job scheduling vs. user requests); flatten offline job scheduling peaks by extending scheduling windows and applying client‑side rate limiting.

Improve resource utilization by reducing the number of applications per developer and merging services with low CPU usage.
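
As an illustration of merging repeated calls into a single aggregated call, the sketch below collects the IDs that several parts of one request need, removes duplicates, and issues one batched downstream call. The `AggregatedLookup` class, the `fetch_many` callback, and the product IDs are hypothetical; the article does not describe the actual interface, so this shows only the general technique.

```python
from typing import Callable, Dict, Iterable, List


class AggregatedLookup:
    """Per-request helper that turns many single lookups into one batch call."""

    def __init__(self, fetch_many: Callable[[List[str]], Dict[str, dict]]):
        # fetch_many stands in for a batched downstream interface
        # (hypothetical; it maps a list of IDs to {id: record}).
        self._fetch_many = fetch_many
        self._loaded: Dict[str, dict] = {}

    def preload(self, ids: Iterable[str]) -> None:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        missing = [i for i in dict.fromkeys(ids) if i not in self._loaded]
        if missing:
            self._loaded.update(self._fetch_many(missing))

    def get(self, item_id: str) -> dict:
        # Falls back to a single-item batch if the ID was not preloaded.
        # Raises KeyError if the downstream call did not return the ID.
        if item_id not in self._loaded:
            self.preload([item_id])
        return self._loaded[item_id]


# Usage with a fake downstream service that records each call it receives.
downstream_calls: List[List[str]] = []


def fake_fetch_many(ids: List[str]) -> Dict[str, dict]:
    downstream_calls.append(list(ids))
    return {i: {"id": i} for i in ids}


lookup = AggregatedLookup(fake_fetch_many)
wanted = ["p1", "p2", "p1", "p3"]  # "p1" is requested twice
lookup.preload(wanted)
records = [lookup.get(i) for i in wanted]
print(len(downstream_calls), downstream_calls[0])
```

Four lookups result in one downstream call carrying three distinct IDs. The helper is meant to live for a single request, so it does not replace a shared cache with expiry.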

Implementation Effects :

Removed 65 cyclic‑dependency links, reducing timeout alerts by 99% and cutting troubleshooting time to minutes.

Reduced call‑chain depth by 40%.

Added a unified data service, cutting cache capacity by 60%.

Core service call volume decreased by 73% and peak load dropped by 50%.

Development efficiency improved through horizontal splitting and vertical layer reduction (≤3 layers).

Query engine QPS rose from 80k to 240k, with a reported 65% performance gain.

Average number of applications per developer held to ≤2.

CPU utilization across 40+ services rose from 18% to 32%.

Conclusion : In microservice architectures, finer service granularity brings more complex dependencies, higher latency, and lower developer efficiency. Services should therefore be sized according to business volume, team size, and cost, following the goals, principles, and strategies outlined above, to achieve stable, high‑performance systems.
