Design and Implementation of Ctrip's Fourth-Generation Full-Link Performance Testing System
This article outlines how Ctrip's approach to performance testing evolved across three generations and analyzes the limitations of each. It then presents the fourth-generation full-link testing platform: its design and architecture, test data construction, request tracing, monitoring, and operational considerations, followed by case studies and an outlook.
Background and Significance
Ctrip's application performance testing and capacity assessment have gone through three generations. The first generation ran simple single-interface tests with tools such as ab and JMeter. The second generation load-tested a single application in the production environment by raising the traffic weight of one machine in its cluster, so that this machine received a larger share of real requests. The third generation replayed production traffic: requests were copied to test machines, and the replay could be scaled up and filtered.
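The third-generation approach of replaying copied production traffic with scaling and filtering can be sketched roughly as follows. This is a minimal illustration under assumptions: `RecordedRequest`, `plan_replay`, the integer scale factor, and the path-based filter are hypothetical, since the article does not describe how Ctrip's replay tool works internally.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RecordedRequest:
    timestamp: float  # seconds since the start of the capture
    path: str


def plan_replay(
    recorded: Iterable[RecordedRequest],
    scale: int,
    keep: Callable[[RecordedRequest], bool],
) -> List[RecordedRequest]:
    """Filter captured requests, then repeat each survivor `scale` times so the
    replay offers roughly `scale` times the captured load."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    plan: List[RecordedRequest] = []
    for req in recorded:
        if keep(req):  # e.g. drop admin or write-heavy endpoints
            plan.extend([req] * scale)
    # Keep the original time order so the replay preserves the traffic shape.
    plan.sort(key=lambda r: r.timestamp)
    return plan
```

For example, `plan_replay(reqs, 2, lambda r: not r.path.startswith("/admin"))` doubles the captured load while skipping admin endpoints. Duplicating requests is only one way to scale; compressing the inter-request gaps would be another.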
Each generation had limitations: complex user input was hard to simulate, the tests did not reflect upstream and downstream dependencies, and high-concurrency scenarios were difficult to reproduce. These limitations led to a fundamental question: should performance testing be done in production or in a test environment?
System Design
The fourth‑generation full‑link testing system adopts a layered architecture consisting of data construction, load‑testing logic, control, engine, application, and middleware layers, targeting large‑scale promotional events.
Data Construction Layer
This layer supports three ways of constructing data: manual construction, log embedding, and production traffic replay. The resulting virtual-user data is parameterized and diversified so that the load does not concentrate on a few hotspots.
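A minimal sketch of parameterizing virtual users so that load is spread out rather than focused on one hot key follows. It assumes the source data (from manual construction, logs, or replay) has already been loaded into pools of user ids, cities, and hotel ids; `VirtualUser`, `build_virtual_users`, and the travel-themed fields are hypothetical and only illustrate the idea.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VirtualUser:
    user_id: str
    city: str
    hotel_id: int


def build_virtual_users(
    n: int,
    user_ids: Sequence[str],
    cities: Sequence[str],
    hotel_ids: Sequence[int],
    seed: int = 42,
) -> List[VirtualUser]:
    """Parameterize n virtual users so that no account is shared and the
    other parameters are spread evenly instead of piling onto one hot key."""
    if n > len(user_ids):
        raise ValueError("need at least n distinct user ids")
    rng = random.Random(seed)  # seeded so a test run is reproducible
    accounts = rng.sample(list(user_ids), n)  # distinct account per user
    city_pool = list(cities)
    hotel_pool = list(hotel_ids)
    rng.shuffle(city_pool)
    rng.shuffle(hotel_pool)
    return [
        VirtualUser(
            user_id=uid,
            # Round-robin over the shuffled pools spreads load evenly.
            city=city_pool[i % len(city_pool)],
            hotel_id=hotel_pool[i % len(hotel_pool)],
        )
        for i, uid in enumerate(accounts)
    ]
```

Using a distinct account per virtual user avoids contention on one user's rows or cache entries, and the seed lets a failing run be reproduced with identical data.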
Full‑Link Request Identification
Each request carries HTTP headers that identify the root node, the parent node, and the current node of the call. With these identifiers, individual calls can be linked into a complete request topology across services.
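The root/parent/current scheme can be sketched as below: each hop keeps the root id, turns the caller's current id into its parent id, and mints a new current id; logged header sets are then grouped into a parent-to-children map. The header names `X-Trace-Root`, `X-Trace-Parent`, and `X-Trace-Current` and both functions are hypothetical, since the article does not give the real header names or format.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Mapping, Optional

# Hypothetical header names: the article does not give the real ones.
ROOT = "X-Trace-Root"
PARENT = "X-Trace-Parent"
CURRENT = "X-Trace-Current"


def outgoing_headers(incoming: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Trace headers for the next hop, derived from the caller's headers."""
    new_id = uuid.uuid4().hex
    if not incoming or ROOT not in incoming:
        # No context yet: this request starts a chain and is its own root.
        return {ROOT: new_id, PARENT: "", CURRENT: new_id}
    return {
        ROOT: incoming[ROOT],       # the root id is constant along the chain
        PARENT: incoming[CURRENT],  # the caller's node becomes the parent
        CURRENT: new_id,            # fresh id for this hop
    }


def build_topology(logged: List[Mapping[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group logged header sets by root id and map each parent to its children."""
    trees: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for h in logged:
        if h[PARENT]:  # the root node has no parent edge
            trees[h[ROOT]][h[PARENT]].append(h[CURRENT])
    return {root: dict(children) for root, children in trees.items()}
```

In practice each service would also log its own name with the headers so the topology shows services rather than bare ids.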
Load‑Testing Data Cleanup
Identifies and removes dirty data generated during testing (BI, UBT, risk control, and application‑specific data) to prevent contamination of downstream analytics and recommendation algorithms.
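One way to picture the cleanup step is a filter that splits records into clean and dirty sets before they are handed to analytics. This sketch assumes two detection rules, a `load_test` marker on the record and membership in a set of dedicated test accounts; the field names and `split_dirty` are hypothetical, since the article does not say how Ctrip actually recognizes dirty data.

```python
from typing import Any, Iterable, List, Mapping, Set, Tuple

Record = Mapping[str, Any]


def split_dirty(
    records: Iterable[Record],
    test_user_ids: Set[str],
) -> Tuple[List[Record], List[Record]]:
    """Separate load-test records from real ones before they reach BI, UBT,
    risk-control, or recommendation pipelines. Returns (clean, dirty)."""
    clean: List[Record] = []
    dirty: List[Record] = []
    for rec in records:
        # A record is dirty if it carries the test marker or belongs to a
        # dedicated test account (both detection rules are assumptions).
        is_test = bool(rec.get("load_test")) or rec.get("user_id") in test_user_ids
        (dirty if is_test else clean).append(rec)
    return clean, dirty
```

Checking both a marker and an account list is a belt-and-braces choice: records written by code paths that forget to copy the marker are still caught by the account check.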
Full‑Link Testing Platform
The platform manages test scripts (stored in Git), test hosts, and test tasks, and generates test reports. It also supports real-time monitoring, automatic or manual termination of a running task, and versioning of test data.
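The task lifecycle with automatic or manual termination might be modeled as a small state machine, sketched below. `LoadTestTask`, `TaskState`, the error-rate rule in `check`, and the `script_commit`/`data_version` fields (tying a run to a Git revision and a data version) are hypothetical and not the platform's actual model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"    # terminated early, automatically or by an operator
    FINISHED = "finished"


@dataclass
class LoadTestTask:
    name: str
    script_commit: str     # Git revision of the test script
    data_version: str      # version of the test data set used
    state: TaskState = TaskState.PENDING
    stop_reason: Optional[str] = None

    def start(self) -> None:
        if self.state is not TaskState.PENDING:
            raise RuntimeError(f"cannot start a task that is {self.state.value}")
        self.state = TaskState.RUNNING

    def stop(self, reason: str) -> None:
        """Manual or automatic early termination; no-op unless running."""
        if self.state is TaskState.RUNNING:
            self.state = TaskState.STOPPED
            self.stop_reason = reason

    def check(self, error_rate: float, max_error_rate: float) -> None:
        """Automatic termination when the monitored error rate exceeds a limit."""
        if error_rate > max_error_rate:
            self.stop(f"error rate {error_rate:.1%} > {max_error_rate:.1%}")

    def finish(self) -> None:
        if self.state is TaskState.RUNNING:
            self.state = TaskState.FINISHED
```

Recording the script revision and data version with each task makes it possible to tell whether two reports are comparable.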
Monitoring Design
Monitoring covers several dimensions: machines (CPU, memory, network, GC), applications (request volume, error rate, latency), capacity limits, order metrics, and full-link call chains. Fault-tolerant circuit-breaker mechanisms are also built in.
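A circuit breaker for a load test can be sketched as a sliding window over recent samples that trips once the error rate or average latency crosses a limit. `LoadTestBreaker`, the window size, and the thresholds are assumptions for illustration; the article does not state which metrics or limits the platform's breakers use.

```python
from collections import deque
from typing import Deque, Tuple


class LoadTestBreaker:
    """Trips when, over the last `window` samples, the error rate or the
    average latency exceeds its limit. Limits here are illustrative."""

    def __init__(self, window: int, max_error_rate: float, max_avg_latency_ms: float):
        self.samples: Deque[Tuple[bool, float]] = deque(maxlen=window)
        self.max_error_rate = max_error_rate
        self.max_avg_latency_ms = max_avg_latency_ms
        self.tripped = False

    def record(self, ok: bool, latency_ms: float) -> bool:
        """Add one request sample; return True once the breaker has tripped."""
        self.samples.append((ok, latency_ms))  # oldest sample drops out
        if len(self.samples) == self.samples.maxlen:  # judge only a full window
            errors = sum(1 for ok_, _ in self.samples if not ok_)
            avg = sum(lat for _, lat in self.samples) / len(self.samples)
            if (errors / len(self.samples) > self.max_error_rate
                    or avg > self.max_avg_latency_ms):
                self.tripped = True  # latched: stays tripped until reset by hand
        return self.tripped
```

Latching the breaker is a deliberate choice for production load tests: once a limit is crossed, pressure should stay off until someone has looked at the system.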
Key Challenges
Key work includes identifying core business call chains in advance, constructing realistic test data, and isolating dirty data. The main difficulties are tagging and recognizing test traffic as it crosses services, and coordinating work across departments so that testing does not affect production.
Case Sharing
The case study covers preparing the target applications (link analysis, code modifications, mocking external dependencies), preparing data, abstracting the call links, and executing the test (creating the scenario, applying pressure, and monitoring). It also gives examples of smoke-test runs in which response times increased and TPS dropped.
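The warning sign described in the case study, response time rising while TPS falls, can be flagged mechanically in stepped load results. This is a minimal sketch: `StepResult` and `first_degradation` are hypothetical names, and a real platform would likely add tolerances rather than compare raw values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StepResult:
    concurrency: int
    tps: float
    avg_rt_ms: float


def first_degradation(steps: List[StepResult]) -> Optional[StepResult]:
    """Return the first step at which response time rose while TPS fell
    compared with the previous step, i.e. more load made throughput worse."""
    for prev, cur in zip(steps, steps[1:]):
        if cur.avg_rt_ms > prev.avg_rt_ms and cur.tps < prev.tps:
            return cur
    return None
```

Rising latency alone is expected as load grows; it is the combination with falling throughput that suggests the system has passed its capacity limit.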
Summary and Outlook
The platform is now in production, having undergone multiple test cycles and continuous improvements. Future work will focus on reducing usage costs and enhancing monitoring integration to provide a more convenient and powerful full‑link performance testing product.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.