
Traffic Recording and Replay at Qunar: Architecture, Practice, and Impact on Interface Automation Testing and Full‑Link Stress Testing

This article describes Qunar's use of traffic recording and replay technology for fault isolation, performance optimization, and upgrade migration, detailing the evolution of their Java‑based agents, the end‑to‑end testing workflow, full‑link stress‑test integration, and the measurable business and performance outcomes.

Qunar Tech Salon

Traffic recording captures network request and response packets for later analysis, while replay re‑injects the recorded packets to simulate real traffic, enabling fault diagnosis, performance verification, and upgrade validation.

Using Qunar as a case study, the article explains how the company applied recording‑replay in interface automation testing and full‑link stress testing, highlighting the challenges of testing write‑heavy scenarios such as order creation and payment.

The technical evolution progressed through three stages: (1) Areas – a custom JVM‑Sandbox based tool with high integration cost; (2) Q‑Thanos‑Agent – built on Alibaba's JVM‑Sandbox‑Repeater, offering lower development effort and plugin extensibility; (3) Cinema‑Agent – a fully in‑house agent optimized for high‑QPS workloads and full‑link pressure testing.

In interface automation testing, the workflow starts with configuring applications and recording settings in the test platform, selecting cases from recorded traffic, deploying them to a master and a branch environment, replaying the traffic via agents, and finally diffing responses to generate reports.
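The final diffing step can be illustrated with a minimal sketch. It assumes each replayed response has already been flattened into a map of field names to values, and that some fields (timestamps, generated IDs) legitimately differ between environments and should be ignored. `ResponseDiff`, `FieldDiff`, and the `ignoredKeys` parameter are hypothetical names for illustration, not part of Qunar's platform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any Qunar agent: compares the responses that the
// master and branch environments returned for the same replayed request.
final class ResponseDiff {

    // One mismatching field and the value observed in each environment.
    static final class FieldDiff {
        final String key;
        final Object master;
        final Object branch;

        FieldDiff(String key, Object master, Object branch) {
            this.key = key;
            this.master = master;
            this.branch = branch;
        }

        @Override
        public String toString() {
            return key + ": master=" + master + ", branch=" + branch;
        }
    }

    // Returns every field whose value differs, skipping keys that are expected to vary.
    static List<FieldDiff> diff(Map<String, Object> master,
                                Map<String, Object> branch,
                                Set<String> ignoredKeys) {
        // Union of keys, sorted so the report order is stable.
        Set<String> keys = new TreeSet<>(master.keySet());
        keys.addAll(branch.keySet());

        List<FieldDiff> diffs = new ArrayList<>();
        for (String key : keys) {
            if (ignoredKeys.contains(key)) {
                continue;
            }
            Object m = master.get(key);
            Object b = branch.get(key);
            if (!Objects.equals(m, b)) {
                diffs.add(new FieldDiff(key, m, b));
            }
        }
        return diffs;
    }
}
```

An empty result means the branch behaved like master for that case; a non-empty one becomes an entry in the report. A real comparison would likely also need to handle nested structures and unordered collections.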

The replay implementation records method‑level events (Before, Return, Throw) to capture URI, request, and response data, enabling complete reconstruction of call chains across services, databases, and caches.
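As a rough illustration of how Before, Return, and Throw events can be paired into call records, the sketch below keeps a stack of open invocations and closes the most recent one when a Return or Throw arrives. `InvocationRecorder`, its method names, and the `identity` field are assumptions made for this example; it is single-threaded, whereas a real agent would keep this state per thread or per trace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical recorder: pairs each Before event with the Return or Throw that ends it.
final class InvocationRecorder {

    // One recorded call: what was invoked, with which request, and how it ended.
    static final class Invocation {
        final String identity;   // e.g. a URI or a method signature
        final Object request;
        Object response;         // set when the call returned normally
        Throwable error;         // set when the call threw

        Invocation(String identity, Object request) {
            this.identity = identity;
            this.request = request;
        }
    }

    private final Deque<Invocation> open = new ArrayDeque<>();
    private final List<Invocation> completed = new ArrayList<>();

    // Before: a call starts; remember it until it finishes.
    void onBefore(String identity, Object request) {
        open.push(new Invocation(identity, request));
    }

    // Return: the innermost open call finished normally.
    void onReturn(Object response) {
        Invocation inv = open.pop();
        inv.response = response;
        completed.add(inv);
    }

    // Throw: the innermost open call finished with an exception.
    void onThrow(Throwable error) {
        Invocation inv = open.pop();
        inv.error = error;
        completed.add(inv);
    }

    // Finished calls, in completion order (nested sub-calls come before their caller).
    List<Invocation> completed() {
        return completed;
    }
}
```

Because sub-calls complete before the entry call that triggered them, the completed list naturally holds the downstream invocations first and the entry invocation last, which is enough to rebuild the call chain for one request.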

For full‑link stress testing, the system records live traffic, assigns unique identifiers, and uses agents to mock downstream calls based on recorded data, with shadow Redis and MySQL instances handling write operations.
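The two mechanisms in this paragraph, mocking downstream calls from recorded data and diverting writes to shadow stores, can be sketched as follows. Everything here is an assumption for illustration: the class `StressReplayContext`, the key format that joins the traffic identifier with the call identity, and the `shadow_` naming prefix are hypothetical and are not taken from Cinema‑Agent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the lookup an agent might perform during stress replay.
final class StressReplayContext {

    // Recorded downstream responses, keyed by traffic identifier plus call identity.
    private final Map<String, Object> recorded = new HashMap<>();

    private static String key(String trafficId, String callIdentity) {
        return trafficId + "|" + callIdentity;
    }

    // Called at recording time: remember what the downstream call returned.
    void record(String trafficId, String callIdentity, Object response) {
        recorded.put(key(trafficId, callIdentity), response);
    }

    // Called at replay time: return the recorded response instead of calling downstream.
    // An empty result means nothing was recorded for this call.
    Optional<Object> mockFor(String trafficId, String callIdentity) {
        return Optional.ofNullable(recorded.get(key(trafficId, callIdentity)));
    }

    // Stress traffic writes to a shadow instance; normal traffic keeps its real target.
    // The "shadow_" prefix is an assumed naming convention for this example only.
    static String routeWrite(boolean stressTraffic, String target) {
        return stressTraffic ? "shadow_" + target : target;
    }
}
```

Keying mocks by the traffic identifier keeps responses from different recorded requests apart even when they hit the same downstream interface, and routing by a stress flag keeps test writes out of production Redis and MySQL data.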

Practical results include a reduction of case creation time from one person‑day to one hour, a decrease in monthly fault frequency from 0.28% to 0.18%, and acceptable performance overhead (CPU +1‑3%, memory +3‑5%).

The article concludes with a summary of the current usage of Q‑Thanos‑Agent and Cinema‑Agent, future plans to merge their functionalities, and reflections on the benefits and trade‑offs of traffic recording and replay in large‑scale online services.

Tags: Automated Testing, Java agent, traffic recording, replay, Qunar, full-link stress testing
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.