Design and Implementation of a Production Traffic Replay System for Functional and Performance Testing
The article describes a production traffic replay system that records real user traffic, creates scalable pressure sources, supports both Layer-4 and Layer-7 protocols, and provides automated fail-over and monitoring features to enable realistic functional and performance testing at large scale.
Background
In product iteration, functional and performance testing are essential; capacity estimation relies on performance testing, and test case coverage is a key metric for functional testing. Manual construction of massive, production‑like test cases is difficult, prompting the need for a system that can generate large pressure sources using real production traffic.
Solution
The traffic replay system mirrors real production traffic, records it, and can replay it with adjustable pressure (e.g., 10×). It supports both Layer‑4 and Layer‑7 protocols, works across platforms, and automatically balances traffic to avoid overloading production servers.
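One way to picture adjustable pressure is as schedule expansion: each recorded request is replayed N times, with the copies spread across the original inter-request gap so the replayed stream carries N times the original request rate. The sketch below is a minimal illustration of that idea; the `RecordedRequest` type and `build_replay_schedule` function are hypothetical names, not part of the system described here.

```python
from dataclasses import dataclass

@dataclass
class RecordedRequest:
    # Offset (seconds) from the start of the capture, plus the raw payload.
    offset_s: float
    payload: bytes

def build_replay_schedule(requests, amplification=10):
    """Expand a recorded trace into an amplified replay schedule.

    Each recorded request is replayed `amplification` times, with copies
    spread evenly across the gap to the next recorded request, so the
    output carries `amplification` times the original request rate.
    """
    schedule = []
    for i, req in enumerate(requests):
        # Gap until the next recorded request (assume 1s at the tail).
        if i + 1 < len(requests):
            next_off = requests[i + 1].offset_s
        else:
            next_off = req.offset_s + 1.0
        gap = max(next_off - req.offset_s, 1e-6)
        for k in range(amplification):
            schedule.append((req.offset_s + k * gap / amplification, req.payload))
    schedule.sort(key=lambda item: item[0])
    return schedule

# Two requests recorded one second apart, replayed at 2x pressure:
reqs = [RecordedRequest(0.0, b"GET /a"), RecordedRequest(1.0, b"GET /b")]
schedule = build_replay_schedule(reqs, amplification=2)
# Fires at t = 0.0, 0.5, 1.0, 1.5 -- twice the original rate.
```

An actual sender would then sleep between scheduled timestamps and write each payload to the target test server.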
System Architecture
The system consists of three modules: Drain Task Settings, Replay Task Settings, and Task Query. A dedicated collector node is added to the production cluster; it forwards normal traffic to the original servers while simultaneously capturing a copy. A single collector can absorb several times the traffic volume of an individual production server.
The captured traffic can be saved as offline pcap files or as raw request text. During replay, the traffic can be sent to test servers with different versions or with a configurable amplification factor. Weight adjustments (e.g., collector:Server1:Server2 = 5:6:5) ensure the collector receives enough data without overloading the original servers.
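The 5:6:5 weighting above can be realized with any weighted load-balancing scheme; below is a sketch using smooth weighted round-robin (the algorithm popularized by nginx upstream balancing), which guarantees that over every 16 consecutive picks the collector receives exactly 5. The class name and node labels are illustrative, not taken from the system described here.

```python
class SmoothWeightedBalancer:
    """Smooth weighted round-robin: each pick raises every node's current
    score by its weight, selects the highest score, then deducts the
    total weight from the winner. Over one full cycle (sum of weights
    picks), each node is chosen exactly `weight` times."""

    def __init__(self, weights):
        self.weights = dict(weights)              # node -> configured weight
        self.current = {n: 0 for n in weights}    # running scores

    def pick(self):
        total = sum(self.weights.values())
        for node, w in self.weights.items():
            self.current[node] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

# Weights matching the example ratio collector:Server1:Server2 = 5:6:5.
balancer = SmoothWeightedBalancer({"collector": 5, "server1": 6, "server2": 5})
picks = [balancer.pick() for _ in range(16)]
# Over 16 picks: collector gets 5, server1 gets 6, server2 gets 5.
```

The smooth variant also interleaves nodes rather than sending long bursts to one server, which matters when the "traffic" being distributed is live production requests.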
A self‑protection mechanism monitors backend health; if a target server fails three consecutive checks, the drain task is automatically paused to prevent traffic loss.
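That self-protection logic is essentially a small circuit breaker keyed on consecutive failures. A minimal sketch, assuming a threshold of three failed checks as described above (the `DrainGuard` name and its interface are hypothetical):

```python
class DrainGuard:
    """Pauses the drain (traffic-mirroring) task after N consecutive
    failed health checks, so captured traffic is not silently lost
    while the target is down."""

    def __init__(self, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.paused = False

    def report(self, healthy: bool) -> bool:
        """Feed in one health-check result; returns True if the task
        is (now) paused."""
        if healthy:
            # Any success resets the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.fail_threshold:
                self.paused = True
        return self.paused

    def resume(self):
        """Operator action: clear state and restart the drain task."""
        self.consecutive_failures = 0
        self.paused = False

guard = DrainGuard(fail_threshold=3)
guard.report(False)
guard.report(False)
guard.report(True)    # streak broken -- still running
guard.report(False)
guard.report(False)
guard.report(False)   # third consecutive failure -- task paused
```

Requiring *consecutive* failures (rather than a failure count within a window) keeps a single transient timeout from halting the capture.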
Projects
Project 1 demonstrates internal service traffic copying, where the collector forwards traffic back to the original cluster and stores two mirrored copies (pcap and raw request files). Project 2 shows traffic replay, converting recorded offline traffic into requests and replaying them on external test servers for multi‑version comparison.
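Multi-version comparison ultimately reduces to replaying the same recorded request against each version and diffing the responses, while ignoring fields that legitimately differ between runs. A minimal sketch of such a comparator, with response shapes and the `diff_responses` helper invented for illustration:

```python
def diff_responses(resp_a, resp_b, ignore_headers=("date", "server")):
    """Field-by-field diff of two replayed responses, skipping headers
    that are expected to vary between runs (timestamps, server banners).
    Returns a list of (field, value_a, value_b) tuples; empty means the
    two versions behaved identically for this request."""
    diffs = []
    if resp_a["status"] != resp_b["status"]:
        diffs.append(("status", resp_a["status"], resp_b["status"]))
    for key in sorted(set(resp_a["headers"]) | set(resp_b["headers"])):
        if key.lower() in ignore_headers:
            continue
        va = resp_a["headers"].get(key)
        vb = resp_b["headers"].get(key)
        if va != vb:
            diffs.append((f"header:{key}", va, vb))
    if resp_a["body"] != resp_b["body"]:
        diffs.append(("body", resp_a["body"], resp_b["body"]))
    return diffs

v1 = {"status": 200, "headers": {"Date": "Mon", "Content-Type": "application/json"},
      "body": b'{"ok":true}'}
v2 = {"status": 200, "headers": {"Date": "Tue", "Content-Type": "application/json"},
      "body": b'{"ok":false}'}
# Only the body differs; the Date header is ignored by design.
```

In practice the ignore list grows to cover request IDs, ETags, and other per-run fields; anything left is a genuine behavioral difference between versions.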
Summary and Outlook
The current system efficiently captures massive volumes of unmodified production traffic, eliminating the need for manually crafted test data, and supports cross-platform deployment, Layer-7 protocol customization, and HTTP header manipulation. Future work aims to reduce intrusiveness to the production cluster by co-locating the collector on the replay target, achieving seamless traffic mirroring for direct-connect scenarios.
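The HTTP header manipulation mentioned above typically means rewriting a recorded raw request so it targets the test cluster, plus tagging it so replayed traffic is distinguishable downstream. A minimal sketch for HTTP/1.1, assuming the simple case of one Host header and no special framing (the `rewrite_request` helper is hypothetical):

```python
def rewrite_request(raw: bytes, new_host: str, extra_headers=None) -> bytes:
    """Rewrite a recorded raw HTTP/1.1 request: replace the Host header
    so it targets a test server, and append tag headers (e.g. marking
    the request as replayed) without touching the body."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    out = [lines[0]]                      # request line stays as-is
    for line in lines[1:]:
        if line.lower().startswith(b"host:"):
            out.append(b"Host: " + new_host.encode())
        else:
            out.append(line)
    for key, value in (extra_headers or {}).items():
        out.append(f"{key}: {value}".encode())
    return b"\r\n".join(out) + sep + body

raw = (b"GET /ping HTTP/1.1\r\n"
       b"Host: prod.example.com\r\n"
       b"Accept: */*\r\n\r\n")
rewritten = rewrite_request(raw, "test.example.com",
                            extra_headers={"X-Replay": "1"})
# Host now points at the test server; an X-Replay tag marks the copy.
```

A production implementation would also need to handle chunked bodies, connection reuse, and HTTPS termination, which this sketch deliberately ignores.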
Ctrip Technology
Official Ctrip Technology account: sharing, discussing, growing together.