
Design and Implementation of a Production Traffic Replay System for Functional and Performance Testing

This article describes a production traffic replay system that records real user traffic, builds scalable pressure sources from it, supports both Layer 4 and Layer 7 protocols, and provides automated fail‑over and monitoring features, so that functional and performance testing can run realistically at large scale.

Ctrip Technology

Background

Functional and performance testing are essential to every product iteration: capacity estimation relies on performance testing, and test case coverage is a key metric for functional testing. Building a large volume of production‑like test cases by hand is difficult, which motivates a system that generates large pressure sources from real production traffic.

Solution

The traffic replay system mirrors real production traffic, records it, and can replay it at an adjustable multiple of the original load (e.g., 10×). It supports both Layer 4 and Layer 7 protocols, works across platforms, and automatically balances traffic to avoid overloading production servers.
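One way to picture adjustable pressure is as a rescheduling of the recording: every recorded request is emitted several times within its original time slot. The following is a minimal sketch under that assumption; `RecordedRequest` and `amplify` are hypothetical names for illustration, not part of the system described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecordedRequest:
    offset_s: float  # seconds since the start of the recording
    payload: bytes   # raw request bytes as captured


def amplify(requests: List[RecordedRequest], factor: int) -> List[RecordedRequest]:
    """Return a replay schedule carrying `factor` copies of every request.

    Copies keep the original offset, so the replay window keeps its length
    while the request rate is multiplied by `factor`.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    schedule = [r for r in requests for _ in range(factor)]
    # sorted() is stable, so copies of the same request stay adjacent.
    return sorted(schedule, key=lambda r: r.offset_s)


recorded = [RecordedRequest(0.0, b"GET /a"), RecordedRequest(0.5, b"GET /b")]
schedule = amplify(recorded, factor=10)
print(len(schedule))  # 20 requests in the same 0.5 s window
```

Keeping the original offsets preserves the shape of the production traffic curve; only its height changes.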

System Architecture

The system consists of three modules: Drain Task Settings, Replay Task Settings, and Task Query. A dedicated collector node is added to the production cluster; it forwards normal traffic to the original servers while simultaneously capturing a copy. The collector can take in several times the traffic that a single server handles.

The captured traffic can be saved as offline pcap files or as raw request text. During replay, the traffic can be sent to test servers running different versions, or replayed with a configurable amplification factor. Weights (e.g., collector:Server1:Server2 = 5:6:5) are adjusted so that the collector receives enough data without overloading the original servers.
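To make the example weights concrete, the sketch below converts a weight ratio into the share of incoming traffic each node would receive. It assumes traffic is distributed in proportion to the weights, which the article does not state explicitly; `traffic_shares` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, Fraction]:
    """Map each node to its fraction of traffic, assuming proportional weighting."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: Fraction(w, total) for name, w in weights.items()}


shares = traffic_shares({"collector": 5, "Server1": 6, "Server2": 5})
for name, share in shares.items():
    print(f"{name}: {float(share):.2%}")
# collector: 31.25%
# Server1: 37.50%
# Server2: 31.25%
```

Under this assumption the 5:6:5 ratio sends 5/16 of incoming requests through the collector.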

A self‑protection mechanism monitors backend health; if a target server fails three consecutive checks, the drain task is automatically paused to prevent traffic loss.
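The three-strikes rule can be sketched as a small counter that resets on every successful check. This is an illustrative sketch, not the system's implementation: the class name `DrainGuard` is hypothetical, and the choice to stay paused until someone resumes the task manually is an assumption the article does not confirm.

```python
class DrainGuard:
    """Pause a drain task after N consecutive failed health checks."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0
        self.paused = False

    def record_check(self, healthy: bool) -> bool:
        """Record one health check result; return True if the task is paused."""
        if healthy:
            # Any success breaks the streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # Assumption: the task stays paused until resumed manually.
                self.paused = True
        return self.paused


guard = DrainGuard()
for healthy in [False, False, True, False, False, False]:
    paused = guard.record_check(healthy)
print(paused)  # True: the last three checks failed in a row
```

Counting consecutive failures rather than total failures keeps a single transient timeout from pausing the drain task.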

Projects

Project 1 demonstrates internal service traffic copying, where the collector forwards traffic back to the original cluster and stores two mirrored copies (pcap and raw request files). Project 2 shows traffic replay, converting recorded offline traffic into requests and replaying them on external test servers for multi‑version comparison.
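Multi-version comparison can be pictured as sending each recorded request to two builds and collecting the requests whose responses differ. The sketch below assumes each test server is reachable through a plain callable; `compare_versions`, `send_old`, and `send_new` are hypothetical names, and a real comparison would likely need to ignore fields such as timestamps.

```python
from typing import Callable, Iterable, List, Tuple

Sender = Callable[[bytes], bytes]  # takes a raw request, returns a raw response


def compare_versions(
    requests: Iterable[bytes], send_old: Sender, send_new: Sender
) -> List[Tuple[bytes, bytes, bytes]]:
    """Replay each request against both versions and return the mismatches.

    Each mismatch is a (request, old_response, new_response) tuple.
    """
    mismatches = []
    for request in requests:
        old_response = send_old(request)
        new_response = send_new(request)
        if old_response != new_response:
            mismatches.append((request, old_response, new_response))
    return mismatches


# Stand-in senders for illustration; real ones would talk to test servers.
old_build = lambda req: req.upper()
new_build = lambda req: b"ERR" if req == b"ping" else req.upper()

print(compare_versions([b"hello", b"ping"], old_build, new_build))
# [(b'ping', b'PING', b'ERR')]
```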

Summary and Outlook

The current system efficiently captures massive, unmodified production traffic, removes the need for manually crafted test data, and supports cross‑platform deployment, Layer 7 protocol customization, and HTTP header manipulation. Future work aims to reduce intrusion into the cluster by co‑locating the collector on the replay target, achieving seamless traffic mirroring for direct‑connect scenarios.
