Design and Practice of a Full-Link Load Testing Platform
This article describes the motivation, core design, technical choices, data and traffic isolation mechanisms, and implementation steps of a self‑developed full‑link load testing platform that enables production‑environment testing, reduces machine costs, and improves system stability and performance monitoring.
As business scale grows, ensuring system stability becomes critical, yet traditional load testing suffers from high cost, an inability to exercise production interfaces directly, and a lack of historical data for comparison. To address these issues, a self‑developed full‑link load testing platform was built to test production interfaces directly, save resources, monitor node health, and quickly identify weak points.
What is full‑link load testing? It simulates massive user traffic against real production scenarios, covering traffic recording, replay, and pressure generation. Its advantages include realistic request scenarios, significant machine cost savings, comprehensive link monitoring, and rapid problem discovery.
Technical selection – two main approaches were evaluated: traffic marking and machine marking. Traffic marking isolates data at DB, cache, and MQ layers using shadow resources, while machine marking deploys separate machines and resources. Traffic marking was chosen for its maturity (used by Meituan and Alibaba), lower cost, and easier integration with existing middleware.
Platform core design
1. Overall architecture – includes a control center (brain) for task creation, configuration, and reporting, and a pressure engine (duckpear‑engine) comprising kafka‑replay, goreplay, and vegeta.
2. Components – vegeta (customized Go‑based load generator), goreplay (HTTP traffic recorder/replayer), and kafka‑replay (Kafka write performance tester). Each component was extended to support rate control, parameter construction, result assertions, and Prometheus monitoring.
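The rate control and result assertions described above can be sketched as follows. This is a minimal illustration, not the platform's actual engine code: the function names, the fixed-interval pacing model, and the p99/error-rate thresholds are all assumptions for the example.

```python
def paced_schedule(qps, duration_s):
    """Return monotonic send offsets (seconds) for a fixed-QPS attack.

    A sketch of the rate control the article describes adding to the
    pressure engine: one request every 1/qps seconds, the same pacing
    model constant-rate generators such as vegeta use.
    """
    interval = 1.0 / qps
    n = int(qps * duration_s)
    return [i * interval for i in range(n)]

def results_pass(latencies_ms, errors, p99_budget_ms, max_error_rate):
    """Result assertion in the spirit of the platform's report checks:
    the run passes only if p99 latency and error rate stay in budget."""
    lat = sorted(latencies_ms)
    p99 = lat[int(len(lat) * 0.99) - 1] if lat else 0
    error_rate = errors / max(len(latencies_ms), 1)
    return p99 <= p99_budget_ms and error_rate <= max_error_rate
```

A 10 QPS, 2-second schedule yields 20 send offsets spaced 100 ms apart; the assertion helper would then gate the generated report on the configured budgets.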
Data isolation – achieved via shadow databases, shadow Redis keys, and shadow Kafka topics, ensuring test traffic does not affect production data.
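A shadow-resource router can be sketched as below. The article only states that shadow databases, shadow Redis keys, and shadow Kafka topics are used; the specific naming conventions (`_shadow` suffix, `shadow:` prefix) are hypothetical stand-ins.

```python
# Hypothetical naming conventions for shadow resources.
SHADOW_DB_SUFFIX = "_shadow"
SHADOW_KEY_PREFIX = "shadow:"
SHADOW_TOPIC_SUFFIX = "_shadow"

def route_db(name, is_stress):
    """Stress traffic writes to the shadow database, never production."""
    return name + SHADOW_DB_SUFFIX if is_stress else name

def route_redis_key(key, is_stress):
    """Prefixing keys keeps shadow cache entries out of production reads."""
    return SHADOW_KEY_PREFIX + key if is_stress else key

def route_topic(topic, is_stress):
    """Shadow topics isolate test messages from production consumers."""
    return topic + SHADOW_TOPIC_SUFFIX if is_stress else topic
```

In practice this routing lives inside the data-access middleware, so business code stays unchanged whether a request is real or shadow traffic.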
Traffic isolation – integrated into the PIE framework with traffic identification, circuit breaking, mock services, and routing to shadow resources based on request headers.
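The header-based identification, mocking, and circuit-breaking steps above might look like the following sketch. The header name `X-Stress-Test` and the dispatch logic are assumptions for illustration; the article does not specify how the PIE framework marks traffic.

```python
# Assumed marker header; the actual header used by the PIE framework
# is not named in the article.
STRESS_HEADER = "x-stress-test"

def is_stress(headers):
    """Traffic identification: detect the stress-test marker header."""
    return headers.get(STRESS_HEADER, "").lower() in ("1", "true")

def dispatch(headers, real_call, mock_call, breaker_open=False):
    """Route a downstream call for one request.

    Real traffic goes to the real dependency. Stress traffic is sent to
    a mock service (so third parties never see test load) and is shed
    entirely when the circuit breaker is open.
    """
    if not is_stress(headers):
        return real_call()
    if breaker_open:
        raise RuntimeError("stress traffic rejected: circuit breaker open")
    return mock_call()
```

The same marker would also drive the shadow-resource routing at the storage layer, so one header propagated through the call chain isolates the entire link.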
Platform core functions
• Kafka replay – configurable thread count, rate, and offset- or time-based replay with visual reporting.
• Traffic recording and replay – HTTP flow capture to COS, configurable machines, duration, filters, and replay speed.
• Interface testing – supports serial and parallel execution, thread and QPS control, parameter templating, and report generation.
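Of the functions above, the offset- or time-based Kafka replay start point can be sketched as a pure function. This is an illustration of the semantics, not the platform's code: `records` stands in for one partition's log, and the time-based branch mirrors Kafka's `offsetsForTimes` behavior (first record at or after the requested timestamp).

```python
import bisect

def replay_start_offset(records, *, offset=None, since_ts=None):
    """Pick the offset to start replaying from.

    records: list of (offset, timestamp_ms) pairs sorted by offset, with
    non-decreasing timestamps, standing in for a partition's log.
    Either an explicit offset or a start timestamp is given.
    """
    if offset is not None:
        return offset  # offset-based replay: start exactly here
    timestamps = [ts for _, ts in records]
    i = bisect.bisect_left(timestamps, since_ts)
    if i == len(records):
        return None  # no record at or after the requested time
    return records[i][0]
```

A real implementation would ask the broker for this mapping per partition rather than scanning the log, then hand the resulting offsets to the configured replay threads at the configured rate.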
Pressure engine scaling – distributed architecture allows adding machines to SFNS, publishing services via a unified platform, and mapping engines to test targets for horizontal scaling.
Full‑link testing implementation – divided into pre‑test (data preparation, configuration checks, risk assessment), during‑test (monitoring node metrics and aborting on anomalies), and post‑test (result analysis, bottleneck identification, and goal verification).
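The during-test "abort on anomalies" guard can be sketched as a threshold check over node metrics. The metric names and limits below are illustrative assumptions; the article does not list the platform's actual abort thresholds.

```python
def breached_limits(metrics, limits):
    """Return the names of metrics that exceed their configured limits.

    metrics/limits: dicts such as {"cpu": 0.95, "error_rate": 0.02,
    "p99_ms": 800}. During a test, a monitor loop samples node metrics,
    calls this check, and aborts the run when it returns a non-empty list.
    """
    return [name for name, limit in limits.items()
            if metrics.get(name, 0) > limit]
```

Tying the abort decision to explicit limits makes the "stop on anomaly" step auditable: the post-test report can record exactly which threshold ended the run.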
In summary, the platform has been deployed in several business lines, reducing testing barriers and resource consumption, though challenges remain in large‑scale data preparation; future work will focus on streamlining data setup and further enhancing the platform.
Beijing SF i-TECH City Technology Team
Official tech channel of Beijing SF i-TECH City. A publishing platform for technology innovation, practical implementation, and frontier tech exploration.