
Transforming Load Testing at ZTO: From Offline Pitfalls to Safe Full‑Chain Online Testing

This article details ZTO's evolution from traditional offline and online load‑testing approaches—highlighting their shortcomings—to a comprehensive full‑chain performance testing solution that uses JavaAgent probes, shadow resources, and a structured deployment and verification process to ensure safe, accurate production testing.

Zhongtong Tech

Background

ZTO's performance testing historically used both online and offline load‑testing methods, each with distinct drawbacks that complicated testing workflows.

Issues with Offline Load Testing

Offline testing relied on scaling CPU and memory proportionally (e.g., at a 10:1 ratio) to mimic the production environment. This distorted results for high‑TPS scenarios because network, middleware, and database capacity cannot be scaled by the same ratio.

Issues with Online Load Testing

Online testing focused mainly on read interfaces and required developers to modify code to keep test data from contaminating production data. This manual hard‑coding increased cost and risk and built resistance to online testing, leading many teams to prefer offline testing despite its inaccuracies.

Introducing the Technical Solution

ZTO adopted an online full‑chain load‑testing product that injects a JavaAgent probe via bytecode manipulation, eliminating the need for code changes. The agent identifies test traffic, routes it to shadow resources, and isolates test data from production data, achieving two core functions: traffic coloring and data safety.

Figure: Load‑testing traffic flow diagram.
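
The traffic‑coloring idea described above can be pictured as a per‑thread flag that resource calls consult before choosing a target. The sketch below only illustrates that idea and is not Pradar's implementation: the class TrafficContext and its methods are hypothetical names, and a real agent would also have to carry the flag across thread pools and remote calls.

```java
import java.util.function.Supplier;

/** Hypothetical per-thread marker for load-test traffic (not the Pradar API). */
final class TrafficContext {
    private static final ThreadLocal<Boolean> TEST_FLAG =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private TrafficContext() {}

    /** True while the current thread is handling a colored (test) request. */
    static boolean isTestTraffic() {
        return TEST_FLAG.get();
    }

    /** Runs the work with the flag set, then restores the previous value. */
    static <T> T callAsTest(Supplier<T> work) {
        boolean previous = TEST_FLAG.get();
        TEST_FLAG.set(Boolean.TRUE);
        try {
            return work.get();
        } finally {
            TEST_FLAG.set(previous);
        }
    }

    /** Example routing decision: test traffic goes to the shadow resource. */
    static String chooseDataSource(String production, String shadow) {
        return isTestTraffic() ? shadow : production;
    }
}
```

Restoring the previous value in the finally block keeps a pooled thread from staying colored after the test request finishes, which is the kind of leak the data‑safety goal is meant to rule out.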

Full‑Chain Load Testing Deployment & Core Config

To install the agent, upload pradar-agent.zip to the target server, extract it, and modify the application startup script to add the following JVM argument before -jar:

-javaagent:/home/admin/pradar-agent/agent/pradar-core-ext-bootstrap-1.0.0.jar -Dpradar.project.name=APP_NAME

After restarting the application, verify successful reporting in the Pradar web console.

Shadow Resource Configuration

For Redis, the agent prefixes keys with PT_. For MQ, shadow topics and consumer groups are created with the same PT_ prefix via the ZMS configuration center. Shadow databases and tables are configured by adding PT_ to the original names, as shown in the diagrams.
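
The PT_ naming rule can be sketched as a small helper. This is an illustration under assumptions, not the agent's code: the class ShadowNames is hypothetical, the sample names (order:1001, order_topic, t_order) are invented, and treating an already prefixed name as a shadow name is a choice made here for simplicity.

```java
/** Hypothetical helper that derives shadow resource names with the PT_ prefix. */
final class ShadowNames {
    static final String PREFIX = "PT_";

    private ShadowNames() {}

    /** Adds the prefix unless the name already carries it (an assumption of this sketch). */
    static String shadow(String name) {
        return name.startsWith(PREFIX) ? name : PREFIX + name;
    }

    /** Returns the shadow name for test traffic and the original name otherwise. */
    static String resolve(String name, boolean testTraffic) {
        return testTraffic ? shadow(name) : name;
    }

    public static void main(String[] args) {
        System.out.println(resolve("order:1001", true));   // Redis key, test traffic
        System.out.println(resolve("order_topic", true));  // MQ topic, test traffic
        System.out.println(resolve("t_order", false));     // table, production traffic
    }
}
```

Keeping one rule for keys, topics, and tables makes it easy to tell from a name alone whether a resource belongs to the shadow side.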

Mock (Shield) Configuration

Sensitive operations such as payment deductions or SMS sending are protected by mock implementations. When the probe detects test traffic, it executes the mock code instead of the real business logic, preventing unintended side effects.
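
One way to picture the shield is a wrapper that picks the mock or the real implementation on every call. The sketch below uses a JDK dynamic proxy purely for illustration; the probe described here works through bytecode instrumentation instead, and SmsSender, MockShield, and shield are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.BooleanSupplier;

/** Hypothetical business interface, used only for this illustration. */
interface SmsSender {
    String send(String phone, String text);
}

final class MockShield {
    private MockShield() {}

    /** Wraps real so that calls made during test traffic are answered by mock. */
    @SuppressWarnings("unchecked")
    static <T> T shield(Class<T> type, T real, T mock, BooleanSupplier isTestTraffic) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Decide per call, so one shielded instance serves both kinds of traffic.
            T target = isTestTraffic.getAsBoolean() ? mock : real;
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the original business exception
            }
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

A caller would build the wrapper once, for example MockShield.shield(SmsSender.class, realSender, mockSender, flagSupplier), and hand it to business code in place of the real sender.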

Link Integration & Testing Process

The full‑chain testing process consists of three stages: (1) Requirement definition and link mapping, (2) Test environment deployment and execution, and (3) Online testing and result generation.

Requirement Definition & Link Mapping

Key activities include defining business scope, performance goals (TPS, RT, success rate, SA), and collecting detailed information about applications, databases, caches, middleware, and potential sensitive impacts. This information guides shadow resource setup and mock configuration.

Testing Environment Debugging

Test traffic is marked with User-Agent:PerfomanceTest for HTTP or p-pradar-cluster-test:true for Dubbo. After confirming isolation in the test environment, the team proceeds to online testing.
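
Recognizing these markers could look roughly like the following. The header name and values are taken from the text above, including the spelling PerfomanceTest; everything else is an assumption of this sketch: the class TestTrafficDetector is hypothetical, the HTTP value is matched exactly after trimming, and the Dubbo flag is assumed to arrive as a string key/value pair in the RPC context.

```java
import java.util.Map;

/** Hypothetical detector for the two test-traffic markers named in the article. */
final class TestTrafficDetector {
    static final String HTTP_HEADER = "User-Agent";
    static final String HTTP_MARKER = "PerfomanceTest"; // spelling as given in the article
    static final String DUBBO_KEY = "p-pradar-cluster-test";

    private TestTrafficDetector() {}

    /** HTTP header names are case-insensitive, so compare them that way. */
    static boolean isHttpTest(Map<String, String> headers) {
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String value = e.getValue();
            if (HTTP_HEADER.equalsIgnoreCase(e.getKey())
                    && value != null
                    && HTTP_MARKER.equals(value.trim())) {
                return true;
            }
        }
        return false;
    }

    /** Assumes the Dubbo marker is passed as a string key/value pair. */
    static boolean isDubboTest(Map<String, String> attachments) {
        return "true".equalsIgnoreCase(attachments.get(DUBBO_KEY));
    }
}
```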

Online Testing & Result Generation

Preparation includes creating shadow databases and topics and configuring the Pradar web console. A rollout plan is documented, followed by gray‑scale verification, full rollout, and finally the online test itself, which runs JMeter scripts in ramp‑up (step‑up) mode.
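
To make the step‑up mode concrete, the sketch below computes the concurrency level for each step of a ramp. It is independent of JMeter and only shows the arithmetic; the class StepUpPlan and the even spacing of steps are assumptions, since the article does not give the actual step sizes or hold times.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner for an evenly spaced step-up ramp. */
final class StepUpPlan {
    private StepUpPlan() {}

    /** Returns the concurrency level for each step, ending exactly at targetThreads. */
    static List<Integer> levels(int targetThreads, int steps) {
        if (targetThreads < 1 || steps < 1) {
            throw new IllegalArgumentException("targetThreads and steps must be positive");
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= steps; i++) {
            // Ceiling division keeps early steps from rounding down to zero.
            long level = ((long) targetThreads * i + steps - 1) / steps;
            result.add((int) level);
        }
        return result;
    }
}
```

For a target of 100 threads in 4 steps this yields 25, 50, 75, and 100, so each plateau can be observed before more load is added.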

Result Analysis

Test reports show performance metrics and leak‑detection results. Leak detection checks whether test data is inadvertently written to production tables; binlog listeners watch for such writes and trigger alerts.
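
The leak check can be thought of as a filter over row changes read from the binlog. The sketch below is hypothetical throughout: RowChange is a simplified stand‑in for a binlog event, LeakDetector is not part of Pradar, and the rule for recognizing test data is passed in as a predicate because the article does not say how test rows are identified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical, simplified view of one row change read from the binlog. */
final class RowChange {
    final String table;
    final Map<String, String> columns;

    RowChange(String table, Map<String, String> columns) {
        this.table = table;
        this.columns = columns;
    }
}

/** Flags test rows that landed in a table without the PT_ prefix. */
final class LeakDetector {
    private final Predicate<RowChange> looksLikeTestData;

    LeakDetector(Predicate<RowChange> looksLikeTestData) {
        this.looksLikeTestData = looksLikeTestData;
    }

    boolean isLeak(RowChange change) {
        boolean productionTable = !change.table.startsWith("PT_");
        return productionTable && looksLikeTestData.test(change);
    }

    /** Returns the changes that should raise an alert. */
    List<RowChange> findLeaks(List<RowChange> changes) {
        List<RowChange> leaks = new ArrayList<>();
        for (RowChange change : changes) {
            if (isLeak(change)) {
                leaks.add(change);
            }
        }
        return leaks;
    }
}
```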

Thoughts on Full‑Chain Testing Practice

Since adopting the probe‑based approach, ZTO has successfully supported 62 applications across major sales events, solving many previous issues while introducing new challenges such as increased coordination effort.

Organization & Work Mode Issues

Two organizational models are compared: a company‑level project with top‑down drive versus a department‑level project led by the performance testing team, each requiring cross‑functional collaboration.

Other Issues

Low automation in probe version control and load‑generator provisioning.

Manual, offline process for configuration review and approval.

Limited reuse of test scripts and data.

Absence of automated baseline comparison and visual analytics for test results.

Conclusion

ZTO's full‑chain load testing has eliminated scaling‑induced inaccuracies, but managing shadow resources across dozens of applications remains complex and resource‑intensive, requiring ongoing process improvements and performance‑focused innovation initiatives.
