
How JD Daojia Built a Scalable Load‑Testing Platform to Reduce Test Time to 15 Minutes

Facing rising traffic, JD Daojia redesigned its in‑house load‑testing platform to automate script management, run JMeter in a distributed fashion, integrate real‑time monitoring, and support custom RPC protocols. The redesign sharply reduced manual effort and cut test cycles from an hour to fifteen minutes while keeping the systems under test stable.

Dada Group Technology

Background

JD Daojia, the leading local instant‑retail platform of Dada Group, serves over 40 million active users across more than 1,400 counties. Rapid growth of its O2O services required a reliable way to evaluate system stability during major promotions, version releases, and capacity planning.

Problems with the Existing Load‑Testing Process

High testing cost – every step required manual intervention from test engineers.

Complex test scenarios – scripts had to be uploaded and downloaded for every run, and coordinating many load‑generator machines by hand made the process error‑prone.

Cumbersome result aggregation – analysts had to manually collect and process JMeter logs.

Solution Overview

The team built a new “JD Daojia Load‑Testing Platform” with three main goals:

Simplify the load‑testing workflow so engineers can focus on business‑level performance analysis.

Reduce labor and resource costs while quickly constructing realistic business‑level test scenarios.

Provide a platform‑as‑a‑service approach that can be integrated into the CI/CD pipeline with minimal overhead.

Implemented Functional Modules

1. Test‑Case Management – Supports uploading JMX scripts, parameter files, and attachments. Cases are stored as JSON, versioned automatically, and can be edited directly in the platform.

2. Script File Management – Each case creates a dedicated directory containing the script, parameters, and environment configuration. The platform uses jorphan to convert JSON into JMX files and distributes them to all load‑generator nodes.

3. Load Execution – One‑click execution allows selection of thread groups, scenarios, duration, and target machines. Tests run in a distributed fashion and generate a report automatically after completion.

4. Test Report Management – Real‑time metrics (TP99, availability, TPM, etc.) are pulled from the JD UMP monitoring system and persisted for post‑run analysis. Additionally, JMeter JTL logs are parsed to produce detailed performance tables.

5. Distributed Resource Management – Dynamic scaling of load‑generator nodes enables rapid expansion of testing capacity.
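
As a rough illustration of the JTL post‑processing mentioned under Test Report Management, the sketch below aggregates a CSV‑format JTL stream into per‑sampler statistics. It assumes the JTL file has a header row with JMeter's standard `label`, `elapsed`, and `success` columns; the function names and the nearest‑rank percentile method are choices made for this example and are not taken from the platform.

```python
import csv
import io
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_jtl(jtl_file):
    """Aggregate a CSV-format JTL stream into one row per sampler label."""
    elapsed = defaultdict(list)
    failures = defaultdict(int)
    for row in csv.DictReader(jtl_file):
        label = row["label"]
        elapsed[label].append(int(row["elapsed"]))
        if row["success"].strip().lower() != "true":
            failures[label] += 1

    table = {}
    for label, values in elapsed.items():
        values.sort()
        total = len(values)
        table[label] = {
            "samples": total,
            "avg_ms": sum(values) / total,
            "tp90_ms": percentile(values, 90),
            "tp99_ms": percentile(values, 99),
            "availability_pct": 100.0 * (total - failures[label]) / total,
        }
    return table


# Made-up sample data, for illustration only.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1700000000000,120,createOrder,200,true
1700000000100,80,createOrder,200,true
1700000000200,300,createOrder,500,false
1700000000300,100,createOrder,200,true
"""

report = summarize_jtl(io.StringIO(SAMPLE_JTL))
```

With the four sample rows, `report["createOrder"]` shows 4 samples, an average of 150 ms, and 75% availability. A real run would stream a file handle instead of the in‑memory string.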

System Design

The platform follows a classic browser/server (B/S) architecture built on Spring MVC. The load‑generation engine is a customized JMeter instance, and the platform schedules and orchestrates a cluster of JMeter nodes to provide full‑link distributed testing.

Figure: System Architecture Diagram

Core Features

Distributed Resource Management – Instead of using JMeter’s native distributed mode, the platform controls multiple nodes via a custom JMQ‑based coordination layer, allowing fine‑grained status monitoring and dynamic scaling.
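
One way to picture the status tracking behind such a coordination layer is a registry fed by heartbeat messages. The sketch below is hypothetical: the message fields (`node_id`, `status`), the 30‑second timeout, and the `NodeRegistry` class are assumptions for illustration, and a plain method call stands in for consuming a JMQ message.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    status: str        # assumed values: "idle" or "running"
    last_seen: float   # time of the last heartbeat, in seconds


class NodeRegistry:
    """Tracks load-generator nodes from heartbeat messages (hypothetical)."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.nodes = {}

    def on_heartbeat(self, message: dict, now: float) -> None:
        # A new node_id registers itself simply by sending a heartbeat,
        # which is what makes scaling out a matter of starting more nodes.
        self.nodes[message["node_id"]] = NodeState(message["status"], now)

    def available(self, now: float) -> list:
        """Idle nodes whose last heartbeat is within the timeout."""
        return sorted(
            node_id
            for node_id, state in self.nodes.items()
            if state.status == "idle" and now - state.last_seen <= self.timeout_s
        )


registry = NodeRegistry(timeout_s=30.0)
registry.on_heartbeat({"node_id": "gen-1", "status": "idle"}, now=0.0)
registry.on_heartbeat({"node_id": "gen-2", "status": "running"}, now=0.0)
registry.on_heartbeat({"node_id": "gen-3", "status": "idle"}, now=5.0)
```

Passing `now` explicitly keeps the sketch deterministic; a scheduler would call `registry.available(time.time())` before assigning a test to nodes.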

Multi‑Protocol Support

Generic RPC Call – Requires only the RPC client dependency; test engineers specify service name, method, and parameter types, avoiding the need to compile Java packages for each interface.

Advantages – No service‑side JAR required, lower scripting effort, easy serialization of interface metadata.

Drawbacks – Requires custom client implementation and local caching, demanding solid development skills.
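
To make the generic‑call idea concrete, the sketch below describes an invocation purely as data: a service name, a method, parameter type names, and arguments. The `GenericInvocation` class, its field names, and the example service are hypothetical; the source does not describe the platform's actual wire format or client API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GenericInvocation:
    """Describes an RPC call without needing the service's compiled interface."""
    service: str
    method: str
    param_types: list = field(default_factory=list)
    args: list = field(default_factory=list)

    def validate(self) -> None:
        if len(self.param_types) != len(self.args):
            raise ValueError("param_types and args must have the same length")

    def to_json(self) -> str:
        """Serialize the call metadata so it can be stored with a test case."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical service and method names, for illustration only.
invocation = GenericInvocation(
    service="com.example.OrderService",
    method="queryOrder",
    param_types=["java.lang.Long"],
    args=[10001],
)
payload = invocation.to_json()
```

Because the call is plain metadata, it can be serialized alongside the test case and edited without rebuilding a script, which matches the advantages listed above.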

Monitoring and Reporting

To prevent load tests from impacting production services, the platform integrates real‑time monitoring via the UMP system. Metrics such as TP90/TP99/TP999, request volume, and availability are collected during the test, persisted, and visualized in unified dashboards. JMeter’s native logs (jmeter.log and console output) are asynchronously streamed to the platform for immediate inspection.

Circuit‑Breaker Protection

Online load testing is guarded by several safety mechanisms:

Resource/permission management – dedicated accounts and time windows for online tests.

Manual termination – users can stop a test at any moment.

Traffic throttling – automatic stop when predefined traffic thresholds are reached.

Error rate limits – if the error rate exceeds 0.3% or the number of failures surpasses 2,000, the test aborts and all anomalies are persisted for root‑cause analysis.
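
The two abort thresholds above can be expressed as a small counter‑based check. This is a minimal sketch, not the platform's code: the `CircuitBreaker` class is hypothetical, and the `min_samples` warm‑up guard is an assumption added so that a single early failure does not trip the rate limit (the source does not say how the platform handles that case).

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    """Abort rule: error rate above 0.3% OR more than 2,000 failures."""
    max_error_rate: float = 0.003
    max_failures: int = 2000
    min_samples: int = 1000   # assumption: warm-up before the rate check applies
    total: int = 0
    failed: int = 0

    def record(self, success: bool) -> bool:
        """Record one sample; return True if the test should abort."""
        self.total += 1
        if not success:
            self.failed += 1
        return self.should_abort()

    def should_abort(self) -> bool:
        if self.failed > self.max_failures:
            return True
        if self.total < self.min_samples:
            return False
        return self.failed / self.total > self.max_error_rate


breaker = CircuitBreaker()
results = [True] * 2000 + [False] * 10   # made-up sample stream
aborted_at = None
for index, ok in enumerate(results, start=1):
    if breaker.record(ok):
        aborted_at = index
        break
```

In this made‑up stream the breaker trips on the seventh failure, because 7 / 2007 is the first ratio above 0.3%.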

Figure: Circuit Breaker Metrics

Custom Load Scenarios

Many business services (e.g., order processing) use asynchronous JMQ flows, which cannot be accurately modeled with simple request‑response testing. The platform therefore provides an NIO‑based asynchronous testing mode and numerous optimizations for custom scenarios.
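
The general idea of keeping many asynchronous flows in flight without blocking a thread per request can be sketched as below. Python's `asyncio` stands in here for Java NIO, and `simulated_async_flow` is a placeholder for publishing a message and waiting for its downstream result; none of this reflects the platform's actual implementation.

```python
import asyncio
import time


async def simulated_async_flow(order_id: int, delay_s: float) -> int:
    """Stand-in for publishing a message and awaiting its downstream result."""
    await asyncio.sleep(delay_s)
    return order_id


async def run_async_load(total: int, concurrency: int, delay_s: float = 0.01):
    """Drive `total` flows with at most `concurrency` in flight at once."""
    limiter = asyncio.Semaphore(concurrency)
    latencies = []

    async def one_flow(order_id: int) -> None:
        async with limiter:
            started = time.perf_counter()
            await simulated_async_flow(order_id, delay_s)
            latencies.append(time.perf_counter() - started)

    await asyncio.gather(*(one_flow(i) for i in range(total)))
    return latencies


latencies = asyncio.run(run_async_load(total=50, concurrency=10))
```

Each latency here covers the whole flow from send to completion, which is the quantity a simple request‑response sampler would miss for message‑driven services.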

Isolation and Full‑Link Testing

Online tests are isolated from real user traffic by tagging requests with a special identifier, allowing downstream services to distinguish and handle test traffic separately. The platform also supports full‑link traffic replay, including capture, filtering, and replay of production traffic for end‑to‑end performance validation.
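
A minimal sketch of request tagging follows. The header name `X-Load-Test`, the `shadow_` table prefix, and the idea of routing test writes to shadow tables are all assumptions for illustration; the source only states that requests carry a special identifier that downstream services use to handle test traffic separately.

```python
LOAD_TEST_HEADER = "X-Load-Test"   # hypothetical header name
SHADOW_PREFIX = "shadow_"          # hypothetical naming convention


def tag_request(headers: dict) -> dict:
    """Return a copy of the headers marked as load-test traffic."""
    tagged = dict(headers)
    tagged[LOAD_TEST_HEADER] = "1"
    return tagged


def is_test_traffic(headers: dict) -> bool:
    """Downstream check: does this request carry the test marker?"""
    return headers.get(LOAD_TEST_HEADER) == "1"


def resolve_table(table: str, headers: dict) -> str:
    """Route test writes to a shadow table so real data stays untouched."""
    return SHADOW_PREFIX + table if is_test_traffic(headers) else table


user_headers = {"Accept": "application/json"}
test_headers = tag_request(user_headers)
```

For the marker to work end to end, every service on the call path has to propagate it to its own outbound calls and messages; otherwise test traffic becomes indistinguishable from real traffic further down the chain.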

Figure: Full‑Link Testing Flow

Conclusion and Future Work

Since its launch, the platform has reduced the average test execution time from one hour to fifteen minutes and now serves over 100 business lines. While manual effort for data preparation and result analysis remains, upcoming plans include a dedicated performance‑data warehouse, JVM‑level tracing, traffic splitting, and tighter integration with the company’s overall stability ecosystem.
