Youzan Full‑Link Load Testing Architecture and Implementation

Youzan’s full‑link load‑testing architecture combines a traffic generator, a data‑factory pipeline, and the Maxim platform to replay realistic e‑commerce user actions, tag and isolate test traffic via unified headers, route reads/writes to shadow storage, and integrate Gatling for capacity planning, degradation, alarm, disaster‑recovery and throttling drills.

Youzan Coder

Youzan strives to become the most trusted service provider for merchants. To guarantee system stability during massive promotional events, it has built a full‑link load‑testing framework that simulates real traffic, validates system stability under peak load, supports capacity planning, and enables degradation, alarm, disaster‑recovery, and throttling drills.

The overall design consists of a traffic generator that creates massive user requests, a data factory that builds the request payloads and prepares shadow data, and a load‑testing platform that distributes scripts and data to agent machines. The agents replay user actions (browse, add to cart, order, pay) while the production services recognize the test traffic and route reads/writes to shadow storage to avoid contaminating real data.

Traffic identification is achieved by adding a unified request header to HTTP calls (for example, header name X-Service-Chain with value {"zan_test": true}) and by using Dubbo attachments for RPC calls. Middleware such as NSQ and Wagon also propagates a test flag (for example, key zan_test with value true), so every downstream component can detect and isolate test traffic. Code that hands work off to asynchronous threads can be customized so the flag is carried across the thread boundary as well.
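A minimal sketch of the tagging idea, assuming plain Scala with no framework. The header name and flag key come from the text above; the object name TrafficTag, the regex check, and the thread-local holder are illustrative assumptions, not Youzan's implementation.

```scala
object TrafficTag {
  // Header name and flag key as given in the article.
  val HeaderName = "X-Service-Chain"
  val FlagKey = "zan_test"

  // Matches "zan_test": true with optional whitespace.
  // Assumption: a real service would use a proper JSON parser instead.
  private val FlagPattern = ("\"" + FlagKey + "\"\\s*:\\s*true\\b").r

  /** Headers a load generator would attach to mark a request as test traffic. */
  def testHeaders: Map[String, String] =
    Map(HeaderName -> s"""{"$FlagKey": true}""")

  /** True when the headers carry the test flag; header names are compared case-insensitively. */
  def isTestTraffic(headers: Map[String, String]): Boolean =
    headers.exists { case (name, value) =>
      name.equalsIgnoreCase(HeaderName) && FlagPattern.findFirstIn(value).isDefined
    }

  // Hypothetical per-thread holder for the flag while a request is being handled.
  private val current = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  def mark(isTest: Boolean): Unit = current.set(isTest)
  def isMarked: Boolean = current.get()

  /** Wraps a task so the submitting thread's flag is visible on the thread that runs it. */
  def propagating(task: Runnable): Runnable = {
    val captured = current.get() // read on the submitting thread
    new Runnable {
      def run(): Unit = {
        val previous = current.get()
        current.set(captured)
        try task.run() finally current.set(previous) // restore the worker thread's own state
      }
    }
  }
}
```

Wrapping tasks at submission time is one common way to carry request context into a thread pool. Restoring the previous value afterwards keeps a reused pool thread from leaking the flag into unrelated work.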

Data Isolation is realized through three mechanisms:

Proxy isolation – the RDS‑Proxy transparently forwards the test flag to the database, which then swaps real tables with shadow tables.

Client SDK isolation – the client library automatically rewrites target tables to their shadow counterparts.

Data offset isolation – IDs used in test data are offset to guarantee no overlap with production IDs, and separate jobs are deployed to scan shadow databases.
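The table-swapping and ID-offset ideas above can be sketched as follows. The shadow_ prefix and the offset value are hypothetical, since the article does not state the real naming rule or ID ranges.

```scala
object ShadowRouting {
  // Hypothetical naming rule and offset, chosen only for illustration.
  val ShadowPrefix = "shadow_"
  val IdOffset = 9000000000000L

  /** Table a read or write should target, given whether the request is test traffic. */
  def tableFor(table: String, isTest: Boolean): String =
    if (isTest && !table.startsWith(ShadowPrefix)) ShadowPrefix + table else table

  /** Shifts a production ID into the range reserved for test data. */
  def toTestId(productionId: Long): Long = {
    require(productionId >= 0 && productionId < IdOffset, "ID outside the production range")
    productionId + IdOffset
  }

  /** True when an ID falls in the range reserved for test data. */
  def isTestId(id: Long): Boolean = id >= IdOffset
}
```

With an offset larger than any production ID, a single comparison is enough to tell test rows from real ones, which is what makes the overlap guarantee cheap to check.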

The load‑testing platform (named maxim) manages scripts, data sets, jobs, injection machines, and report generation. It integrates with Gatling for performance measurement and provides a UI for configuring injection rates, concurrent users, target RPS, and other parameters.

Implementation Process:

Define test plan: identify core e‑commerce flow (home → product → order → payment), build a funnel model, and set traffic targets based on historical data.

Data factory: import production data, cleanse and desensitize it in Hive, then export to shadow databases.

Generate request data sets: a MapReduce job reads a wide table of parameters and writes JSON request files to HDFS.

Prepare test scripts: use a unified RESTful API description, control funnel conversion rates, and compose multiple scenarios with Gatling’s setUp API.
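The request data set step maps each row of the wide parameter table to one JSON line. The sketch below shows only that mapping in plain Scala; it is not the MapReduce job itself, and the ParamRow fields are hypothetical because the article does not list the real columns.

```scala
object RequestDataSet {
  // One row of the wide parameter table; field names are illustrative only.
  final case class ParamRow(userId: Long, goodsId: Long, path: String)

  // Escapes the characters most likely to break a JSON string literal.
  // Assumption: production code would use a JSON library for full coverage.
  private def escape(s: String): String =
    s.map {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case '\r' => "\\r"
      case '\t' => "\\t"
      case c    => c.toString
    }.mkString

  /** Renders one row as a single-line JSON object. */
  def toJsonLine(row: ParamRow): String =
    s"""{"userId":${row.userId},"goodsId":${row.goodsId},"path":"${escape(row.path)}"}"""

  /** One JSON document per line, a layout that line-oriented feeders can read directly. */
  def toJsonLines(rows: Seq[ParamRow]): Seq[String] = rows.map(toJsonLine)
}
```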

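Example Gatling script snippet:

// Fragment from the body of a Gatling Simulation class; scn0, scn1, scn2 and
// CustomHttpProtocol are defined elsewhere in the test project.
import io.gatling.core.Predef._
import scala.concurrent.duration._

// Each scenario injects 10 new users per second for one minute.
// throttle only caps throughput: it ramps the ceiling to the target RPS, then holds it.
setUp(
    scn0.inject(constantUsersPerSec(10) during (1.minute)).throttle(
        reachRps(300) in (30.seconds),
        holdFor(2.minutes)).protocols(CustomHttpProtocol.httpProtocol),
    scn1.inject(constantUsersPerSec(10) during (1.minute)).throttle(
        reachRps(500) in (10.seconds),
        holdFor(3.minutes)).protocols(CustomHttpProtocol.httpProtocol),
    scn2.inject(constantUsersPerSec(10) during (1.minute)).throttle(
        reachRps(200) in (20.seconds),
        holdFor(1.minute)).protocols(CustomHttpProtocol.httpProtocol)
)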
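Per-scenario RPS ceilings like these would normally be derived from the funnel model in the test plan. The following is a rough sketch of that arithmetic, assuming each step's traffic is the previous step's traffic multiplied by a conversion rate; the step names follow the core flow described earlier, while FunnelModel and any rates passed to it are hypothetical.

```scala
object FunnelModel {
  // A funnel step and the fraction of the previous step's traffic that reaches it.
  final case class Step(name: String, conversionFromPrevious: Double)

  /** Target RPS per step: each step receives the previous step's RPS times its conversion rate. */
  def targets(entryRps: Double, steps: Seq[Step]): Seq[(String, Double)] =
    steps.scanLeft(("", entryRps)) { case ((_, rps), step) =>
      (step.name, rps * step.conversionFromPrevious)
    }.tail // drop the seed element that only carries the entry RPS
}
```

For example, FunnelModel.targets(1000.0, Seq(Step("home", 1.0), Step("product", 0.5), Step("order", 0.2), Step("payment", 0.8))) yields one target per step, each of which could feed a reachRps call for the matching scenario.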

During execution, the maxim platform orchestrates job deployment, monitors injection machines, and generates Gatling reports. Users can adjust injection machine count, repeat count, concurrent users, and target RPS directly from the UI.
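The article does not describe how maxim divides load among injection machines. As one possible approach, the sketch below splits a job's target RPS evenly across machines; JobConfig and its field names mirror the UI parameters listed above but are otherwise assumptions.

```scala
object JobPlan {
  // Mirrors the parameters the UI exposes; the shape of this class is hypothetical.
  final case class JobConfig(
    injectionMachines: Int,
    repeatCount: Int,
    concurrentUsers: Int,
    targetRps: Int
  )

  /** Splits the total target RPS as evenly as possible; earlier machines absorb the remainder. */
  def rpsPerMachine(config: JobConfig): Seq[Int] = {
    require(config.injectionMachines > 0, "at least one injection machine is required")
    val base = config.targetRps / config.injectionMachines
    val extra = config.targetRps % config.injectionMachines
    Seq.tabulate(config.injectionMachines)(i => if (i < extra) base + 1 else base)
  }
}
```

Handing out the remainder one unit at a time keeps the per-machine shares within one request per second of each other while their sum still equals the configured target.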

In summary, Youzan’s full‑link load‑testing solution combines big‑data pipelines, traffic tagging, shadow data isolation, and a custom execution platform to safely validate system performance under realistic peak loads.
