
Didi's Full‑Chain Load Testing Architecture and Implementation

The article details Didi's end‑to‑end load‑testing strategy—including online environment testing, data isolation with virtual orders, trace‑based traffic marking, and a distributed virtual driver/passenger tool—describing its design, deployment stages, findings, and future reliability applications.

Architecture Digest

Didi Chuxing, founded in 2012, has become a leading one‑stop ride‑hailing platform in China, scaling daily orders from millions to tens of millions and facing increasingly complex IT challenges as both traffic volume and engineering staff grew.

By 2016, the rapid surge to ten‑million‑plus daily orders caused frequent online incidents, prompting Didi to launch a full‑chain load‑testing project to ensure system stability.

Load‑Testing Plan

A typical Didi ride‑hailing flow (order creation, driver dispatch within minutes, pickup, and drop‑off) depends on real‑time processing and on matching passengers with drivers who are physically nearby, which makes performance testing critical.

The chosen approach conducts load tests in the production environment using data isolation: virtual drivers and passengers generate traffic that is kept separate from real users, preventing interference with live services.

Testing in the online environment provides realistic conditions without configuration drift, but safety measures such as low‑traffic windows, robust monitoring, and immediate abort capabilities are mandatory to avoid disrupting live operations.
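The safety rules above can be expressed as a small guard that the load driver checks before and during every stage. This is a minimal sketch; the 01:00 to 05:00 window and the error‑rate and p99 latency thresholds are hypothetical values, not figures published by Didi.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class SafetyPolicy:
    # Hypothetical values; the article does not publish Didi's thresholds.
    window_start: time = time(1, 0)   # low-traffic window opens at 01:00
    window_end: time = time(5, 0)     # and closes at 05:00 (no midnight wrap)
    max_error_rate: float = 0.01      # abort above 1% failed requests
    max_p99_ms: float = 500.0         # abort above 500 ms p99 latency


def in_window(now: time, policy: SafetyPolicy) -> bool:
    """True if `now` falls inside the permitted test window."""
    return policy.window_start <= now < policy.window_end


def should_abort(now: time, error_rate: float, p99_ms: float,
                 policy: SafetyPolicy = SafetyPolicy()) -> bool:
    """Stop immediately if we leave the window or any health metric breaches."""
    if not in_window(now, policy):
        return True
    return error_rate > policy.max_error_rate or p99_ms > policy.max_p99_ms


print(should_abort(time(2, 30), error_rate=0.002, p99_ms=180.0))  # False
print(should_abort(time(7, 0), error_rate=0.0, p99_ms=50.0))      # True
```

Leaving the window is treated exactly like a metric breach, so a test that overruns into morning traffic stops on its own.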

The core business chain spans multiple product lines (taxi, premium, car‑pooling, etc.) and covers the end‑to‑end process from the passenger entering a trip in the app, through driver dispatch, to ride completion, as well as order cancellation.
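A virtual passenger script can treat this chain as a small state machine, so that illegal sequences are caught on the tool side rather than mistaken for backend errors. The states and transitions below are a simplified, hypothetical model of the steps named above; Didi's real order states are not described in the article.

```python
from enum import Enum, auto


class OrderState(Enum):
    CREATED = auto()
    DISPATCHED = auto()
    PICKED_UP = auto()
    COMPLETED = auto()
    CANCELLED = auto()


# Simplified, hypothetical transition table for the chain described above.
TRANSITIONS = {
    OrderState.CREATED: {OrderState.DISPATCHED, OrderState.CANCELLED},
    OrderState.DISPATCHED: {OrderState.PICKED_UP, OrderState.CANCELLED},
    OrderState.PICKED_UP: {OrderState.COMPLETED},
    OrderState.COMPLETED: set(),
    OrderState.CANCELLED: set(),
}


def replay(steps):
    """Replay a virtual passenger's script from CREATED; reject illegal moves."""
    state = OrderState.CREATED
    for nxt in steps:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state.name} -> {nxt.name}")
        state = nxt
    return state


S = OrderState
print(replay([S.DISPATCHED, S.PICKED_UP, S.COMPLETED]))  # OrderState.COMPLETED
print(replay([S.DISPATCHED, S.CANCELLED]))               # OrderState.CANCELLED
```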

Data Isolation

Isolation is essential: mixing virtual and real orders can corrupt driver scores, passenger balances, BI reports, and capacity forecasts. The basic virtual‑order scheme tags test orders with a special identifier, but every module that consumes orders must then be changed to recognise that tag, which means heavy code changes across many modules.
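The following sketch shows why the tag‑based scheme is intrusive: every consumer of orders (BI reporting, driver scoring, and so on) has to remember to check the flag. The `is_virtual` field and both consumer functions are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    order_id: int
    driver_id: int
    fare: float
    is_virtual: bool = False  # hypothetical tag set on load-test orders


def daily_revenue(orders: List[Order]) -> float:
    # BI consumer: must explicitly skip virtual orders.
    return sum(o.fare for o in orders if not o.is_virtual)


def completed_trips(orders: List[Order], driver_id: int) -> int:
    # Driver-scoring consumer: must repeat the same check.
    return sum(1 for o in orders if o.driver_id == driver_id and not o.is_virtual)


orders = [Order(1, 7, 30.0), Order(2, 7, 25.0, is_virtual=True), Order(3, 8, 12.5)]
print(daily_revenue(orders))       # 42.5
print(completed_trips(orders, 7))  # 1
```

Any consumer that forgets the check silently counts test orders, which is the risk that layered virtualisation is meant to reduce.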

To reduce this intrusion, Didi introduced layered virtualisation: first, city‑level virtual passengers and drivers; then, virtual cities; and finally, a fully virtual nation whose coordinates are shifted into the Pacific Ocean, a region with no real users, so that test traffic is completely isolated from real traffic.

Traffic Marking Scheme

Two options were considered for marking test traffic: (1) each service uses a business ID or flag, or (2) extend the internal Trace system to carry a test‑traffic marker. Didi adopted option 2, decoupling marking from business logic and promoting broader Trace adoption.
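A minimal sketch of option 2, assuming the test marker travels alongside the trace ID in request headers: the entry point reads the marker, installs it in a per‑request context, and copies it onto outgoing calls, so business code never handles it directly. The header names, `TraceContext`, and the helper functions are hypothetical and do not describe Didi's actual Trace protocol.

```python
import contextvars
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Hypothetical header names; not Didi's actual Trace protocol.
HDR_TRACE_ID = "x-trace-id"
HDR_TEST_FLAG = "x-test-traffic"


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    is_test: bool  # test-traffic marker carried with the trace


_current = contextvars.ContextVar("trace_ctx", default=None)


def extract(headers: Dict[str, str]) -> TraceContext:
    """Build the context for an incoming call; start a new trace if none was sent."""
    return TraceContext(
        trace_id=headers.get(HDR_TRACE_ID) or uuid.uuid4().hex,
        is_test=headers.get(HDR_TEST_FLAG) == "1",
    )


def inject() -> Dict[str, str]:
    """Headers for an outgoing call, copied from the current context."""
    ctx = _current.get()
    if ctx is None:
        return {}
    return {HDR_TRACE_ID: ctx.trace_id, HDR_TEST_FLAG: "1" if ctx.is_test else "0"}


def is_test_traffic() -> bool:
    """Lets shared components (logging, BI export, ...) recognise test traffic."""
    ctx = _current.get()
    return ctx is not None and ctx.is_test


def serve(headers: Dict[str, str], handler: Callable[[], T]) -> T:
    """Run a request handler with the caller's trace context installed."""
    token = _current.set(extract(headers))
    try:
        return handler()
    finally:
        _current.reset(token)


def dispatch_handler() -> Dict[str, str]:
    # Business code only forwards the context; it never inspects the marker.
    return inject()


incoming = {HDR_TRACE_ID: "abc", HDR_TEST_FLAG: "1"}
print(serve(incoming, dispatch_handler))  # {'x-trace-id': 'abc', 'x-test-traffic': '1'}
print(serve(incoming, is_test_traffic))   # True
print(is_test_traffic())                  # False: context is reset after the request
```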

Tool‑Side Solution

The testing tool consists of distributed virtual driver and passenger clients that simulate large numbers of users. These clients talk to the backend over HTTP, long‑lived TCP connections, and Thrift; each virtual driver keeps a persistent TCP connection open to receive dispatch messages.
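The persistent dispatch channel can be sketched with Python's `asyncio` streams: a virtual driver opens one TCP connection, identifies itself, and then receives pushed dispatch messages until the server closes the connection. The newline‑delimited JSON wire format and the toy server are assumptions for illustration; the HTTP and Thrift paths are not shown.

```python
import asyncio
import json

# Assumed wire format: one JSON object per line on a long-lived TCP connection.


def _encode(msg: dict) -> bytes:
    return (json.dumps(msg) + "\n").encode()


async def toy_dispatch_server(reader, writer, orders):
    """Stand-in for the long-connection server: push orders to the connected driver."""
    hello = json.loads(await reader.readline())
    for order_id in orders:
        writer.write(_encode({"type": "dispatch", "driver": hello["driver_id"], "order": order_id}))
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def virtual_driver(host: str, port: int, driver_id: str) -> list:
    """Hold one TCP connection open and collect dispatches until the server closes it."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(_encode({"driver_id": driver_id}))
    await writer.drain()
    received = []
    while line := await reader.readline():  # b"" signals EOF
        received.append(json.loads(line)["order"])
    writer.close()
    await writer.wait_closed()
    return received


async def demo() -> list:
    server = await asyncio.start_server(
        lambda r, w: toy_dispatch_server(r, w, [101, 102, 103]), "127.0.0.1", 0
    )
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await virtual_driver("127.0.0.1", port, "vd-1")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [101, 102, 103]
```

Many such drivers can share one event loop, which is one way a single worker process could host a large number of virtual clients.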

Each virtual client fetches its user profile, route, and initial position from a central data center, so that when the tool scales out, no two clients log in as the same virtual user.
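This allocation rule can be sketched as a central lease table: each virtual user is handed to at most one client at a time, so scaling out the tool never produces duplicate logins. `DataCenter`, `VirtualUser`, and the in‑process lock are hypothetical stand‑ins; a real deployment would keep the lease table in a shared service.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class VirtualUser:
    user_id: str
    start: Coord
    route: Tuple[Coord, ...]


class DataCenter:
    """Hypothetical allocator: each virtual user is leased to at most one client."""

    def __init__(self, users: List[VirtualUser]):
        self._free = list(users)
        self._leases: Dict[str, str] = {}  # user_id -> client_id
        self._lock = threading.Lock()

    def lease(self, client_id: str) -> Optional[VirtualUser]:
        with self._lock:
            if not self._free:
                return None  # pool exhausted: add users rather than reuse one
            user = self._free.pop()
            self._leases[user.user_id] = client_id
            return user

    def release(self, user: VirtualUser) -> None:
        with self._lock:
            if self._leases.pop(user.user_id, None) is not None:
                self._free.append(user)


pool = DataCenter([VirtualUser(f"vp-{i}", (39.91, 116.42), ()) for i in range(3)])
a, b = pool.lease("client-A"), pool.lease("client-B")
print(a.user_id != b.user_id)  # True
```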

Dynamic Business Model

The virtual clients use a configurable business model that can adjust scenario weights (e.g., local vs. inter‑city rides) without code changes, enabling rapid testing of different traffic mixes.
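A configurable business model can be as simple as a table of scenario weights read at startup, so changing the traffic mix means editing configuration rather than code. The scenario names, weights, and JSON layout below are hypothetical.

```python
import json
import random
from collections import Counter

# Hypothetical business model; normally loaded from a config file or service.
MODEL_JSON = '{"scenarios": {"local_ride": 0.7, "intercity_ride": 0.1, "carpool": 0.2}}'


def load_model(text: str) -> dict:
    weights = json.loads(text)["scenarios"]
    if not weights or any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("scenario weights must be non-negative and sum to > 0")
    return weights


def next_scenarios(weights: dict, n: int, rng: random.Random) -> list:
    """Draw n scenarios for the virtual clients according to the configured mix."""
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


mix = Counter(next_scenarios(load_model(MODEL_JSON), 10_000, random.Random(42)))
print(mix.most_common(1)[0][0])  # local_ride
```

Setting a weight to 0 removes that scenario from the mix without touching client code.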

During staged deployment, Didi found that placing virtual drivers at random leads to low match rates. Concentrating the initial drivers and passengers in a hotspot (e.g., Beijing’s Dongdan area) instead makes the number of successful orders grow in proportion to the applied load.
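The effect of initial placement can be illustrated with a small simulation: place the same number of virtual drivers and passengers either uniformly over a rough Beijing bounding box or in a tight cluster around Dongdan, then count how many passengers have a driver within 1 km. The coordinates, bounding box, spread, and 1 km radius are all assumptions, and "a driver within 1 km" is only a crude proxy for a successful match.

```python
import math
import random

DONGDAN = (39.914, 116.418)                        # approximate, for illustration
BEIJING_BOX = ((39.70, 40.10), (116.10, 116.70))   # rough urban extent (assumed)


def km_between(a, b) -> float:
    """Equirectangular approximation of ground distance; fine at city scale."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    x = (lng2 - lng1) * math.cos((lat1 + lat2) / 2)
    return 6371.0 * math.hypot(x, lat2 - lat1)


def place_uniform(n, rng):
    (lat0, lat1), (lng0, lng1) = BEIJING_BOX
    return [(rng.uniform(lat0, lat1), rng.uniform(lng0, lng1)) for _ in range(n)]


def place_hotspot(n, rng, center=DONGDAN, sigma_deg=0.005):
    # Gaussian spread of 0.005 degrees, roughly 0.4 to 0.6 km at Beijing's latitude.
    return [(rng.gauss(center[0], sigma_deg), rng.gauss(center[1], sigma_deg)) for _ in range(n)]


def reachable_share(passengers, drivers, radius_km=1.0) -> float:
    """Share of passengers with at least one driver within radius_km."""
    hits = sum(any(km_between(p, d) <= radius_km for d in drivers) for p in passengers)
    return hits / len(passengers)


rng = random.Random(7)
uniform = reachable_share(place_uniform(200, rng), place_uniform(200, rng))
hotspot = reachable_share(place_hotspot(200, rng), place_hotspot(200, rng))
print(f"uniform {uniform:.2f} vs hotspot {hotspot:.2f}")
```

With 200 drivers spread over roughly 2,000 km², most passengers have no driver nearby, whereas the clustered layout puts almost every passenger within reach.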

Load‑Test Record

In the first half of 2016, before the Didi‑Uber merger, intense business growth led to frequent incidents. The full‑chain load test was executed during low‑traffic windows (early morning), gradually increasing pressure while monitoring system health.
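The gradual ramp can be sketched as a loop over increasing load levels that stops at the first unhealthy stage and reports the last level the system sustained. `run_stage` is a hypothetical hook that would drive the virtual clients for one hold period and consult monitoring; the QPS levels are illustrative.

```python
from typing import Callable, List, Optional, Tuple


def staged_ramp(levels: List[int],
                run_stage: Callable[[int], bool]) -> Tuple[Optional[int], List[int]]:
    """Apply load levels in increasing order and stop at the first unhealthy stage.

    run_stage(qps) is a hypothetical hook: drive the virtual clients at `qps`
    for one hold period and return True if monitoring stayed healthy.
    Returns (highest level sustained, or None; levels actually attempted).
    """
    last_ok: Optional[int] = None
    attempted: List[int] = []
    for qps in sorted(levels):
        attempted.append(qps)
        if not run_stage(qps):
            break  # abort: stop adding pressure and let the system recover
        last_ok = qps
    return last_ok, attempted


# Fake health check: pretend the system degrades somewhere above 3,000 QPS.
print(staged_ramp([500, 1000, 2000, 4000, 8000], lambda qps: qps < 3000))
# (2000, [500, 1000, 2000, 4000])
```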

The tests uncovered issues such as API latency spikes, misconfigured parameters on the long‑connection servers, Codis timeouts in the dispatch service, and excessive logging that caused dispatch timeouts.

Additional benefits included consolidating the shared component libraries used by each programming language, expanding Trace coverage, and building an isolated, production‑like environment that can later be used to verify correctness.

Looking forward, Didi plans to leverage full‑chain load testing for fault injection, gray‑release validation, and capacity forecasting across more services.
