Separating Test Traffic Trigger and Result Verification for Didi Ride‑Hailing Backend
By separating test‑traffic triggering from result verification, Didi’s ride‑hailing backend combines live‑traffic inspection with replayed offline tests, applying the same bucketed validation rules to both. The approach delivers near‑zero‑cost, full‑coverage QA, catching over 100 bugs a year and markedly improving service reliability for drivers and passengers.
Didi Ride‑Hailing aims to provide a high‑quality travel experience, which puts strong requirements on service availability. Faster bug detection reduces the impact on drivers and passengers. The ride‑hailing backend processes millions of transaction scenarios and involves hundreds of downstream services, presenting a significant quality‑assurance challenge.
Background and Challenges
The backend must ensure both core business availability and correct operation of massive user scenarios. Online, the sheer volume of scenarios makes fine‑grained monitoring difficult. Offline, arbitrary changes can generate thousands of new test cases, and missing a scenario increases the risk of bugs reaching production.
Traditional quality‑assurance methods such as automation, traffic replay, and business monitoring struggle to validate highly complex business logic efficiently.
Automation: effective for high‑impact core scenarios, but costly for fine‑grained coverage of the numerous upstream/downstream interactions.
Traffic replay: low‑cost and high‑coverage, but it relies on offline environments and generates noisy diffs, making root‑cause identification hard.
Business monitoring: fast fault perception for core metrics, but lacking flexibility for small‑scale, irregular scenarios.
To address these gaps, the backend adopts a “test‑traffic trigger & result verification separation” approach, delegating traffic generation to the service owners while the central team focuses on result validation.
Test‑Traffic Trigger & Result Verification Separation
This concept splits the workflow into two steps: (1) triggering test traffic and (2) verifying the results. It enables independent optimization of coverage and validation effort.
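The separation can be sketched as two pluggable pieces: anything that produces traffic records (live capture or offline replay) feeds the same verification step. This is a minimal illustration, not Didi's implementation; the `TrafficRecord` fields and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative traffic record, not Didi's actual schema.
@dataclass
class TrafficRecord:
    api: str
    request: dict
    response: dict

# Step 1: a trigger produces records (live traffic or replay).
Trigger = Callable[[], List[TrafficRecord]]
# Step 2: a verifier consumes records and reports violations.
Verifier = Callable[[TrafficRecord], List[str]]

def run_pipeline(trigger: Trigger, verifiers: List[Verifier]) -> List[str]:
    """Run any traffic source through the shared verification step."""
    violations = []
    for record in trigger():
        for verify in verifiers:
            violations.extend(verify(record))
    return violations

# Either traffic source plugs into the same pipeline:
live_trigger = lambda: [TrafficRecord("create_order", {"city": 1}, {"code": 0})]
check_code = lambda r: ([] if r.response.get("code") == 0
                        else [f"{r.api}: non-zero code"])
print(run_pipeline(live_trigger, [check_code]))  # → []
```

Because triggering and verification meet only at the record format, coverage (more traffic sources) and validation effort (more verifiers) can be improved independently.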
Online Traffic Inspection – Near‑Zero‑Cost Full‑Result Verification
Every real user request is treated as a test case; result verification on live traffic provides comprehensive health checks with almost no additional cost.
Coverage includes request/response parameters, downstream calls, storage, and configuration reads, improving fault detection capability.
The inspection runs out of band and is invisible to production traffic.
The architecture consists of traffic recording, parsing, bucketing, validation, and anomaly alerting. Recorded traffic captures only the elements of interest (API inputs/outputs, downstream calls, config reads/writes, DB reads/writes). The fastdev traffic‑replay tool provides the raw byte stream, which is then pre‑processed, filtered, and transformed into a standardized format for rule‑engine consumption.
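The pre‑processing step that turns a raw recorded entry into the standardized format might look like the following sketch. The field names and JSON encoding are assumptions for illustration; the actual fastdev byte format is not described here.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class StandardRecord:
    """Standardized shape consumed by the rule engine (illustrative)."""
    api: str
    request: dict
    response: dict
    downstream_calls: list
    config_reads: dict
    db_ops: list

def parse_raw(raw: bytes) -> Optional[StandardRecord]:
    """Decode, filter, and transform one raw recorded entry."""
    try:
        entry = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # drop malformed captures during filtering
    return StandardRecord(
        api=entry.get("api", ""),
        request=entry.get("request", {}),
        response=entry.get("response", {}),
        downstream_calls=entry.get("downstream", []),
        config_reads=entry.get("config", {}),
        db_ops=entry.get("db", []),
    )

raw = b'{"api": "estimate_price", "request": {"city": 1}, "response": {"code": 0}}'
record = parse_raw(raw)
print(record.api)  # → estimate_price
```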
Bucket rules, defined by business logic, assign traffic to scenarios; validation rules check that the traffic conforms to expected behavior. Violations trigger alerts with detailed information for rapid triage.
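A minimal rule‑engine sketch of bucketing plus validation, under assumed names (the scenario, field names, and rule shapes are illustrative, not Didi's actual rules):

```python
from typing import Callable, List, Optional

Record = dict  # standardized traffic record (illustrative shape)

class Bucket:
    """Pairs a business-defined predicate with the validation rules
    that apply to traffic matching that scenario."""
    def __init__(self, name: str, match: Callable[[Record], bool],
                 rules: List[Callable[[Record], Optional[str]]]):
        self.name, self.match, self.rules = name, match, rules

def inspect(record: Record, buckets: List[Bucket]) -> List[str]:
    """Route one record to its buckets and collect rule violations;
    each violation would feed the anomaly-alerting step."""
    alerts = []
    for bucket in buckets:
        if not bucket.match(record):
            continue
        for rule in bucket.rules:
            msg = rule(record)
            if msg:
                alerts.append(f"[{bucket.name}] {msg}")
    return alerts

# Hypothetical scenario: carpool orders must carry a seat count.
carpool = Bucket(
    "carpool_order",
    match=lambda r: r.get("request", {}).get("order_type") == "carpool",
    rules=[lambda r: None if r.get("request", {}).get("seats")
           else "missing seat count"],
)
print(inspect({"request": {"order_type": "carpool"}}, [carpool]))
# → ['[carpool_order] missing seat count']
```

Keeping predicates and rules as plain data makes it cheap to add a new scenario without touching the engine, which is what lets coverage grow with the business.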
Offline Test Acceptance – Traffic Trigger Deduplication & Unified Result Verification
Tool‑driven traffic generation reduces manual effort.
Deduplication ensures that different test sources produce a single unified verification result.
The offline practice mirrors the online workflow: recorded traffic is replayed in a controlled environment, bucketed, and validated against the same rule set, providing a closed‑loop test for service changes.
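One way to deduplicate traffic from different test sources is to key each record on the fields that define a distinct test case, so the same request replayed by several tools is verified once. A sketch under assumed field names:

```python
import hashlib
import json

def traffic_key(record: dict) -> str:
    """Dedup key: records exercising the same API with the same
    request count as one test case, whichever tool sent them."""
    payload = json.dumps(
        {"api": record.get("api"), "request": record.get("request")},
        sort_keys=True,  # stable ordering so equal requests hash equally
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def dedupe(records: list) -> list:
    """Keep the first record per key; the survivors go through the
    same bucketed validation used online."""
    seen, unique = set(), []
    for r in records:
        k = traffic_key(r)
        if k not in seen:
            seen.add(k)
            unique.append(r)
    return unique

# Two tools replaying the same request yield one verification target.
replayed = [
    {"api": "cancel_order", "request": {"oid": 7}, "source": "tool_a"},
    {"api": "cancel_order", "request": {"oid": 7}, "source": "tool_b"},
]
print(len(dedupe(replayed)))  # → 1
```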
Current Benefits
In use across most backend services since early 2020, the approach catches over 100 bugs annually, significantly improving production quality.
The method is also effective for gray‑release validation, providing timely fault detection during feature rollouts.
Future Plans
The team will continue to increase online inspection coverage, refine rule definitions, and apply the same verification logic to new services, further reducing manual testing effort and accelerating bug detection.
Conclusion
The separation of test‑traffic triggering and result verification enables scalable, low‑cost quality assurance for a complex ride‑hailing backend, balancing automated coverage with precise fault detection.
Didi Tech
Official Didi technology account