
Didi Ride‑Sharing Dispatch Engine: Architecture, Challenges, and Stability Measures for Carpool Day

During Didi’s 2019 Carpool Day promotion, a surge of up to 6.6 times normal matching traffic forced a redesign of its dispatch engine, introducing near‑time assignment, optimized filtering logic, configurable timeouts and retries, extensive stress testing, monitoring, and rapid on‑call procedures that cut downstream pressure by more than half.

Didi Tech

Background – On November 29, 2019, Didi celebrated the 4th anniversary of its carpool service by launching a new product and announcing a "One‑Cent Carpool Day" promotion in 26 Chinese cities. The promotion caused a massive surge in order‑matching traffic, prompting a deep dive into the dispatch engine.

Dispatch Architecture Overview

The dispatch engine matches drivers with orders. It supports two main modes: single‑ride dispatch (fast‑car, premium, taxi, etc.) and car‑pool dispatch, where multiple orders are bundled into a single “package” and matched to a driver.

In the matching funnel, the number of driver‑order pairs grows quadratically (e.g., 1,000 orders × 4,000 drivers → 4 million pairs), directly tying system pressure to the product of order and driver counts.
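This quadratic growth can be made concrete with a minimal sketch (illustrative only, not Didi's actual engine):

```python
def candidate_pairs(orders: int, drivers: int) -> int:
    """Number of driver-order pairs the matching funnel must score."""
    return orders * drivers

# 1,000 orders x 4,000 drivers -> 4,000,000 candidate pairs to evaluate
assert candidate_pairs(1_000, 4_000) == 4_000_000
```

Because pressure scales with the product rather than the sum, shrinking either side of the pool pays off multiplicatively.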

Challenges Introduced by Carpool Day

Carpool orders have a much longer lifecycle (up to one day) than single‑ride orders, which typically expire within a few minutes, greatly widening the time window during which they must be matched.

The "carpool‑native" attributes require additional route‑planning and passenger‑to‑passenger compatibility calculations.

The service‑oriented architecture splits filtering across multiple layers, so driver‑order pairs that could have been rejected early may still be passed to downstream services before being filtered out, wasting downstream resources.

During the promotion, driver‑order matching peaked at 2.24× the normal peak, while carpool‑specific matching rose 6.6×.

Stability Assurance Measures

Architecture Optimization – Revised the "near‑time assignment" for the carpool reservation mode so that orders only enter the matching pool shortly before departure, reducing pool size.
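A hypothetical sketch of near‑time assignment (the window length and field names are assumptions, not Didi's actual values): a reservation order joins the matching pool only within a configurable window before its scheduled departure.

```python
from dataclasses import dataclass

MATCHING_WINDOW_SEC = 30 * 60  # assumed 30-minute window, illustrative only

@dataclass
class ReservationOrder:
    order_id: str
    departure_ts: int  # scheduled departure time, Unix seconds

def eligible_for_matching(order: ReservationOrder, now_ts: int) -> bool:
    """An order enters the pool only shortly before its departure."""
    return 0 <= order.departure_ts - now_ts <= MATCHING_WINDOW_SEC

orders = [
    ReservationOrder("a", departure_ts=1_000),    # departs within the window
    ReservationOrder("b", departure_ts=100_000),  # departs much later
]
now = 0
pool = [o for o in orders if eligible_for_matching(o, now)]
# only order "a" enters the matching pool; "b" waits outside it
```

Keeping far-future reservations out of the pool directly shrinks the order side of the quadratic order × driver product.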

Filtering Logic Optimization – Moved high‑cost carpool filtering rules into the earlier generic filtering stage, trading some architectural cleanliness for performance.

These optimizations reduced downstream access pressure by more than 50% under the same call volume.
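The filtering idea can be sketched as a staged chain (assumed structure and thresholds, not Didi's code): cheap generic filters run first, so the expensive carpool checks, and any downstream calls they trigger, see far fewer driver‑order pairs.

```python
def distance_ok(pair):
    """Cheap generic filter: reject far-away drivers first."""
    return pair["distance_km"] <= 5.0

def carpool_compatible(pair):
    """Stands in for a costly route-planning / compatibility check."""
    return pair["detour_ratio"] <= 1.3

FILTERS = [distance_ok, carpool_compatible]  # cheap before expensive

def survivors(pairs):
    for f in FILTERS:
        pairs = [p for p in pairs if f(p)]
    return pairs

pairs = [
    {"distance_km": 2.0, "detour_ratio": 1.1},
    {"distance_km": 9.0, "detour_ratio": 1.0},  # dropped by the cheap filter
    {"distance_km": 3.0, "detour_ratio": 2.0},  # dropped by the carpool filter
]
# only the first pair survives both stages
```

Ordering filters by cost means the expensive stage never even sees pairs the cheap stage can reject, which is where the downstream savings come from.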

Timeout‑Retry Configuration – Made the main dispatch loop, downstream timeout, and retry count configurable with second‑level rollout, and set retries to 1 during the event to avoid cascading failures.
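A minimal sketch of such a wrapper, assuming a hot‑reloadable config dict (the key names and values here are illustrative, not Didi's): the downstream timeout and retry count come from config, and retries are capped at 1 during the event.

```python
CONFIG = {"downstream_timeout_sec": 0.2, "max_retries": 1}  # assumed keys

def call_with_retry(rpc, *args):
    """Call a downstream RPC with a configured timeout and bounded retries."""
    last_err = None
    for _attempt in range(1 + CONFIG["max_retries"]):
        try:
            return rpc(*args, timeout=CONFIG["downstream_timeout_sec"])
        except TimeoutError as err:
            last_err = err  # at most one retry under the event config
    raise last_err

calls = []
def flaky_rpc(timeout):
    """Simulated downstream: times out once, then succeeds."""
    calls.append(timeout)
    if len(calls) == 1:
        raise TimeoutError("slow downstream")
    return "ok"

result = call_with_retry(flaky_rpc)  # first attempt fails, single retry succeeds
```

Bounding retries this tightly sacrifices a little success rate to keep a slow dependency from multiplying its own load during a traffic spike.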

Pre‑plan Construction – Defined five categories of contingency plans (early degradation, entry‑point, core‑link, business‑metric anomalies, crash recovery) and rehearsed them to ensure rapid response.

Full‑Link Stress Testing – Conducted 8 rehearsal runs and 9 official stress tests, uncovering 45 critical issues and refining capacity estimates.

Monitoring & Alerting – Established comprehensive alerts (machine health, service health, dependency health, client latency) with escalation policies, ensuring no error went unnoticed on event day.

Centralized On‑Call & Rapid Decision – Implemented a flat on‑call organization, real‑time incident reporting every 10 minutes, and predefined decision‑making procedures for various failure scenarios.

Outlook – Carpool matching poses greater challenges in complexity and path planning than single‑ride services. The experience from Carpool Day will guide continuous improvements to provide a more stable and reliable dispatch system.

Tags: monitoring, System Architecture, scalability, Capacity Planning, carpool, dispatch engine
Written by Didi Tech, the official Didi technology account.