How Uber Scales Its Real‑Time Dispatch System: Backend Architecture Insights
This article examines Uber's rapidly growing real‑time dispatch platform, detailing its geo‑spatial indexing, microservice architecture, fault‑tolerant design, and scaling techniques that enable millions of writes per second and high availability across thousands of nodes.
Overview
Uber connects passengers and drivers through a real‑time market platform.
Challenge: build a dynamic supply‑demand system that works for both drivers and riders.
The dispatch system matches riders with drivers using mobile devices.
New Year’s Eve is Uber’s busiest day.
Rapid technological progress has turned once‑future tech (phone, network, GPS) into everyday tools.
Architecture Overview
Drivers and riders run native mobile apps that drive the system.
The backend processes information between mobile devices.
Clients communicate with the backend over mobile data networks and the best‑effort public Internet.
The dispatch system is almost entirely written in Node.js.
The team initially considered moving to io.js, but io.js and Node.js have since merged.
Node.js enables interesting distributed‑system work in JavaScript.
Developer enthusiasm accelerates task completion.
Old Dispatch System
Designed around the assumption of one rider per vehicle, which made it hard to extend to products such as UberPool, freight, and grocery delivery.
City‑level sharding worked early on but became hard to manage as more cities joined.
Because the system was built quickly, a fault in one component could propagate to others.
New Dispatch System
Introduces separate supply and demand services to support multiple product types.
Supply service tracks quantity and state of all resources (vehicles, seats, wheelchair access, etc.).
Demand service tracks requests, orders, and special requirements (shared rides, cargo).
The DISCO service performs matching of supply and demand, predicting future availability and using geo‑spatial indexes for both supply and demand.
Dispatch Flow
Vehicles send location updates to the "geo by supply" index.
DISCO queries this index to find nearby candidate drivers.
Candidates are sent to the routing/ETA service, which computes road‑based distances.
The candidates, ranked by ETA, are returned to the supply service, which offers the trip to a driver.
Special handling is required for airport queues and multi‑stop trips.
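The flow above can be sketched as a small pipeline: filter supply near the rider, estimate an ETA for each candidate, and rank by ETA. This is only an illustration under stated assumptions. The names `Driver`, `nearby_candidates`, `estimate_eta_s`, and `rank_by_eta` are hypothetical, and the ETA here is straight‑line distance at a constant speed, standing in for the road‑based routing service the article describes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Driver:
    driver_id: str
    lat: float
    lng: float


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby_candidates(drivers, lat, lng, radius_km):
    """Stand-in for the geo index query: keep drivers within radius_km."""
    return [d for d in drivers if haversine_km(d.lat, d.lng, lat, lng) <= radius_km]


def estimate_eta_s(driver, lat, lng, speed_kmh=30.0):
    """Stand-in for the routing/ETA service (assumes a constant speed)."""
    return haversine_km(driver.lat, driver.lng, lat, lng) / speed_kmh * 3600


def rank_by_eta(drivers, lat, lng, radius_km=2.0):
    """Return nearby candidates sorted by estimated time to reach the rider."""
    candidates = nearby_candidates(drivers, lat, lng, radius_km)
    return sorted(candidates, key=lambda d: estimate_eta_s(d, lat, lng))
```

The first entry of the ranked list is the driver the supply service would offer the trip to in this simplified model.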
Geo‑Spatial Index
Designed for high scalability: millions of writes per second, with read throughput several times higher.
Drivers send location updates every 4 seconds.
Old index tracked only dispatchable supply; new index tracks all supply states and planned routes.
Uses Google’s S2 geometry library to partition the earth into hierarchical cells; Uber uses level 12 cells, which range from 3.31 km² to 6.38 km² depending on where on earth they lie.
Cell IDs serve as sharding keys; to find nearby supply, DISCO computes the cells that cover an area around the rider’s location and queries the shards that own those cells.
Scalability is achieved by adding nodes and replicas for write and read load.
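The idea of using cell IDs as sharding keys can be illustrated with a toy index. This sketch does not use S2: it substitutes a fixed latitude/longitude grid (`CELL_DEG` is an arbitrary, hypothetical cell size) for S2's hierarchical cells, and the `GeoIndex` class, its methods, and the hash‑modulo shard choice are all assumptions made for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.02  # hypothetical grid size in degrees; a stand-in for S2 level 12 cells


def cell_id(lat, lng):
    """Map a location to the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


def covering_cells(lat, lng, ring=1):
    """Cells covering the area around a point: the home cell plus `ring` neighbours."""
    row, col = cell_id(lat, lng)
    return [(row + dr, col + dc)
            for dr in range(-ring, ring + 1)
            for dc in range(-ring, ring + 1)]


class GeoIndex:
    def __init__(self, num_shards):
        # Each shard maps cell -> {driver_id: (lat, lng)}.
        self.shards = [defaultdict(dict) for _ in range(num_shards)]
        self.driver_cell = {}

    def _shard_for(self, cell):
        # The cell ID is the sharding key.
        return self.shards[hash(cell) % len(self.shards)]

    def update(self, driver_id, lat, lng):
        """Apply a location update, moving the driver between cells if needed."""
        new_cell = cell_id(lat, lng)
        old_cell = self.driver_cell.get(driver_id)
        if old_cell is not None and old_cell != new_cell:
            self._shard_for(old_cell)[old_cell].pop(driver_id, None)
        self._shard_for(new_cell)[new_cell][driver_id] = (lat, lng)
        self.driver_cell[driver_id] = new_cell

    def nearby(self, lat, lng, ring=1):
        """Collect supply from every cell covering the area around the point."""
        found = {}
        for cell in covering_cells(lat, lng, ring):
            found.update(self._shard_for(cell).get(cell, {}))
        return found
```

Because each write touches only the shard that owns the driver's cell, write load spreads as shards are added; reads fan out to the few shards that own the covering cells.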
Routing Goals
Minimize extra driving distance.
Reduce driver wait time.
Minimize total ETA.
The old system considered only currently available supply. The new approach also predicts future availability, so it may prefer a driver who is about to finish a trip nearby over an idle driver who is far away.
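A minimal way to express this preference is to rank candidates by total time until pickup, counting the predicted remainder of any trip in progress. The `Candidate` fields and the additive scoring below are assumptions for illustration, not Uber's actual matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    driver_id: str
    eta_to_pickup_s: float    # driving time from where the driver becomes free to the pickup
    busy_for_s: float = 0.0   # predicted time until the current trip ends; 0 for an idle driver


def time_to_pickup_s(candidate):
    """Total time until this driver could reach the rider."""
    return candidate.busy_for_s + candidate.eta_to_pickup_s


def best_match(candidates):
    """Pick the candidate who can reach the rider soonest, busy or idle."""
    return min(candidates, key=time_to_pickup_s)
```

For example, a driver who finishes a trip in 2 minutes and is then 3 minutes from the pickup (300 s total) beats an idle driver 15 minutes away (900 s).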
Scaling Dispatch
Built with Node.js; requires stateful services, so traditional stateless scaling does not apply.
Node’s single‑process model is extended across multiple CPUs and machines using Ringpop, a gossip‑based consistent‑hash ring.
Ringpop provides AP semantics (availability over consistency) and embeds a scalable, fault‑tolerant sharding layer.
Ringpop integrates with Uber’s RPC framework TChannel, which offers high‑performance request/response, pipelining, and tracing.
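The consistent‑hash ring at the heart of this design can be sketched in a few lines. This is not Ringpop's API, and it leaves out the gossip membership protocol entirely; the `HashRing` class, its method names, and the use of SHA‑1 with 100 virtual points per node are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual points per node, to even out load
        self._points = []          # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key):
        """Return the node that owns `key`: the first point clockwise from its hash."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[i % len(self._points)][1]
```

The property that matters for a stateful service is that removing a node only reassigns the keys that node owned; every other key keeps its owner, so most in‑memory state stays where it is.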
Dispatch Availability
High availability is critical; every request must be retryable, which in turn requires operations to be idempotent.
Services are designed to be kill‑able without breaking the system; small, isolated components limit blast radius.
Ringpop and TChannel enable service discovery, routing, and fault isolation.
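Retryable requests are only safe when repeating them has no additional effect. One common way to get that property, shown here as a sketch rather than as Uber's implementation, is to have the client attach a request ID and have the service remember the result for each ID. `TripService`, `create_trip`, and `call_with_retry` are hypothetical names.

```python
class TripService:
    def __init__(self):
        self._results = {}      # request_id -> result already returned
        self.trips_created = 0

    def create_trip(self, request_id, rider_id):
        """Create a trip once per request_id; repeats return the stored result."""
        if request_id in self._results:
            return self._results[request_id]
        self.trips_created += 1
        result = {"trip_id": f"trip-{self.trips_created}", "rider_id": rider_id}
        self._results[request_id] = result
        return result


def call_with_retry(fn, attempts=3):
    """Call fn, retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as err:
            last_error = err
    raise last_error
```

If a response is lost after the service has already done the work, the retry carries the same request ID and gets the original trip back instead of creating a second one.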
Whole‑Data‑Center Failure
Uber maintains a backup data center for rapid failover.
Driver phones act as a source of trip state when primary data is unavailable.
Periodic encrypted state snapshots are sent to driver devices; upon a data‑center outage, a driver’s next location update allows the dispatch system to reconstruct missing state seamlessly.
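The recovery path can be sketched as a round trip: the primary packs trip state into an opaque blob for the phone, and a data center with no record of the trip rebuilds it from that blob. Several things here are assumptions. The article says the snapshot is encrypted; Python's standard library has no cipher, so this sketch only signs the blob with HMAC to detect tampering, and a real system would encrypt it as well. `SECRET`, `make_digest`, `restore_state`, and `DataCenter` are hypothetical names, and the blob is assumed to arrive together with the location update.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"hypothetical-shared-key"  # assumed to be shared by both data centers


def make_digest(trip_state: dict) -> str:
    """Pack trip state into a signed, opaque blob to store on the driver's phone."""
    payload = json.dumps(trip_state, sort_keys=True).encode("utf-8")
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.b64encode(tag + payload).decode("ascii")


def restore_state(digest: str) -> dict:
    """Verify the blob and recover the trip state it carries."""
    raw = base64.b64decode(digest)
    tag, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("digest failed verification")
    return json.loads(payload)


class DataCenter:
    def __init__(self):
        self.trips = {}  # driver_id -> trip state

    def on_location_update(self, driver_id, lat, lng, digest=None):
        """If this data center has no state for the driver, rebuild it from the phone's blob."""
        if driver_id not in self.trips and digest is not None:
            self.trips[driver_id] = restore_state(digest)
        return self.trips.get(driver_id)
```

The backup data center needs no replicated copy of the trip: the first location update after failover is enough to resume it.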
Original article: How Uber Scales Their Real‑Time Market Platform
Translation by: Feng Yahua
21CTO
