How Uber Scales Its Real-Time Ride‑Sharing Platform: Architecture Secrets
This article examines how Uber's real-time market platform kept pace with the company's rapid 38-fold growth, detailing its design, scaling techniques, and fault-tolerance mechanisms, including geographic indexing, microservices, distributed storage, and the DISCO scheduling system.
Statistics
Uber's geographic index aims for millions of writes per second and several times that rate for reads.
The dispatch system runs on thousands of nodes.
Platform
Node.js
Python
Java
Go
Native iOS and Android apps
Microservices
Re‑sharding system
Postgres
MySQL
Riak
Twitter Twemproxy
Google S2 geometry library
Ringpop – consistent‑hash ring
TChannel – RPC multiplexing and framing
Thrift
Overview
Uber connects passengers and drivers as a real‑time market.
Challenge: build a dynamic supply‑demand system that works instantly for both sides.
The dispatch system matches riders and drivers via mobile devices.
New Year’s Eve is Uber’s busiest day.
Rapid technological progress has turned once‑fictional tools like GPS into everyday utilities.
Architecture Overview
Drivers and riders using native mobile apps drive the system.
The backend mostly serves traffic coming from these mobile devices.
Clients connect to the dispatch system to match supply and demand.
The dispatch system is written almost entirely in Node.js.
* Uber originally considered moving to io.js before the projects merged.
* JavaScript can be used for interesting distributed‑system work.
* Developers enjoy working in JavaScript, which helps them ship quickly.
Old Dispatch System
Limitations of the old system began to restrict growth.
Most of the system needed rewriting.
It was designed for single‑passenger rides, assuming one passenger per car and only mobile users, which hindered expansion to UberPool, food, and package delivery.
City‑level sharding worked initially but became hard to manage as more cities joined.
Failures in one component could cascade to others.
New Dispatch System
To address the problems of city sharding and support new product types, Uber split the system into separate supply and demand services.
Supply service tracks quantity and state of all supplies (vehicles, seats, wheelchair access, etc.).
Demand service tracks all ride requests and their requirements.
DISCO service performs scheduling optimization, matching supply and demand, predicting future availability, and using geographic indexes for both supply and demand.
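To make the supply/demand split concrete, here is a minimal sketch of how a supply record with capabilities (free seats, wheelchair access) might be checked against a demand record's requirements. The class and field names (`Supply`, `Demand`, `can_serve`, `eligible_supply`) are hypothetical illustrations, not Uber's actual service schemas.

```python
from dataclasses import dataclass


@dataclass
class Supply:
    """One unit of supply as the supply service might track it (hypothetical fields)."""
    vehicle_id: str
    free_seats: int
    wheelchair_accessible: bool = False


@dataclass
class Demand:
    """One ride request as the demand service might track it (hypothetical fields)."""
    request_id: str
    seats_needed: int = 1
    needs_wheelchair: bool = False


def can_serve(supply: Supply, demand: Demand) -> bool:
    """Return True only if this supply unit satisfies every requirement of the request."""
    if supply.free_seats < demand.seats_needed:
        return False
    if demand.needs_wheelchair and not supply.wheelchair_accessible:
        return False
    return True


def eligible_supply(supplies: list[Supply], demand: Demand) -> list[str]:
    """Filter the supply pool down to the vehicles that could take this request."""
    return [s.vehicle_id for s in supplies if can_serve(s, demand)]


if __name__ == "__main__":
    pool = [
        Supply("car-1", free_seats=3),
        Supply("car-2", free_seats=1, wheelchair_accessible=True),
        Supply("car-3", free_seats=0),
    ]
    print(eligible_supply(pool, Demand("req-1", seats_needed=2)))        # ['car-1']
    print(eligible_supply(pool, Demand("req-2", needs_wheelchair=True)))  # ['car-2']
```

Keeping requirements on the demand side and capabilities on the supply side is what lets new product types (pooled seats, accessibility, deliveries) be added as new fields rather than new city-specific code paths.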
Scheduling Flow
Vehicles send location updates every few seconds to the "geo‑by‑supply" index.
DISCO queries this index to find nearby candidate vehicles.
Candidates are sent to the routing/ETA service, which computes road‑distance based estimates.
ETA results are returned to the supply service and then to drivers.
Special handling is required for airport queues and multi‑passenger scenarios.
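The scheduling flow above can be sketched as a three-step pipeline: query nearby vehicles, estimate an ETA for each, and sort. This is a hedged illustration only: the straight-line radius filter stands in for the geo-by-supply index, and `estimate_eta_min` is a hypothetical placeholder (straight-line distance times an assumed detour factor at an assumed average speed) for Uber's routing/ETA service, which works from real road distances.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Vehicle:
    vehicle_id: str
    lat: float
    lng: float


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_candidates(vehicles, lat, lng, radius_km):
    """Step 1 (stand-in for the geo index query): vehicles within a straight-line radius."""
    return [v for v in vehicles if haversine_km(v.lat, v.lng, lat, lng) <= radius_km]


def estimate_eta_min(v: Vehicle, lat: float, lng: float,
                     detour_factor: float = 1.4, speed_kmh: float = 30.0) -> float:
    """Step 2 (hypothetical stand-in for the routing/ETA service).

    Approximates road distance as straight-line distance times an assumed detour factor.
    """
    road_km = haversine_km(v.lat, v.lng, lat, lng) * detour_factor
    return road_km / speed_kmh * 60.0


def rank_by_eta(vehicles, lat, lng, radius_km: float = 3.0):
    """Steps 1 to 3: find candidates, estimate ETAs, and sort them ascending."""
    candidates = nearby_candidates(vehicles, lat, lng, radius_km)
    ranked = [(estimate_eta_min(v, lat, lng), v.vehicle_id) for v in candidates]
    ranked.sort()
    return ranked
```

Filtering cheaply by distance before calling the more expensive ETA step keeps the number of routing calls proportional to the handful of nearby vehicles rather than the whole fleet.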
Geographic Index
Designed for high scalability: millions of writes per second, with read throughput several times higher.
Uses Google’s S2 library to partition the earth into hierarchical cells with unique IDs.
Both supply and demand entities are indexed by these cell IDs, enabling fast proximity queries.
Scaling is achieved by adding more nodes and replicas for write and read load.
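The cell-ID indexing idea can be illustrated with a toy in-memory index. Note the substitution: this sketch uses a flat latitude/longitude grid rather than the S2 library's hierarchical cells, and `GridGeoIndex` and its 0.01-degree cell size are assumptions for illustration. The shape of the technique is the same, though: a location write moves an entity between cells, and a proximity read scans only the query cell and its neighbours.

```python
import math
from collections import defaultdict


class GridGeoIndex:
    """Toy geo index keyed by grid-cell ID (a simplified stand-in for S2 cell IDs)."""

    def __init__(self, cell_deg: float = 0.01):
        self.cell_deg = cell_deg          # cell edge length in degrees (assumption)
        self.cells = defaultdict(set)     # cell id -> ids of entities in that cell
        self.location = {}                # entity id -> cell id it currently occupies

    def cell_id(self, lat: float, lng: float) -> tuple[int, int]:
        """Map a coordinate to the integer grid cell that contains it."""
        return (math.floor(lat / self.cell_deg), math.floor(lng / self.cell_deg))

    def update(self, entity_id: str, lat: float, lng: float) -> None:
        """Write path: record a supply or demand entity's latest position."""
        new_cell = self.cell_id(lat, lng)
        old_cell = self.location.get(entity_id)
        if old_cell == new_cell:
            return  # still in the same cell, nothing to move
        if old_cell is not None:
            self.cells[old_cell].discard(entity_id)
            if not self.cells[old_cell]:
                del self.cells[old_cell]  # drop empty cells to keep the map small
        self.cells[new_cell].add(entity_id)
        self.location[entity_id] = new_cell

    def nearby(self, lat: float, lng: float, ring: int = 1) -> set[str]:
        """Read path: entities in the query cell and the surrounding ring of cells."""
        ci, cj = self.cell_id(lat, lng)
        found: set[str] = set()
        for di in range(-ring, ring + 1):
            for dj in range(-ring, ring + 1):
                found |= self.cells.get((ci + di, cj + dj), set())
        return found
```

Because each entity lives under a single cell ID, cells can also be spread across nodes and replicas, which is the property that lets write and read capacity grow by adding machines.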
Routing Goals
Minimize extra driving, reduce rider wait time, and lower overall ETA.
Prefer drivers already carrying passengers over idle drivers far away.
Predictive models improve decisions for shared rides, package delivery, and food transport.
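One way to express these routing goals is as a weighted cost minimized over candidate assignments. The sketch below is an assumption, not Uber's model: the weights are arbitrary, and for an idle driver the whole pickup drive is counted as extra driving, which is what allows a nearby driver already on a trip to beat a distant idle one.

```python
from dataclasses import dataclass

# Illustrative weights only (assumptions, not Uber's values).
WAIT_WEIGHT = 1.0
EXTRA_DRIVE_WEIGHT = 0.5


@dataclass
class Candidate:
    driver_id: str
    rider_wait_min: float   # minutes until the rider is picked up
    extra_drive_min: float  # driving this assignment adds (the full pickup drive for an idle driver)


def assignment_cost(c: Candidate) -> float:
    """Weighted sum of rider wait and extra driving; lower is better."""
    return WAIT_WEIGHT * c.rider_wait_min + EXTRA_DRIVE_WEIGHT * c.extra_drive_min


def choose(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the lowest cost."""
    return min(candidates, key=assignment_cost)


if __name__ == "__main__":
    pooled = Candidate("pooled", rider_wait_min=4, extra_drive_min=2)       # cost 5.0
    far_idle = Candidate("far-idle", rider_wait_min=8, extra_drive_min=8)   # cost 12.0
    print(choose([pooled, far_idle]).driver_id)  # pooled
```

A predictive model, as the text describes, would feed better estimates into fields like `rider_wait_min` rather than change the shape of the comparison.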
Scaling the Dispatch Service
The dispatch service is built with Node.js and is stateful, so the usual approach of scaling out identical stateless servers does not apply.
Node’s single‑process model is extended across multiple CPUs and machines using Ringpop, a gossip‑based consistent‑hash ring.
Ringpop provides AP semantics (availability over strict consistency) and integrates with Uber’s RPC layer, TChannel.
TChannel, inspired by Twitter’s Finagle, offers high‑performance, multiplexed RPC, outperforming HTTP by ~20×.
Persistent connections are used for gossip and data forwarding.
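Ringpop's core data structure is a consistent-hash ring: each key (for example, a rider or trip ID) maps to exactly one owning instance, and removing an instance moves only the keys it owned. The sketch below shows just that ring with virtual nodes; it omits the gossip-based membership protocol entirely, and the MD5 hash and 100 virtual nodes per instance are arbitrary choices here rather than Ringpop's own settings.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes.

    Membership is changed by explicit add/remove calls here; Ringpop maintains it via gossip.
    """

    def __init__(self, nodes=(), replicas: int = 100):
        self.replicas = replicas
        self._points: list[int] = []     # sorted hash positions on the ring
        self._owner: dict[int, str] = {}  # hash position -> node address
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as an unsigned 64-bit ring position (arbitrary choice).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        """Place `replicas` virtual points for this node on the ring."""
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._points, h)
            self._owner[h] = node

    def remove(self, node: str) -> None:
        """Take this node's virtual points off the ring."""
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self._points.remove(h)
            del self._owner[h]

    def lookup(self, key: str) -> str:
        """Return the node owning `key`: the first point clockwise from the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

An instance that receives a request would look up the key's owner and either handle the request itself or forward it to the owner over a persistent connection, which is how a single-threaded Node.js process can take part in a larger stateful cluster.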
Dispatch Availability and Fault Tolerance
All operations are designed to be retryable and idempotent.
Services are partitioned into small, isolated components to limit failure impact.
Backup data centers and driver‑phone‑based state snapshots enable rapid failover.
Cross‑region replication and request shadowing mitigate latency spikes.
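Retryable, idempotent operations can be sketched with a request-ID cache: if a response is lost and the client retries, the server returns the stored result instead of executing the operation a second time. `IdempotentHandler`, `call_with_retries`, and the simulated lost response are hypothetical; a production version would persist the cache and bound its size.

```python
import threading


class IdempotentHandler:
    """Caches results by request ID so a retried request returns the first result."""

    def __init__(self, operation):
        self._op = operation
        self._results = {}
        self._lock = threading.Lock()

    def handle(self, request_id: str, *args):
        # Holding the lock while running the operation keeps the sketch simple,
        # at the cost of serializing all requests.
        with self._lock:
            if request_id in self._results:
                return self._results[request_id]  # duplicate: do not re-execute
            result = self._op(*args)
            self._results[request_id] = result
            return result


def call_with_retries(fn, attempts: int = 3):
    """Call fn, retrying on ConnectionError; safe only because the server side is idempotent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_exc = exc
    raise last_exc


if __name__ == "__main__":
    assignments = []

    def assign_driver(rider, driver):
        assignments.append((rider, driver))
        return f"{driver}->{rider}"

    handler = IdempotentHandler(assign_driver)
    state = {"lost": False}

    def flaky_call():
        result = handler.handle("req-7", "rider-1", "driver-9")
        if not state["lost"]:  # simulate the first response being lost in transit
            state["lost"] = True
            raise ConnectionError("response lost")
        return result

    print(call_with_retries(flaky_call))  # driver-9->rider-1
    print(len(assignments))               # 1
```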
Translation by Feng Yahua; Proofreading by Wendy. Original article: How Uber Scales Their Real‑Time Market Platform
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
