Uber's Evolution from Monolith to Service‑Oriented Architecture: Design, Scaling, and Data Processing
The article details Uber's transition from a monolithic codebase to a service‑oriented architecture, describing its goals, functional and non‑functional requirements, DISCO dispatch optimization, micro‑service communication, data pipelines, storage choices, and supporting components such as Kafka, Hadoop, Spark, and security mechanisms.
Uber originally operated a monolithic architecture serving only San Francisco (UberBlack). Rapid growth in features and domain complexity caused tight coupling, making continuous integration and deployment cumbersome, which led the engineering team to refactor the platform into a service‑oriented architecture.
Goals included achieving 99.99% reliability, splitting the codebase into core (strictly reviewed) and optional parts, and defining a clear core architecture with plugins, reactive programming chains, and unified platform components.
The solution involved adopting new iOS architectural patterns (moving from MVC to VIPER and introducing Riblets) and redesigning the system to support new services such as UberPool, scheduled rides, and promotional vehicles.
Functional requirements covered passenger‑driver interactions: viewing nearby drivers, requesting rides, seeing ETA and price, real‑time driver tracking, booking, automatic matching, location tracking, post‑ride actions (rating, email, billing), and dynamic pricing with incentive algorithms.
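The dynamic-pricing requirement above can be sketched as a simple supply/demand multiplier. This is an illustrative assumption only: the linear formula, the cap, and the function name `surge_multiplier` are invented for the sketch and are not Uber's actual pricing algorithm.

```python
def surge_multiplier(open_requests: int, available_drivers: int,
                     base: float = 1.0, cap: float = 3.0) -> float:
    """Raise the price multiplier as demand outstrips supply, clamped to a cap.

    Hypothetical model: multiplier grows linearly with the
    demand/supply ratio once requests exceed available drivers.
    """
    if available_drivers <= 0:
        return cap  # no supply at all: charge the maximum multiplier
    ratio = open_requests / available_drivers
    return min(cap, max(base, base * ratio))
```

For example, 25 open requests against 10 available drivers would yield a 2.5x multiplier under this toy model, while balanced supply and demand stays at the base price.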
Non‑functional requirements emphasized globalization, low latency, high availability, strong consistency, scalability, and resilience to data‑center failures.
DISCO (Dispatch Optimization) matches supply and demand using the Google S2 library for spatial indexing, minimizing total service time and driver travel time. The dispatch system is built on Node.js with event‑driven asynchronous I/O, WebSocket communication with clients, consistent‑hash rings for partitioning work across nodes, the SWIM gossip protocol for cluster membership and failure detection, and RPC between servers.
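A minimal consistent-hash ring sketches how dispatch work (keyed, say, by S2 cell ID) could be partitioned across nodes so that adding or removing a node only remaps the keys adjacent to it. The `ConsistentHashRing` class, the virtual-node count, and the MD5 hash are assumptions for illustration, not Uber's implementation.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys (e.g. S2 cell IDs) to nodes on a hash ring.

    Each node is placed at many virtual positions so load spreads
    evenly; a key belongs to the first node clockwise from its hash.
    """

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

The design pay-off: if a node leaves the ring, keys that did not map to it keep their original owner, which is what makes this scheme attractive for a dispatch tier whose membership changes as nodes join and fail.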
Request Service handles passenger ride requests, captures location, streams requests via WebSocket, tracks GPS, and forwards demands to the dispatch system.
Supply Service tracks driver locations every five seconds, routes updates through a load balancer to a Kafka REST proxy, and replicates them to databases and to DISCO for real‑time availability.
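A toy batcher illustrates the idea of buffering driver GPS pings before forwarding them downstream; a plain callable stands in for the Kafka REST proxy. `LocationBatcher` and its size-based flush are illustrative assumptions, as the real pipeline flushes on a time cadence rather than a count.

```python
from typing import Callable

class LocationBatcher:
    """Buffers driver GPS pings and flushes them downstream in batches.

    The sink callable is a stand-in for an HTTP POST to a Kafka
    REST proxy; batching amortizes the per-request overhead.
    """

    def __init__(self, sink: Callable[[list], None], batch_size: int = 3):
        self.sink = sink
        self.batch_size = batch_size
        self._buffer = []

    def record(self, driver_id: str, lat: float, lng: float, ts: float):
        self._buffer.append(
            {"driver_id": driver_id, "lat": lat, "lng": lng, "ts": ts})
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered pings downstream and clear the buffer."""
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
```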
Data and Storage include high‑frequency read/write workloads for driver positions and trip billing data, evolving from PostgreSQL to a schema‑less MySQL‑based NoSQL store.
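The schema-less store can be pictured as an append-only, versioned cell map: writes never overwrite, they add a new immutable version of a (row, column) cell holding a JSON blob. This in-memory `SchemalessStore` is a hypothetical stand-in for the MySQL-backed layer, and its names and version scheme are assumptions for illustration.

```python
import json
from collections import defaultdict

class SchemalessStore:
    """Append-only cell store in the spirit of a schema-less layer over MySQL.

    Each (row_key, column) cell holds a list of immutable JSON
    versions; reads default to the latest version.
    """

    def __init__(self):
        self._cells = defaultdict(list)  # (row_key, column) -> [json blobs]

    def put(self, row_key: str, column: str, body: dict) -> int:
        """Append a new version of the cell; return its version number."""
        versions = self._cells[(row_key, column)]
        versions.append(json.dumps(body))
        return len(versions)

    def get(self, row_key: str, column: str, version: int = -1) -> dict:
        """Read a specific version, or the latest when none is given."""
        versions = self._cells[(row_key, column)]
        idx = version - 1 if version > 0 else -1
        return json.loads(versions[idx])
```

Append-only versioning suits high-frequency trip and billing writes because updates never contend on a row in place, and historical states remain auditable.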
System Architecture comprises multiple layers: layered load balancing (L3 IP‑based, L4 transport‑level, and L7 application‑level), a Web Application Firewall for security, Kafka for durable log streaming, WebSockets for persistent client‑server connections, Hadoop for archival storage and batch analytics, and Spark Streaming clusters for event processing and driver‑behavior analysis.
Payment Subsystem is a MySQL‑backed service triggered by Kafka after trip completion, handling fare calculation, payment authorization, refunds, tips, promotions, and multiple payment methods via open APIs.
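The fare-calculation step can be sketched as a time-and-distance formula with a surge multiplier and a promotional discount. All rates, parameter names, and the formula itself are invented placeholders, not Uber's actual pricing logic.

```python
def calculate_fare(distance_km: float, duration_min: float,
                   base_fare: float = 2.0, per_km: float = 1.5,
                   per_min: float = 0.3, surge: float = 1.0,
                   promo_discount: float = 0.0) -> float:
    """Hypothetical time-and-distance fare.

    fare = (base + distance * rate_km + duration * rate_min) * surge,
    then a flat promo discount is subtracted, floored at zero.
    """
    fare = (base_fare + distance_km * per_km + duration_min * per_min) * surge
    return round(max(0.0, fare - promo_discount), 2)
```

For a 10 km, 20 minute trip under these placeholder rates: (2.0 + 15.0 + 6.0) = 23.0, doubling to 46.0 at 2x surge.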
Additional Engines include a driver‑profile engine for classification, a fraud‑detection engine that analyzes ride patterns, and monitoring dashboards built with Kibana and Grafana on top of Elasticsearch.
The article concludes that big‑data solutions are essential for Uber’s continued evolution, highlighting the extensive visual diagrams (omitted here) that illustrate the architecture and data flows.
IT Architects Alliance