Inside Uber’s Complex Tech Stack: How They Scale Services Worldwide
This article breaks down Uber’s hybrid‑cloud infrastructure, storage choices, logging pipeline, service discovery, development languages, deployment tools, and monitoring system, revealing how the company builds a highly available, low‑latency platform that powers its global ride‑hailing service.
Underlying Foundations
Uber runs on a hybrid‑cloud model that combines multiple cloud providers with data centers around the world. If one data center fails, traffic fails over to another. Each city’s data is also replicated to a site in a different location, so operations continue without a dedicated backup center.
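The failover behaviour described above can be pictured with a small routing table. This is only a minimal sketch: the `CityRouter` class, the city names, and the data center names are all hypothetical and do not come from Uber's systems. It assumes each city has one primary and one backup site and that health status is already known.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class CityRouter:
    """Routes each city to its primary data center, or to its backup on failure."""

    # city -> (primary, backup); every name used here is hypothetical
    assignments: Dict[str, Tuple[str, str]]
    healthy: Set[str] = field(default_factory=set)

    def route(self, city: str) -> str:
        primary, backup = self.assignments[city]
        # Prefer the primary; fall back to the backup only if the primary is down.
        for candidate in (primary, backup):
            if candidate in self.healthy:
                return candidate
        raise RuntimeError(f"no healthy data center for {city}")


router = CityRouter(
    assignments={
        "san_francisco": ("dc_west", "dc_east"),
        "new_york": ("dc_east", "dc_west"),
    },
    healthy={"dc_west", "dc_east"},
)
print(router.route("san_francisco"))  # served by its primary
router.healthy.discard("dc_west")     # simulate a data center outage
print(router.route("san_francisco"))  # now served by its backup
```

Because each site is the primary for some cities and the backup for others, every site carries live traffic, which is why no idle backup center is needed in this model.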
Storage started with a single Postgres database, but as demand grew Uber needed higher availability and lower latency than one database could provide. Uber now uses Schemaless (an in‑house system built on MySQL) for long‑term storage, and Riak and Cassandra where high availability and low latency matter most. Distributed storage and analytics rely on the Hadoop ecosystem. Caching is handled by Redis behind Twemproxy, which lets cache clusters scale out without sacrificing hit rates.
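One reason a proxy layer can add cache nodes without hurting hit rates is consistent hashing, which Twemproxy offers as one of its key distribution modes. The sketch below is a generic hash ring, not Twemproxy's implementation; the `HashRing` class, the node names, and the key format are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring many times to even out the key spread.
        ring = [(self._hash(f"{node}#{i}"), node)
                for node in nodes for i in range(replicas)]
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]


before = HashRing(["redis-a", "redis-b", "redis-c"])
after = HashRing(["redis-a", "redis-b", "redis-c", "redis-d"])
keys = [f"trip:{i}" for i in range(1000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys changed owner after adding a node")
```

With a plain `hash(key) % node_count` scheme, adding a fourth node would remap most keys and empty much of the cache. On the ring, only the keys claimed by the new node move.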
Logging
Logs are critical for troubleshooting and business analysis. Services publish them to a Kafka cluster, from which Hadoop, file storage, real‑time processing services, and other consumers read them. Log search and visualization are powered by the ELK stack (Elasticsearch, Logstash, Kibana).
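The key property of this pipeline is that several consumers read the same log stream independently, each at its own pace. The in‑memory sketch below mimics that idea with per‑consumer offsets. It uses no Kafka client library, and the `LogTopic` class and the group names are hypothetical.

```python
class LogTopic:
    """Append-only log with an independent read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> index of next unread record

    def publish(self, record):
        self._records.append(record)

    def poll(self, group, max_records=100):
        # Each group resumes from its own offset, so groups never steal
        # records from one another.
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = LogTopic()
topic.publish({"level": "error", "service": "dispatch", "msg": "timeout"})
topic.publish({"level": "info", "service": "pricing", "msg": "quote served"})

archive_batch = topic.poll("hadoop-archiver")   # hypothetical batch consumer
alert_batch = topic.poll("realtime-alerts")     # hypothetical streaming consumer
print(len(archive_batch), len(alert_batch))
```

Both groups receive both records, and a second `poll` by either group returns an empty list until new records are published.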
Service Discovery and Routing
Uber uses a service‑oriented architecture (SOA). Service communication is managed with HAProxy and Hyperbahn, a system Uber built and open‑sourced to simplify discovery and routing across a very large number of microservices. Older services use HAProxy to route HTTP/JSON requests, while newer services employ protocols such as SPDY, HTTP/2, and TChannel, together with interface definition languages (IDLs) like Thrift and Protobuf, to improve speed and reliability.
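At its simplest, service discovery means callers address a service by name and a registry picks a live instance. The sketch below shows that idea with round‑robin selection. It is not Hyperbahn's API or algorithm; the `ServiceRegistry` class, its method names, and the addresses are assumptions for illustration.

```python
class ServiceRegistry:
    """Name-based lookup with round-robin selection (illustrative only)."""

    def __init__(self):
        self._instances = {}  # service name -> list of "host:port" strings
        self._next = {}       # service name -> round-robin counter

    def advertise(self, service, address):
        peers = self._instances.setdefault(service, [])
        if address not in peers:
            peers.append(address)

    def withdraw(self, service, address):
        peers = self._instances.get(service, [])
        if address in peers:
            peers.remove(address)

    def resolve(self, service):
        peers = self._instances.get(service)
        if not peers:
            raise LookupError(f"no instances advertised for {service!r}")
        # The modulo keeps the counter valid even after instances are withdrawn.
        i = self._next.get(service, 0) % len(peers)
        self._next[service] = i + 1
        return peers[i]


registry = ServiceRegistry()
registry.advertise("pricing", "10.0.0.1:9000")
registry.advertise("pricing", "10.0.0.2:9000")
picks = [registry.resolve("pricing") for _ in range(3)]
print(picks)
```

Callers never hard‑code an address, so instances can be added or withdrawn without any client change.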
Development and Deployment
The primary languages are Python, Node.js, Go, and Java. Uber started with Python and Node.js and later added Java and Go for performance. Java benefits from a rich open‑source ecosystem (e.g., Hadoop), while Go offers efficiency and simplicity. System‑level components use C/C++ for maximum performance.
Tools such as Phabricator (code review, bug tracking, project management) and OpenGrok (code search) support development, while Sphinx generates documentation. Deployment integrates many open‑source tools: Packer (machine image building), Vagrant (development environments), Boto (AWS API), Unison (file sync), Puppet (configuration management), and Jenkins (continuous integration).
Monitoring
Uber built a Go‑based metrics collection system that gathers data from servers, services, and code. Collected metrics are analyzed for trends and visualized with Grafana dashboards. An anomaly‑detection tool compares current values against historical models to flag out‑of‑range measurements.
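The source does not say which historical model the anomaly‑detection tool uses. As a stand‑in, the sketch below applies a simple z‑score test: a value is flagged when it lies more than a chosen number of standard deviations from the historical mean. The function name, the threshold, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it is more than `threshold` standard deviations
    away from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is out of range
    return abs(current - mu) / sigma > threshold


# Hypothetical request latencies in milliseconds
latency_history = [100, 102, 98, 101, 99]
print(is_anomalous(latency_history, 101))
print(is_anomalous(latency_history, 150))
```

A production system would typically also account for seasonality, such as daily and weekly traffic cycles, which a single mean and standard deviation cannot capture.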
Conclusion
Uber’s technology stack is highly complex, combining numerous open‑source projects, internally developed systems, and several open‑sourced components of its own.
