How Taobao Scaled to Millions of Concurrent Users: Architecture Evolution
This article walks through Taobao’s journey from a single‑server setup to a cloud‑native, micro‑service architecture capable of handling tens of millions of concurrent requests, explaining each scaling step, the technologies involved, and key design principles for high‑availability systems.
Overview
This article uses Taobao as a case study to illustrate the evolution of server‑side architecture from handling a few hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing architectural design principles.
Basic Concepts
Distributed: Multiple modules are deployed on different servers, e.g., Tomcat and the database on separate machines.
High Availability: The system continues to provide service when some nodes fail.
Cluster: A group of servers offering a unified service, with automatic failover when a node goes down.
Load Balancing: Distributing requests evenly across multiple nodes.
Forward and Reverse Proxy: A forward proxy handles outbound traffic from internal systems; a reverse proxy forwards inbound traffic to internal servers.
Architecture Evolution
1. Single‑Machine Architecture
Initially, Tomcat and the database run on the same server; as user count grows, resource contention appears.
2. First Evolution: Separate Tomcat and Database
Deploy Tomcat and the database on separate servers, improving performance, but database read/write becomes the new bottleneck.
3. Second Evolution: Local and Distributed Caching
Introduce a local cache (e.g., memcached running on the application server) and a distributed cache (e.g., Redis) to absorb most read traffic before it reaches the database; this stage must also deal with cache consistency, cache penetration, cache breakdown (hot-key expiry), and cache avalanche.
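As a sketch of the cache-aside read path implied above, the following shows two of the defenses just mentioned: caching null results to blunt penetration, and jittering TTLs so entries do not all expire at once (avalanche). The class and field names are illustrative, not from any particular library:

```python
import random
import time

class CacheAsideStore:
    """Cache-aside reads with two common defenses:
    - null results are cached, so repeated lookups of missing keys
      do not hammer the database (cache penetration)
    - TTLs get random jitter, so entries expire at staggered times
      (cache avalanche)."""

    NULL_SENTINEL = object()  # distinguishes "cached None" from "not cached"

    def __init__(self, db, base_ttl=300):
        self.db = db            # any mapping-like backing store
        self.cache = {}         # {key: (value, expires_at)}
        self.base_ttl = base_ttl

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.time() < expires_at:
                return None if value is self.NULL_SENTINEL else value
        # Cache miss or expired entry: fall through to the database.
        value = self.db.get(key)
        ttl = self.base_ttl + random.uniform(0, 60)  # jitter against avalanche
        cached = self.NULL_SENTINEL if value is None else value
        self.cache[key] = (cached, time.time() + ttl)
        return value
```

After the first lookup of a missing key, subsequent lookups within the TTL are served from the cached null and never touch the database.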
4. Third Evolution: Reverse Proxy Load Balancing
Deploy multiple Tomcat instances behind a reverse proxy (Nginx or HAProxy) to distribute traffic, increasing concurrency but pushing the bottleneck to the database.
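An Nginx configuration for this stage might look roughly like the fragment below (placed inside the `http` block); the upstream name and backend addresses are placeholders, not values from any real deployment:

```nginx
upstream tomcat_pool {
    # Default round-robin across application instances;
    # least_conn or ip_hash are common alternatives.
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```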
5. Fourth Evolution: Database Read/Write Splitting
Split reads and writes across databases using middleware such as Mycat: writes go to the primary and are synchronized to multiple read replicas, and the application must cope with the consistency issues introduced by replication lag.
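The routing rule such middleware applies can be sketched in a few lines. This is a toy illustration of the idea, not Mycat's actual API; the class and server names are hypothetical:

```python
import itertools

class ReadWriteRouter:
    """Routes write statements to the primary and spreads reads
    round-robin across replicas, mimicking what SQL-layer middleware
    such as Mycat does."""

    WRITE_VERBS = ("insert", "update", "delete", "replace")

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql):
        # Inspect the leading SQL verb to classify the statement.
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replica_cycle)
```

A real router also has to pin reads-after-writes within a transaction to the primary; that detail is omitted here.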
6. Fifth Evolution: Business‑Based Database Sharding
Allocate different business data to separate databases, reducing contention; large tables are split into smaller ones, often using Mycat for routing.
7. Sixth Evolution: Table Partitioning
Split massive tables (e.g., comments, payment logs) into partitions by hash or by time range, enabling horizontal scaling; distributed and MPP databases such as Greenplum, TiDB, Postgres-XC, etc., can provide the underlying engine.
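Hash-based partitioning of, say, a comment table reduces to a routing function like the sketch below. The table-name scheme and shard count are illustrative assumptions:

```python
import hashlib

def comment_table_for(user_id, shard_count=16):
    """Map a user id to one of `shard_count` physical comment tables.
    A stable hash (not Python's per-process salted hash()) keeps the
    routing consistent across processes and restarts."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"comment_{shard:02d}"
```

Every query for a user's comments is then rewritten to hit only that user's shard, so each physical table stays a manageable size.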
8. Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing
Introduce Layer‑4 load balancers (LVS software or F5 hardware) to balance traffic across multiple Nginx clusters, adding keepalived for high availability.
9. Eighth Evolution: DNS Round‑Robin Across Data Centers
Configure DNS to return multiple IPs for a domain, directing users to different data centers for global load balancing.
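The effect of DNS round-robin can be illustrated with a toy resolver that rotates the answer list on every query, so successive clients land on different data centers. The hostnames and IPs below are made up:

```python
from collections import deque

class RoundRobinDNS:
    """Toy authoritative resolver: each query for a name returns the
    IP list rotated by one position, spreading clients across the
    data centers behind those IPs."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        ips.rotate(-1)  # next query starts from the next IP
        return list(ips)
```

Real DNS adds TTL-driven caching at resolvers, so the rotation is much coarser in practice than this sketch suggests.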
10. Ninth Evolution: NoSQL and Search Engines
Adopt storage suited to each workload: HDFS for files, HBase and Redis for key-value data, Elasticsearch for full-text search, and analytical engines such as Kylin or Druid, to handle massive data volumes and diverse query patterns.
11. Tenth Evolution: Split Large Application into Smaller Services
Divide the monolith by business domain, using Zookeeper for distributed configuration.
12. Eleventh Evolution: Extract Reusable Functions as Micro‑services
Isolate common functionalities (user management, order, payment, authentication) into independent services using Dubbo, Spring Cloud, etc., with service governance features.
13. Twelfth Evolution: Enterprise Service Bus (ESB)
Introduce an ESB to unify protocol conversion and reduce coupling, forming a Service‑Oriented Architecture (SOA) that overlaps with micro‑service concepts.
14. Thirteenth Evolution: Containerization
Adopt Docker for packaging services and Kubernetes for orchestration, enabling dynamic scaling and isolation of runtime environments.
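A minimal Kubernetes Deployment for one such containerized service might look like the following sketch; the service name, image, replica count, and resource limits are all illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3                     # scaled up or down on demand
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:1.0.0
          ports:
            - containerPort: 8080
          resources:
            limits:              # isolates this service's resource usage
              cpu: "1"
              memory: 1Gi
```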
15. Fourteenth Evolution: Cloud Platform Adoption
Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reducing operational cost and enabling on‑demand scaling during peak events.
Architecture Design Summary
Architecture evolution does not have to follow a strict linear path; teams should address the most pressing bottleneck first, and a design should meet current performance goals while leaving room for future expansion. Key principles include:
- N+1 redundancy for every component
- the ability to roll back any release
- feature toggles to switch off problematic functionality
- comprehensive monitoring
- multi-data-center active-active deployment
- preferring mature technology over the newest
- resource isolation
- horizontal scalability
- buying rather than building non-core components
- using commodity hardware
- rapid iteration
- stateless service design
ITFLY8 Architecture Home
