How Taobao Scaled from 100 to Millions of Concurrent Users: A Step‑by‑Step Architecture Evolution
This article uses Taobao as a case study to illustrate how a web service evolves from a single‑machine setup to a cloud‑native, micro‑service architecture capable of handling tens of millions of concurrent requests, detailing each technical milestone and the principles behind the design choices.
1. Overview
This article uses Taobao as an example to describe the evolution of server‑side architecture from a hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing architectural design principles.
2. Basic Concepts
Distributed: The system's modules are deployed on different servers, e.g., Tomcat and the database on separate machines.
High Availability: The system continues to provide service when some nodes fail.
Cluster: A group of servers running the same software that together provide one unified service, usually with automatic failover when a node goes down.
Load Balancing: Distributing incoming requests evenly across multiple nodes so that each node handles a similar share of the load.
Forward and Reverse Proxy: A forward proxy relays outbound requests so that internal systems can reach external networks; a reverse proxy receives external requests and forwards them to internal servers.
3. Architecture Evolution
3.1 Single‑Machine Architecture
Initially, Tomcat and the database are deployed on the same server. As user numbers grow, the two compete for the same CPU, memory, and disk, and a single machine can no longer sustain the load.
3.2 First Evolution: Separate Tomcat and Database
Tomcat and the database each occupy dedicated servers, significantly improving performance, but database read/write becomes the bottleneck as traffic increases.
3.3 Second Evolution: Local and Distributed Caching
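Introduce a local cache on the same server or in the same JVM as Tomcat (e.g., a memcached instance on the Tomcat host) and an external distributed cache (e.g., Redis) to store hot items and rendered HTML pages, so that most reads never reach the database. Caching brings problems of its own that must be handled: cache consistency, cache penetration, and cache avalanche.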
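The lookup order described above (local cache, then distributed cache, then database) can be sketched as follows. This is a minimal illustration, not the method Taobao used: TTLCache, read_through, and the TTL values are hypothetical names and numbers, and a real deployment would talk to memcached or Redis through a client library. Caching a "missing" marker is one common defence against cache penetration, and randomizing TTLs is one common defence against cache avalanche.

```python
import random
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry; stands in for either tier."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)


# Sentinel cached for keys the database does not have (penetration guard).
MISSING = object()


def jittered(ttl, spread=0.2):
    """Randomize a TTL so entries written together do not expire together."""
    return ttl * (1 + random.uniform(-spread, spread))


def read_through(key, local, shared, load_from_db, ttl=60.0):
    """Look in the local tier, then the shared tier, then the database."""
    cached = local.get(key)
    if cached is None:
        cached = shared.get(key)
        if cached is None:
            value = load_from_db(key)
            cached = MISSING if value is None else value
            shared.set(key, cached, jittered(ttl))
        # Keep the local copy short-lived to limit staleness across servers.
        local.set(key, cached, jittered(ttl / 10))
    return None if cached is MISSING else cached
```

Cache consistency is not addressed here; a write path would also have to update or invalidate both tiers.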
3.4 Third Evolution: Reverse Proxy Load Balancing
Deploy multiple Tomcat instances behind a reverse proxy (Nginx or HAProxy). This raises the concurrent capacity dramatically, but the database becomes the new bottleneck.
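As a rough picture of what the reverse proxy does when it spreads requests over several Tomcat instances, here is a round-robin selector that skips backends marked as unhealthy. The class and backend names are hypothetical, and this is only a sketch of the idea; Nginx and HAProxy implement it (along with other strategies) internally and are configured rather than coded this way.

```python
import itertools


class RoundRobinBalancer:
    """Cycle through backends in order, skipping any marked as down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._cursor = itertools.cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # One full pass is enough to find a healthy backend if any exists.
        for _ in range(len(self._backends)):
            candidate = next(self._cursor)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
first_four = [balancer.pick() for _ in range(4)]
```

Because any request may land on any instance, session state has to live outside the individual Tomcat process (or the services have to be stateless) for this to work.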
3.5 Fourth Evolution: Database Read/Write Separation
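Split the database into a single write master and multiple read replicas. The master replicates its changes to the replicas, and database middleware such as Mycat routes writes to the master and reads to the replicas (it can also handle sharding). Replication lag means a replica may briefly return stale data, so data synchronization and consistency need attention.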
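The routing decision made by such middleware can be sketched as below. This is an illustrative simplification under stated assumptions, not how Mycat is implemented: ReadWriteRouter and the node names are hypothetical, and the statement is classified only by its first keyword. Anything that is not a plain SELECT, and anything inside a transaction, goes to the master.

```python
import itertools


class ReadWriteRouter:
    """Send plain SELECTs to replicas in rotation, everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, in_transaction=False):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        is_plain_read = verb == "SELECT" and "FOR UPDATE" not in sql.upper()
        if in_transaction or not is_plain_read or self._replicas is None:
            # Writes, locking reads, and transactional reads need the master.
            return self.master
        return next(self._replicas)


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
target_for_read = router.route("SELECT * FROM items WHERE id = 1")
target_for_write = router.route("UPDATE items SET stock = stock - 1 WHERE id = 1")
```

A read that must see a write made a moment earlier can be forced to the master by passing in_transaction=True, at the cost of extra load on the master.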
3.6 Fifth Evolution: Business‑Based Database Sharding
Store different business data in separate databases to reduce contention; high‑traffic services can be allocated more servers.
3.7 Sixth Evolution: Splitting Large Tables
Hash‑based routing splits large tables (e.g., comments, payments) into many smaller tables, enabling horizontal scaling. This leads to a distributed database architecture often implemented with Mycat.
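Hash-based routing of rows to smaller tables can be illustrated in a few lines. The function name, the table naming scheme (comments_00, comments_01, ...), and the shard count are assumptions made for this sketch; middleware such as Mycat performs the equivalent mapping from configuration.

```python
import zlib


def shard_for(key, table, shard_count):
    """Map a routing key to one physical table, e.g. comments_03."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(str(key).encode("utf-8")) % shard_count
    return f"{table}_{bucket:02d}"


def all_shards(table, shard_count):
    """Tables a query must scan when it has no routing key (scatter-gather)."""
    return [f"{table}_{i:02d}" for i in range(shard_count)]


comment_table = shard_for(123456, "comments", 16)
```

The same key always maps to the same table, so single-key reads and writes touch one shard. Queries that do not include the routing key have to visit every shard, and changing shard_count remaps most keys, which is why the number of shards is usually chosen generously up front.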
Distributed SQL and MPP databases, such as the open-source Greenplum, TiDB, and PostgreSQL-XC and commercial products like GBase, provide SQL-compatible distributed query execution, so the application does not have to manage the sharding itself.
3.8 Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing
When Nginx becomes a bottleneck, layer‑4 load balancers like LVS (software) or F5 (hardware) distribute traffic across many Nginx instances, with keepalived providing high availability.
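keepalived is built on VRRP, in which the live node with the highest priority holds a shared virtual IP and a standby takes it over when that node fails. The sketch below models only the outcome of that election; the node names and priorities are hypothetical, and the real protocol works through periodic advertisements and timers that are not shown.

```python
from dataclasses import dataclass


@dataclass
class BalancerNode:
    name: str
    priority: int
    healthy: bool = True


def virtual_ip_owner(nodes):
    """Return the name of the healthy node with the highest priority, or None."""
    alive = [node for node in nodes if node.healthy]
    if not alive:
        return None
    return max(alive, key=lambda node: node.priority).name


nodes = [BalancerNode("lvs-a", priority=150), BalancerNode("lvs-b", priority=100)]
owner_before_failure = virtual_ip_owner(nodes)

nodes[0].healthy = False  # the active load balancer goes down
owner_after_failure = virtual_ip_owner(nodes)
```

Clients keep connecting to the same virtual IP throughout, so the failover is invisible to them apart from connections that were in flight.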
3.9 Eighth Evolution: DNS Round‑Robin Across Data Centers
Configure DNS to return multiple IPs, each pointing to a different data‑center, achieving data‑center‑level load balancing and horizontal scaling to tens of millions of concurrent users.
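The effect of DNS round-robin can be simulated with a toy resolver that rotates the record list on every query, so successive clients see a different first address. The class name, hostname, and addresses (taken from the documentation-only IP ranges) are placeholders; a real DNS server does this through its zone configuration.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver that rotates the A records for a name on each query."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query starts with the following address
        return answer


dns = RoundRobinDNS({
    "shop.example": ["203.0.113.10", "198.51.100.10", "192.0.2.10"],
})
first_answer = dns.resolve("shop.example")
second_answer = dns.resolve("shop.example")
```

Most clients connect to the first address in the answer, so rotating the order spreads new connections across data centers. Resolver caching makes the distribution only approximately even, and DNS alone does not detect a failed data center.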
3.10 Ninth Evolution: NoSQL and Search Engines
Introduce HDFS for file storage, HBase/Redis for key‑value data, Elasticsearch for full‑text search, and Kylin/Druid for multidimensional analysis to handle massive data and complex queries.
3.11 Tenth Evolution: Splitting Monolith into Small Applications
Divide the system by business domains, allowing independent deployment and scaling; shared configuration can be managed via Zookeeper.
3.12 Eleventh Evolution: Extracting Reusable Functions as Microservices
Common functionalities (user management, order, payment, authentication) become independent services accessed via HTTP, TCP, or RPC, using frameworks like Dubbo or Spring Cloud for governance.
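One piece of the service governance mentioned above is discovery: services register their addresses, and callers look them up and fail over between instances. The following is a hedged, in-process sketch of that idea only; ServiceRegistry, call_service, and the addresses are hypothetical and do not reflect the APIs of Dubbo or Spring Cloud.

```python
import random


class ServiceRegistry:
    """In-memory stand-in for a registry that maps service names to addresses."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, set()).add(address)

    def deregister(self, service, address):
        self._instances.get(service, set()).discard(address)

    def lookup(self, service):
        return sorted(self._instances.get(service, ()))


def call_service(registry, service, send):
    """Try the instances of a service in random order until one responds."""
    addresses = registry.lookup(service)
    random.shuffle(addresses)
    for address in addresses:
        try:
            return send(address)
        except ConnectionError:
            continue  # this instance is unreachable, try the next one
    raise RuntimeError(f"no reachable instance of {service}")


registry = ServiceRegistry()
registry.register("user-service", "10.0.0.1:8080")
registry.register("user-service", "10.0.0.2:8080")
```

Here send is whatever transport the caller uses (HTTP, TCP, or RPC); the registry only answers the question of where the service currently lives.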
3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access
An ESB abstracts protocol differences so that applications and services communicate through one uniform interface. This is a service-oriented architecture (SOA), which overlaps with microservices in practice.
3.14 Thirteenth Evolution: Containerization
Docker packages applications into images; Kubernetes orchestrates dynamic deployment, enabling rapid scaling for peak events and isolation of runtime environments.
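To make "rapid scaling for peak events" concrete, the sketch below computes a replica count from load using the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric), with no change inside a small tolerance band. The source mentions Kubernetes but not this autoscaler, so treating it as the scaling mechanism is an assumption, and the function name, limits, and numbers are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100, tolerance=0.1):
    """Proportional scaling: grow or shrink replicas toward the target metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, avoid flapping
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 4 containers at 90% average CPU with a 50% target: scale out.
scale_out = desired_replicas(4, current_metric=90, target_metric=50)

# 10 containers at 20% average CPU with a 50% target: scale in.
scale_in = desired_replicas(10, current_metric=20, target_metric=50)
```

Because containers start from a prebuilt image, adding the extra replicas takes far less time than provisioning and configuring new machines, which is what makes this kind of rule practical during a sales peak.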
3.15 Fourteenth Evolution: Cloud Platform Adoption
Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reducing hardware costs and simplifying operations.
4. Architecture Design Summary
Architecture adjustments need not follow a fixed order; they should address the most pressing bottlenecks first.
Design depth depends on system goals: meet current performance targets while leaving room for future growth.
Server-side architecture differs from big-data architecture, which focuses on data ingestion, storage, and analysis.
Key design principles include N+1 redundancy, rollback capability, feature toggles, monitoring, multi‑active data centers, mature technology adoption, resource isolation, horizontal scalability, buying non‑core components, using commercial hardware, rapid iteration, and stateless services.
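Of the principles listed, feature toggles are the easiest to show in code: a non-essential feature is wrapped in a switch so that it can be turned off under load without a redeploy. This is a minimal sketch with hypothetical names (FeatureToggles, render_product_page, the "recommendations" flag); in practice the flags would be read from a shared configuration service rather than held in process memory.

```python
class FeatureToggles:
    """Named on/off switches; unknown flags count as off."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)

    def enabled(self, name):
        return self._flags.get(name, False)


def render_product_page(product, toggles, fetch_recommendations):
    """Build the page, skipping the optional section when its toggle is off."""
    page = {"product": product}
    if toggles.enabled("recommendations"):
        page["recommendations"] = fetch_recommendations(product)
    return page


toggles = FeatureToggles({"recommendations": True})
```

Turning the flag off removes the call to the recommendation backend entirely, so the core page keeps working even if that backend is overloaded.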
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
