From Single Server to Cloud‑Native: How Taobao Scaled to Millions of Concurrent Users
This article walks through Taobao's architectural evolution, from a single‑server setup through distributed clusters, caching, load balancing, microservices, and containerization to cloud platforms, illustrating the technologies and design principles needed to grow from hundreds to tens of millions of concurrent requests.
Overview
The article uses Taobao as a case study to illustrate how a server‑side architecture evolves from handling a few hundred concurrent users to tens of millions, highlighting the technical challenges and solutions at each stage.
Basic Concepts
Distributed : Deploying multiple modules on different servers, e.g., Tomcat and databases on separate machines.
High Availability : Remaining operational when some nodes fail.
Cluster : A group of servers providing a unified service, with automatic failover.
Load Balancing : Evenly distributing incoming requests across multiple nodes.
Forward and Reverse Proxy : Forward proxy lets internal systems access external networks; reverse proxy forwards external requests to internal servers.
Architecture Evolution
1) Single‑Machine Architecture
Initially, Tomcat and the database run on the same server. Users access www.taobao.com which resolves to a single IP and reaches that Tomcat instance.
2) First Evolution – Separate Tomcat and Database
Tomcat and the database are deployed on separate servers, eliminating resource contention and improving performance.
3) Second Evolution – Add Local and Distributed Caches
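A local cache (e.g., memcached running alongside Tomcat, or an in‑JVM cache) is added, and a distributed cache (Redis) is introduced externally. Both hold hot product data and HTML pages, so most requests are answered before they reach the database, dramatically reducing database load.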
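The lookup order described above (local cache, then distributed cache, then database) can be sketched as follows. This is a minimal illustration, not Taobao's implementation: the class name TwoLevelCache, the load_from_db callback, and the use of plain dictionaries in place of a real Redis client are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class TwoLevelCache:
    """Look up a key in the local cache, then the distributed cache, then the database."""

    def __init__(self, remote: Dict[str, str],
                 load_from_db: Callable[[str], Optional[str]]):
        self.local: Dict[str, str] = {}    # stand-in for an in-process cache
        self.remote = remote               # stand-in for a Redis client (assumption)
        self.load_from_db = load_from_db   # hypothetical database loader
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            return self.local[key]
        value = self.remote.get(key)
        if value is None:
            # Miss in both caches: fall back to the database.
            self.db_reads += 1
            value = self.load_from_db(key)
            if value is None:
                return None
            self.remote[key] = value       # populate the distributed cache
        self.local[key] = value            # populate the local cache
        return value


db = {"sku-1": "Red shoes"}
cache = TwoLevelCache(remote={}, load_from_db=db.get)
first = cache.get("sku-1")    # goes to the database
second = cache.get("sku-1")   # served from the local cache
```

A production cache would also need expiry and invalidation when the underlying data changes; those concerns are left out here to keep the lookup path visible.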
4) Third Evolution – Reverse Proxy for Load Balancing
Multiple Tomcat instances are deployed and Nginx (or HAProxy) distributes requests across them. Assuming each Tomcat handles 100 concurrent connections and Nginx handles 50,000, a single Nginx in front of 500 Tomcat instances can theoretically support 50,000 concurrent users; beyond that point Nginx itself becomes the ceiling.
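The capacity arithmetic and the distribution step can be sketched in a few lines. The function names and backend labels below are hypothetical, and round-robin is only one of several strategies a reverse proxy may use.

```python
import itertools
import math
from typing import Iterator, List


def tomcats_needed(target_concurrency: int, per_tomcat: int) -> int:
    """Number of Tomcat instances required to absorb the target concurrency."""
    return math.ceil(target_concurrency / per_tomcat)


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in rotation, the way a simple reverse proxy spreads requests."""
    return itertools.cycle(backends)


# 50,000 concurrent connections at 100 per Tomcat
instances = tomcats_needed(50_000, 100)

picker = round_robin(["tomcat-1", "tomcat-2", "tomcat-3"])
picks = [next(picker) for _ in range(4)]
```

With the figures from the text, `instances` is 500, and the fourth request wraps around to the first backend.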
5) Fourth Evolution – Database Read/Write Separation
Writes go to a primary database, while reads are served by multiple replicas. Mycat is used as middleware to manage read/write splitting and sharding.
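The routing decision behind read/write splitting can be sketched as below. This is an illustration of the idea only, not how Mycat works internally; the ReadWriteRouter class and the node names are assumptions, and the rule "anything starting with SELECT is a read" is a deliberate simplification (for example, SELECT ... FOR UPDATE would need the primary).

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Rotate through replicas; with no replicas, everything goes to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM item WHERE id = 1"),
    router.route("select name FROM item"),
    router.route("UPDATE item SET stock = stock - 1 WHERE id = 1"),
    router.route("SELECT 1"),
]
```

Because replicas lag behind the primary, a read issued immediately after a write may return stale data; real middleware usually offers a way to force such reads onto the primary.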
6) Fifth Evolution – Business‑Level Database Sharding
Different business domains store data in separate databases, reducing contention and allowing independent scaling.
7) Sixth Evolution – Split Large Tables
Large tables are partitioned (e.g., by product ID hash or hourly tables) and accessed via Mycat, enabling horizontal scaling of the database layer.
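The two partitioning schemes mentioned above, hashing on product ID and one table per hour, can be sketched as follows. The table name prefixes, the shard count of 16, and the choice of CRC32 as the hash are assumptions for illustration; middleware such as Mycat applies its own configured rules.

```python
import zlib
from datetime import datetime


def table_for_product(product_id: str, shard_count: int = 16) -> str:
    """Pick a table by hashing the product ID."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    shard = zlib.crc32(product_id.encode("utf-8")) % shard_count
    return f"product_{shard:02d}"


def table_for_hour(event_time: datetime) -> str:
    """Pick a table by the hour in which the record was created."""
    return f"order_log_{event_time:%Y%m%d%H}"


product_table = table_for_product("sku-12345")
hourly_table = table_for_hour(datetime(2024, 11, 11, 0, 5))
```

Hash partitioning spreads load evenly but makes range scans expensive, while time-based tables keep recent data together and make old data easy to archive.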
8) Seventh Evolution – LVS/F5 for Multi‑Level Load Balancing
LVS (software) or F5 (hardware) balances traffic across multiple Nginx instances. Keepalived provides virtual IP failover for high availability.
9) Eighth Evolution – DNS Round‑Robin Across Data Centers
DNS maps a domain to multiple IPs, each pointing to a different data center, achieving load balancing across data centers.
10) Ninth Evolution – NoSQL and Search Engines
When relational databases can no longer serve every access pattern at scale, specialized technologies are introduced: HDFS for massive file storage, HBase for key‑value access, MongoDB for document data, ElasticSearch for full‑text search, and Kylin or Druid for multidimensional analysis.
11) Tenth Evolution – Split Monolith into Small Applications
Code is divided by business domain, allowing independent deployment and scaling. Shared configuration can be managed via Zookeeper.
12) Eleventh Evolution – Extract Reusable Functions as Microservices
Common functionalities (user management, order, payment, authentication) are isolated into independent services accessed via HTTP, TCP, or RPC. Frameworks such as Dubbo or Spring Cloud provide service governance, rate limiting, circuit breaking, and degradation.
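Circuit breaking and degradation, two of the governance features named above, can be illustrated with a small sketch. This is not the Dubbo or Spring Cloud API; the CircuitBreaker class, its parameters, and the injectable clock are assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing service for a while and serve a fallback instead."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None   # None means the circuit is closed

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()                 # open: degrade without calling
            # Half-open: allow one trial call; a single failure reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


now = [0.0]        # fake clock so the example does not need to sleep
attempts = [0]


def flaky_payment_service() -> str:
    attempts[0] += 1
    raise RuntimeError("service down")


breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])
results = [breaker.call(flaky_payment_service, lambda: "degraded") for _ in range(3)]
attempts_while_open = attempts[0]   # the third call never reached the service
now[0] = 11.0
recovered = breaker.call(lambda: "ok", lambda: "degraded")
```

The fallback is the degradation path: callers keep getting a reduced but fast answer instead of waiting on a service that is already known to be failing.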
13) Twelfth Evolution – Enterprise Service Bus (ESB)
ESB unifies protocol conversion and service invocation, reducing coupling. This architecture resembles SOA and overlaps with microservices concepts.
14) Thirteenth Evolution – Containerization
Docker packages applications into images; Kubernetes orchestrates them, enabling dynamic scaling and isolated runtime environments.
15) Fourteenth Evolution – Cloud Platform
The system is deployed on public cloud (IaaS/PaaS/SaaS). Resources are provisioned on demand, combined with Docker and Kubernetes for rapid scaling during traffic spikes and released afterward, achieving cost‑effective elasticity.
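The scale-out and release cycle can be made concrete with a proportional sizing rule. The sketch below follows the same basic formula that the Kubernetes Horizontal Pod Autoscaler documents (desired = ceil(current × currentMetric / targetMetric)), but it omits tolerances and stabilization windows; the function name and the replica bounds are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Scale the replica count in proportion to observed load versus target load."""
    wanted = math.ceil(current_replicas * current_load / target_load)
    # Clamp to configured bounds so a spike cannot request unlimited resources.
    return max(min_replicas, min(max_replicas, wanted))


# Load per replica is 90 against a target of 60: scale out.
during_spike = desired_replicas(current_replicas=4, current_load=90, target_load=60)

# Load per replica drops to 10: release resources.
after_spike = desired_replicas(current_replicas=6, current_load=10, target_load=60)
```

Here the load values are per-replica averages of whatever metric is being tracked, such as CPU utilization or requests per second.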
Architecture Design Summary
The evolution path is not mandatory; real‑world systems may address multiple bottlenecks simultaneously or follow a different order based on business needs.
For a one‑off system with clear performance targets, design enough to meet those targets while leaving hooks for future scaling. For continuously evolving platforms like e‑commerce sites, design for the next growth stage and iterate.
Server‑side architecture focuses on how applications are organized, while big‑data architecture provides the underlying storage, processing, and analytics capabilities.
Design Principles
N+1 design – eliminate single points of failure.
Rollback design – ensure forward compatibility and ability to revert versions.
Feature toggle – configurable enable/disable of functions for rapid fault isolation.
Monitoring – embed observability from the start.
Active‑active data centers – achieve high availability across locations.
Use mature technologies – avoid untested or unsupported components.
Resource isolation – prevent one business from monopolizing resources.
Horizontal scalability – design for scale‑out to avoid bottlenecks.
Buy non‑core solutions – leverage commercial products for peripheral functions.
Commercial hardware – use commercial‑grade hardware to reduce the rate of hardware failures.
Rapid iteration – develop small features quickly for early feedback.
Stateless services – keep service interfaces independent of prior requests.
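The feature toggle principle from the list above can be sketched as follows. The FeatureToggles class, the flag name, and the product_page function are hypothetical; in practice the flags would usually live in a shared configuration store so they can be flipped without a redeploy.

```python
from typing import Callable, Dict, List


class FeatureToggles:
    """Runtime switches that let operators turn a feature off during an incident."""

    def __init__(self, flags: Dict[str, bool]):
        self._flags = dict(flags)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)   # unknown features default to off


def product_page(toggles: FeatureToggles, recommend: Callable[[], List[str]]) -> dict:
    """Build a page; the recommendation call is skipped when its toggle is off."""
    page = {"title": "Product detail", "recommendations": []}
    if toggles.is_enabled("recommendations"):
        page["recommendations"] = recommend()
    return page


toggles = FeatureToggles({"recommendations": True})
normal = product_page(toggles, lambda: ["sku-2", "sku-3"])

# The recommendation service misbehaves: switch it off and keep the page available.
toggles.set("recommendations", False)
isolated = product_page(toggles, lambda: ["sku-2", "sku-3"])
```

The core page keeps rendering in both cases; only the optional feature disappears when its switch is off.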
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
