From Single Server to Cloud‑Native: Taobao’s 14‑Step Architecture Evolution
This article uses Taobao as a case study to trace the evolution of its server‑side architecture from a single‑machine setup to a cloud‑native, micro‑service ecosystem, detailing each scaling milestone, the technologies involved, and the design principles that guide high‑availability, high‑concurrency systems.
Overview
The article uses Taobao as an example to illustrate the evolution of server‑side architecture from handling a hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing key architectural design principles.
Basic Concepts
Distributed: multiple modules are deployed on different servers, e.g., Tomcat and the database on separate machines.
High availability: the system continues to provide service when some nodes fail.
Cluster: a group of servers exposed as a single service, with automatic failover when a node goes down.
Load balancing: requests are distributed evenly across multiple nodes.
Forward and reverse proxy: a forward proxy accesses external networks on behalf of internal clients; a reverse proxy forwards external requests to internal servers.
Architecture Evolution
3.1 Single‑Machine Architecture
Initially, Tomcat and the database run on the same server. Users access www.taobao.com, which resolves to an IP and reaches the Tomcat instance.
As user count grows, Tomcat and the database compete for resources, and a single machine cannot sustain the load.
3.2 First Evolution: Separate Tomcat and Database
Tomcat and the database are deployed on separate servers, significantly improving performance of each component.
Database read/write becomes the bottleneck as user numbers increase.
3.3 Second Evolution: Introduce Local and Distributed Caches
A local cache (e.g., memcached on the application server) and an external distributed cache (Redis) store hot product data and rendered HTML, sharply reducing database load. This stage also introduces the classic cache problems: consistency between cache and database, cache penetration, and cache avalanche.
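The cache-aside read path described here, together with the standard mitigations for penetration and avalanche, can be sketched as follows. This is a minimal illustration: the in-memory dict stands in for Redis, and the product data, key names, and TTL values are all hypothetical.

```python
import random
import time

# In-memory stand-in for a distributed cache such as Redis.
# Entries are (value, expiry_timestamp) pairs.
cache = {}

def db_lookup(product_id):
    # Stand-in for the real database query (illustrative data).
    products = {"p1": "Phone", "p2": "Laptop"}
    return products.get(product_id)

def get_product(product_id):
    """Cache-aside read with two common safeguards:
    - penetration: keys missing from the database are cached too
      (as None), so repeated lookups don't hammer the database;
    - avalanche: TTLs get random jitter so entries don't all
      expire at the same moment.
    """
    entry = cache.get(product_id)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value  # may be the None sentinel for a known-missing key
    value = db_lookup(product_id)
    ttl = 300 + random.uniform(0, 60)  # jittered TTL against avalanche
    cache[product_id] = (value, time.time() + ttl)
    return value
```

Cache consistency is the part this sketch does not solve: on writes, the usual choice is to update the database first and then invalidate (not update) the cached entry.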
3.4 Third Evolution: Reverse Proxy for Load Balancing
Multiple Tomcat instances are deployed behind a reverse proxy such as Nginx, which distributes requests evenly among them. Typical choices are Nginx and HAProxy; once a request can land on any instance, session sharing and file uploads need special handling.
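A minimal reverse-proxy configuration for this stage might look like the following sketch. The upstream name, addresses, and domain are illustrative; a real setup would also address session sharing, e.g. via sticky sessions or an external session store.

```nginx
# Two Tomcat instances behind one Nginx entry point.
upstream tomcat_pool {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    server_name www.taobao.com;

    location / {
        proxy_pass http://tomcat_pool;   # round-robin by default
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```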
Reverse proxy greatly increases application concurrency, but the database becomes the next bottleneck.
3.5 Fourth Evolution: Database Read‑Write Separation
Separate read and write databases; read replicas synchronize from the primary. Middleware like Mycat manages read/write splitting and sharding, handling data consistency.
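The routing decision that middleware like Mycat performs transparently can be sketched in a few lines. Connection objects are plain strings here, and the routing rule is deliberately crude; a real router would also keep transactions pinned to the primary and account for replication lag.

```python
import itertools

class ReadWriteRouter:
    """Minimal sketch of client-side read/write splitting.
    Connection names are hypothetical placeholders."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = itertools.cycle(replicas)  # round-robin over read replicas

    def route(self, sql):
        # Reads go to a replica; writes (and anything ambiguous) go to the primary.
        if sql.lstrip().lower().startswith("select"):
            return next(self.replicas)
        return self.primary

router = ReadWriteRouter(primary="primary-db", replicas=["replica-1", "replica-2"])
```

The design point is that the application keeps issuing ordinary SQL while the routing layer decides which node serves it, which is exactly what makes read scaling transparent.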
Different business modules compete for database resources, affecting performance.
3.6 Fifth Evolution: Business‑Level Sharding
Data for different business domains is stored in separate databases, reducing resource contention. Large tables are split into smaller ones, enabling horizontal scaling.
Single‑machine write databases eventually hit performance limits.
3.7 Sixth Evolution: Split Large Tables
Large tables are partitioned (e.g., by hash or by time) into many small tables, allowing parallel processing. MPP databases such as Greenplum, TiDB, and Postgres-XC provide distributed query execution.
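Hash-based routing from a row key to one of the split tables can be sketched as follows; the table names and shard count are illustrative.

```python
import hashlib

NUM_SHARDS = 4  # illustrative: orders_0 .. orders_3

def shard_for(user_id: str) -> str:
    """Map a row key to its shard table by hashing.
    The same key always lands on the same table, so lookups
    for one user never need to scan the other shards."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return f"orders_{int(digest, 16) % NUM_SHARDS}"
```

The trade-off this introduces is that queries which don't carry the shard key (e.g., range scans across all users) must fan out to every shard, which is the workload MPP engines are built to parallelize.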
Both Tomcat and database can scale horizontally, but Nginx eventually becomes the bottleneck.
3.8 Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing
LVS (software) or F5 (hardware) load balancers operate at layer 4, distributing traffic among multiple Nginx instances for higher concurrency, with keepalived providing failover between load-balancer nodes for high availability.
When concurrency reaches hundreds of thousands, LVS itself becomes a bottleneck, and geographic latency becomes noticeable.
3.9 Eighth Evolution: DNS Round‑Robin Across Data Centers
DNS is configured with multiple IPs pointing to different data‑center virtual IPs, enabling load balancing at the data‑center level and supporting million‑plus concurrent users by adding more sites.
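The effect of DNS round-robin can be sketched as below. The VIPs are documentation addresses and the resolver is a toy: in reality the rotation happens on the authoritative name server, and resolver caching means clients see a sticky answer for the TTL of the record.

```python
import itertools

# Hypothetical virtual IPs, one per data center.
datacenter_vips = ["203.0.113.10", "198.51.100.10"]
rotation = itertools.cycle(datacenter_vips)

def resolve(hostname: str) -> str:
    """Toy resolver: each resolution hands out the next
    data-center VIP in the rotation, so traffic spreads
    across sites without any shared load balancer."""
    return next(rotation)
```

Adding capacity at this level means adding another data center and appending its VIP to the DNS record, which is why this stage scales past what a single LVS/F5 pair can carry.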
Database alone cannot satisfy increasingly rich analytical and search requirements.
3.10 Ninth Evolution: Introduce NoSQL and Search Engines
When relational databases cannot handle massive data or complex queries, solutions such as HDFS, HBase, Redis, Elasticsearch, Kylin, and Druid are adopted for storage, key‑value access, full‑text search, and multidimensional analysis.
Adding more components increases system complexity and operational overhead.
3.11 Tenth Evolution: Split Monolithic Application into Smaller Apps
Business modules are separated into independent applications, each with clear responsibilities, while shared configurations are managed via Zookeeper.
Shared modules duplicated across apps make coordinated upgrades difficult.
3.12 Eleventh Evolution: Extract Common Functions as Micro‑services
Common functionalities (user management, order, payment, authentication) are packaged as independent services accessed via HTTP, TCP, or RPC, using frameworks like Dubbo or Spring Cloud for governance.
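The core idea of this step, that many applications call one shared implementation through a named interface instead of duplicating it, can be shown with a toy in-process registry. Real systems would use Dubbo or Spring Cloud over RPC/HTTP; the service name and handler here are illustrative.

```python
# Toy service registry: applications call shared functions by name
# rather than embedding a copy of the logic in each codebase.
services = {}

def register(name, handler):
    """Publish a shared capability under a service name."""
    services[name] = handler

def call(name, **kwargs):
    """Invoke a service by name; in a real framework this would
    be a remote call routed through service discovery."""
    return services[name](**kwargs)

# The user-management team registers its service once...
register("user.get", lambda user_id: {"id": user_id, "name": "demo"})

# ...and any application can consume it without owning the code.
profile = call("user.get", user_id=7)
```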
Different service interfaces increase integration complexity and create tangled call chains.
3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access
The ESB provides protocol conversion and unified access between services, decoupling them from one another and forming an SOA architecture whose ideas overlap with, but are not identical to, those of microservices.
Growing number of services and applications makes deployment and scaling increasingly challenging.
3.14 Thirteenth Evolution: Containerization
Docker packages applications/services as images; Kubernetes orchestrates dynamic deployment, scaling, and resource isolation, simplifying operations especially during traffic spikes.
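A Kubernetes manifest of the kind this stage relies on might look like the following sketch; the service name, image path, replica count, and resource limits are all hypothetical.

```yaml
# Illustrative Deployment for one containerized service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3                 # raised before traffic spikes, lowered after
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: registry.example.com/order-service:1.0
        resources:
          limits:             # resource isolation between co-located services
            cpu: "1"
            memory: 512Mi
```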
Containers solve dynamic scaling but still require on‑premise hardware, leading to low utilization outside peak periods.
3.15 Fourteenth Evolution: Cloud Platform Adoption
Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, pay‑as‑you‑go, and reduce operational costs.
IaaS : Infrastructure as a Service – raw compute, storage, network.
PaaS : Platform as a Service – ready‑to‑use middleware and frameworks.
SaaS : Software as a Service – complete applications delivered on demand.
Architecture Design Summary
Design should be guided by principles such as N+1 redundancy, rollback capability, feature toggles, built-in monitoring, multi-site active-active deployment, preferring mature technology, resource isolation, horizontal scalability, buying rather than building non-core components, using commodity hardware, rapid iteration, and stateless service design.
Architecture Talk
Rooted in the "Dao" of architecture, we provide pragmatic, implementation‑focused architecture content.