From Single Server to Cloud‑Native: Taobao’s 14‑Step Architecture Evolution
This article traces Taobao's backend architecture evolution, from a single‑server setup through distributed clusters, caching, load balancing, database sharding, microservices, and containerization to cloud‑native deployment, highlighting the technologies and design principles that enable scaling from hundreds to tens of millions of concurrent users.
1. Overview
This article uses Taobao as an example to show how a server‑side architecture evolves as it grows from about a hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing key design principles.
2. Basic Concepts
Distributed: Deploying different modules of a system on different servers, e.g., Tomcat and the database on separate machines.
High Availability: When some nodes fail, other nodes take over and the system keeps providing service.
Cluster: A group of servers that together provide one service, such as a ZooKeeper ensemble, in which a leader and its followers run on multiple servers.
Load Balancing: Distributing requests across multiple nodes so that each carries a roughly equal share of the load.
Forward and Reverse Proxy: A forward proxy sends requests from an internal network out to external servers on the clients' behalf; a reverse proxy receives requests from outside and forwards them to internal servers.
3. Architecture Evolution
3.1 Single‑Machine Architecture
Initially, Tomcat and the database are deployed on the same server.
As users increase, Tomcat and the database compete for CPU, memory, and disk, and a single machine can no longer support the business.
3.2 First Evolution: Separate Tomcat and Database
Deploy Tomcat and the database on separate servers, significantly improving their individual performance.
As user numbers increase, concurrent reads/writes to the database become a bottleneck.
3.3 Second Evolution: Introduce Local and Distributed Caching
Add local cache in Tomcat/JVM and a distributed cache (e.g., Redis) to store hot product data or HTML pages, intercepting most requests before they hit the database.
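Technologies involved: an in‑process cache (e.g., Caffeine or Guava Cache) for the local tier, Redis (or memcached, which also runs as a separate server) for the distributed tier, and techniques for handling cache consistency, cache penetration, cache avalanche, and the expiration of hot data.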
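The read path described above is usually called cache‑aside. The sketch below is a minimal, single‑threaded Python illustration of it, assuming a plain dict stands in for Redis and a `load_from_db` callable stands in for the database; `TwoLevelCache` and its parameters are hypothetical names. It also shows two common mitigations: caching a short‑lived "not found" marker against penetration, and adding random jitter to TTLs against avalanche.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel cached for keys the database does not have


class TwoLevelCache:
    """Cache-aside read path: local dict -> shared store -> database.

    `shared` stands in for a distributed cache such as Redis (assumption);
    a real deployment would use a Redis client instead of a dict.
    """

    def __init__(self, load_from_db: Callable[[str], Optional[str]],
                 base_ttl: float = 60.0, jitter: float = 10.0,
                 null_ttl: float = 5.0):
        self.load_from_db = load_from_db
        self.local: Dict[str, Tuple[object, float]] = {}
        self.shared: Dict[str, Tuple[object, float]] = {}
        self.base_ttl, self.jitter, self.null_ttl = base_ttl, jitter, null_ttl
        self.db_hits = 0  # how many reads reached the database

    @staticmethod
    def _fresh(entry, now):
        return entry is not None and entry[1] > now

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        for tier in (self.local, self.shared):
            entry = tier.get(key)
            if self._fresh(entry, now):
                self.local[key] = entry  # promote to the local tier
                value = entry[0]
                return None if value is _MISSING else value
        # Both tiers missed: fall through to the database.
        self.db_hits += 1
        value = self.load_from_db(key)
        if value is None:
            # Cache the miss briefly so repeated lookups for a nonexistent
            # key stop hammering the database (cache penetration).
            entry = (_MISSING, now + self.null_ttl)
        else:
            # Randomized TTL so hot keys loaded together do not all expire
            # at the same moment (cache avalanche).
            entry = (value, now + self.base_ttl + random.uniform(0, self.jitter))
        self.local[key] = entry
        self.shared[key] = entry
        return value

    def invalidate(self, key: str) -> None:
        # Called after a database write: drop both cached copies.
        self.local.pop(key, None)
        self.shared.pop(key, None)
```

On writes, a common choice is to update the database first and then delete the cached entry so the next read reloads fresh data; this narrows, but does not fully close, the window for inconsistency.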
Cache absorbs most traffic, but as users grow, the remaining concurrency still lands on a single Tomcat instance and responses gradually slow down.
3.4 Third Evolution: Reverse Proxy for Load Balancing
Deploy multiple Tomcat instances and use Nginx (or HAProxy) as a reverse proxy to distribute requests evenly.
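Technologies involved: Nginx and HAProxy, both commonly used as layer‑7 (HTTP) reverse proxies, plus sharing sessions across Tomcat instances and handling file upload and download across multiple servers.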
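To illustrate how a reverse proxy can spread requests over several Tomcat instances, here is a minimal Python sketch of smooth weighted round‑robin, the scheme commonly described for Nginx's default weighted balancing. The `Upstream` class and `pick` function are hypothetical; real Nginx also tracks failures, timeouts, and backup servers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Upstream:
    name: str
    weight: int
    current: int = 0  # running counter used by the smooth algorithm


def pick(upstreams: List[Upstream]) -> Upstream:
    """Smooth weighted round-robin over the given upstream servers."""
    if not upstreams:
        raise ValueError("no upstream servers")
    total = 0
    best = upstreams[0]
    for u in upstreams:
        u.current += u.weight          # every server earns its weight
        total += u.weight
        if u.current > best.current:   # keep the server with the largest counter
            best = u
    best.current -= total              # the chosen server pays back the total
    return best
```

With weights 5, 1, 1 the sketch yields the order a, a, b, a, c, a, a: heavier servers receive more requests, but not in one burst. Because consecutive requests from the same user may reach different Tomcat instances, session state must be shared (for example, stored in Redis) or requests pinned to one instance, e.g. with Nginx's `ip_hash` directive.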
Reverse proxy greatly increases the concurrent capacity of application servers, but the database eventually becomes the bottleneck.
3.5 Fourth Evolution: Database Read‑Write Separation
Separate the database into read and write instances; multiple read replicas synchronize from the primary write node.
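Technologies involved: Mycat, a database middleware through which clients reach the underlying databases and which organizes read/write separation and sharding; this stage also raises data synchronization and consistency issues between primary and replicas.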
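Middleware such as Mycat makes this routing decision on behalf of the application. The Python sketch below shows the basic rule under simplifying assumptions: statements are classified by their first keyword, writes and transactional statements go to the primary, and reads rotate over replicas. `ReadWriteRouter` and the host names are hypothetical, and real middleware parses SQL properly.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route each SQL statement to the primary or to one of the read replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Round-robin over replicas; with no replicas everything goes to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        verb = (sql.split() or [""])[0].upper()
        if verb in self.WRITE_VERBS or self._replicas is None:
            return self.primary
        # Inside a transaction, reads go to the primary so they see the
        # transaction's own writes (replicas only see committed, replicated data).
        if in_transaction:
            return self.primary
        # SELECT ... FOR UPDATE locks rows for a later write, so it belongs on the primary.
        if "FOR UPDATE" in sql.upper():
            return self.primary
        return next(self._replicas)
```

Because replicas apply changes asynchronously, a read that must see a write just made (for example, showing an order right after it is placed) may also need to be sent to the primary.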
Different services compete for the same database, causing performance interference.
3.6 Fifth Evolution: Business‑Based Database Sharding
Store data for different business domains in separate databases, reducing resource contention.
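One consequence of splitting databases by business is that tables living in different databases can no longer be joined in a single SQL query. The Python sketch below, using a hypothetical table‑to‑database mapping, makes that constraint explicit.

```python
from typing import Dict, Iterable

# Hypothetical mapping of tables to per-business databases.
TABLE_TO_DB: Dict[str, str] = {
    "users": "user_db",
    "user_addresses": "user_db",
    "items": "item_db",
    "orders": "order_db",
    "order_lines": "order_db",
}


def database_for(tables: Iterable[str]) -> str:
    """Return the single database that owns all the given tables.

    A JOIN can only run when every table lives in the same database; any
    other combination must be assembled in the application or in a
    separate analytics store.
    """
    dbs = {TABLE_TO_DB[t] for t in tables}
    if len(dbs) != 1:
        raise ValueError(f"tables span {sorted(dbs)}; cannot join in one query")
    return dbs.pop()
```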
As users grow, the single write database eventually hits a performance ceiling.
3.7 Sixth Evolution: Split Large Tables into Small Tables
Hash‑based routing (e.g., by product ID) or time‑based partitioning (e.g., hourly tables) distributes data across many small tables, enabling horizontal scaling.
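Managing many small tables this way significantly increases the DBA's workload; databases that perform such partitioning internally, transparently to applications, are called distributed databases, and MPP (massively parallel processing) is a common architecture for them.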
Popular MPP solutions include Greenplum, TiDB, PostgreSQL‑XC, HAWQ, GBase, SnowballDB, and Huawei LibrA.
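The two routing rules mentioned above, hashing on product ID and one table per hour, can be sketched in Python as follows. The table names, the 64‑table count, and the choice of CRC32 are illustrative assumptions, not Taobao's actual scheme.

```python
import zlib
from datetime import datetime


def table_by_hash(product_id: int, table_count: int = 64) -> str:
    """Hash routing: the same product ID always lands in the same table."""
    # crc32 is stable across processes, unlike Python's randomized str hash();
    # for integer IDs a plain modulo would work too.
    bucket = zlib.crc32(str(product_id).encode()) % table_count
    return f"product_{bucket:02d}"


def table_by_hour(ts: datetime, base: str = "order_log") -> str:
    """Time routing: one table per hour, so old hours can be archived or dropped."""
    return f"{base}_{ts:%Y%m%d%H}"
```

Hash routing spreads load evenly but makes range queries touch every table, and changing `table_count` relocates a large share of rows; time routing keeps recent data together and lets old tables be archived whole.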
Both database and Tomcat can scale horizontally, but eventually Nginx becomes the bottleneck.
3.8 Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing
Use LVS (software) or F5 (hardware) at layer 4 to balance traffic across multiple Nginx instances, supporting tens of thousands to hundreds of thousands of concurrent connections.
High availability is achieved with keepalived and virtual IPs.
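keepalived achieves this with VRRP: the load balancers share a virtual IP, and the highest‑priority healthy node holds it. The sketch below reduces that election to a single Python function so the failover behavior is easy to see; it deliberately ignores advertisement timers, preemption settings, and tie‑breaking, and `LbNode` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LbNode:
    name: str
    priority: int        # higher wins, as with VRRP priority in keepalived
    healthy: bool = True


def vip_owner(nodes: List[LbNode]) -> Optional[str]:
    """Return the node that should hold the virtual IP.

    Simplified VRRP-style election: the healthy node with the highest
    priority owns the VIP, so clients keep using the same IP after failover.
    """
    alive = [n for n in nodes if n.healthy]
    if not alive:
        return None
    return max(alive, key=lambda n: n.priority).name
```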
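Because only one LVS node is active at a time, LVS itself becomes a bottleneck once concurrency reaches the hundreds of thousands; by then users number in the tens of millions and are spread across regions, so their latency varies noticeably with their distance from the data center.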
3.9 Eighth Evolution: DNS Round‑Robin for Inter‑Data‑Center Balancing
Configure DNS to return multiple IPs for a domain, each pointing to a different data‑center, achieving load balancing across regions.
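In this setup each returned address would typically be the virtual IP of one data center's LVS pair. A minimal Python sketch of round‑robin answers, with a hypothetical `RoundRobinZone` class and documentation‑range IP addresses, looks like this:

```python
from collections import deque
from typing import Deque, Dict, List


class RoundRobinZone:
    """Toy authoritative zone that rotates the A records it returns."""

    def __init__(self, records: Dict[str, List[str]]):
        self._records: Dict[str, Deque[str]] = {
            name: deque(ips) for name, ips in records.items()
        }

    def resolve(self, name: str) -> List[str]:
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query gets a different first address
        return answer
```

Many clients try the first address in the answer, so rotating the list spreads them across data centers. Resolvers cache answers for the record's TTL, so the balance is coarse, and plain round‑robin does not send users to the nearest data center; that requires location‑aware DNS.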
As data volume and business complexity grow, databases alone cannot satisfy rich query and analysis needs.
3.10 Ninth Evolution: Introduce NoSQL and Search Engines
Adopt HDFS for large file storage, HBase/Redis for key‑value stores, and Elasticsearch for full‑text search; use Kylin or Druid for multidimensional analysis.
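Search engines such as Elasticsearch answer full‑text queries from an inverted index, which maps each term to the documents containing it, instead of scanning rows as a `LIKE '%term%'` query generally must. A toy Python version, without relevance scoring or language analysis, is sketched below; the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every term to the IDs of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """AND query: documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```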
Adding more components increases system complexity and operational overhead.
3.11 Tenth Evolution: Split Monolithic Application into Smaller Services
Divide the codebase by business modules, allowing independent development and deployment.
Shared modules duplicated across applications make coordinated upgrades difficult.
3.12 Eleventh Evolution: Extract Reusable Functions into Microservices
Common functionalities (e.g., user management, order processing) become independent services accessed via HTTP, TCP, or RPC.
Frameworks such as Dubbo and Spring Cloud provide service governance, rate limiting, circuit breaking, and degradation.
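Circuit breaking stops a caller from repeatedly waiting on a failing dependency. Dubbo and Spring Cloud provide production implementations; the Python sketch below is only a minimal, single‑threaded illustration of the usual closed, open, and half‑open states, with a `fallback` argument standing in for degradation. The class and parameter names are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures; once
    `reset_after` seconds have passed, calls are let through as probes (half-open)."""

    def __init__(self, threshold: int = 3, reset_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast (or degrade) without calling the remote service.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit open")
            # Otherwise half-open: this call goes through as a probe.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            if fallback is not None:
                return fallback()
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```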
Different services use different access protocols, making integration complex.
3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access
ESB abstracts protocol differences, allowing applications and services to communicate uniformly.
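The core idea of an ESB is that callers address a service by name while the bus chooses the protocol adapter. The Python sketch below shows only that indirection: both adapters are in‑memory stand‑ins for real HTTP and TCP clients (an assumption), and the service names and endpoints are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple

# An adapter takes (endpoint, request) and returns a response dict.
Adapter = Callable[[str, dict], dict]


def http_json_adapter(endpoint: str, request: dict) -> dict:
    # Stand-in for an HTTP/JSON client: serialize, "send", deserialize.
    wire = json.dumps({"url": endpoint, "body": request})
    return {"protocol": "http", **json.loads(wire)["body"]}


def legacy_text_adapter(endpoint: str, request: dict) -> dict:
    # Stand-in for a legacy key=value text protocol over TCP.
    wire = ";".join(f"{k}={v}" for k, v in sorted(request.items()))
    parsed = dict(pair.split("=", 1) for pair in wire.split(";") if pair)
    return {"protocol": "tcp-text", **parsed}


class ServiceBus:
    """Callers address services by name; the bus picks the protocol adapter."""

    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Adapter, str]] = {}

    def register(self, service: str, adapter: Adapter, endpoint: str) -> None:
        self._routes[service] = (adapter, endpoint)

    def invoke(self, service: str, request: dict) -> dict:
        adapter, endpoint = self._routes[service]
        return adapter(endpoint, request)
```

Swapping a service from one protocol to another then only changes its `register` call, not the code of every caller.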
As services proliferate, deployment and environment conflicts increase operational difficulty.
3.14 Thirteenth Evolution: Containerization (Docker & Kubernetes)
Package applications as Docker images and orchestrate them with Kubernetes for dynamic scaling and isolated runtime environments.
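For the dynamic scaling part, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance (0.1 by default). The Python function below applies that formula with min/max bounds whose default values are hypothetical; it leaves out the controller's stabilization window and its handling of not‑ready pods.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count per the HPA formula: ceil(current * currentMetric / targetMetric).

    min_replicas/max_replicas defaults are illustrative, not Kubernetes defaults.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 90% CPU against a 60% target scale to 6, while 63% against 60% is within tolerance and leaves 4.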
Containers make scaling easy, but the company must still own enough on‑premise hardware for peak events such as major sales promotions, and much of it sits idle the rest of the time.
3.15 Fourteenth Evolution: Move to Cloud Platforms
Deploy the system on a public cloud (IaaS, PaaS, SaaS) to use elastic resources, reducing operational cost and enabling on‑demand scaling.
IaaS (Infrastructure as a Service): elastic compute, storage, and network resources.
PaaS (Platform as a Service): managed middleware and development platforms.
SaaS (Software as a Service): ready‑to‑use applications.
While cloud solves hardware elasticity, challenges like cross‑region data sync and distributed transactions remain.
4. Architecture Design Summary
Key design principles include N+1 redundancy, rollback capability, the ability to disable features (feature toggles), built‑in monitoring, multi‑active data centers, adopting mature technology, resource isolation, horizontal scalability, buying rather than building non‑core components, using commercial hardware, rapid iteration, and stateless service design.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers a community, training, and services, making it your go‑to learning and service platform.
