From 100 to 10 Million Users: How Taobao Scaled Its Backend Architecture

This article uses Taobao as a case study to trace the evolution of its server‑side architecture from a single‑machine setup to a cloud‑native, micro‑service ecosystem capable of handling tens of millions of concurrent users, highlighting key concepts, technologies, and design principles at each stage.

21CTO

1. Overview

This article uses Taobao as an example to illustrate the evolution of server‑side architecture from handling a hundred concurrent requests to tens of millions, listing the technologies encountered at each stage and summarizing architectural design principles at the end.

2. Basic Concepts

Before discussing the architecture, the following fundamental concepts are introduced:

Distributed: Multiple modules of a system are deployed on different servers, e.g., Tomcat and the database on separate machines.

High Availability: The system continues to provide service when some of its nodes fail.

Cluster: A group of servers that together provide one unified service, such as the leader and follower nodes of a ZooKeeper ensemble.

Load Balancing: Distributing incoming requests evenly across multiple nodes.

Forward and Reverse Proxy: A forward proxy acts on behalf of internal clients and relays their requests to external networks; a reverse proxy acts on behalf of internal servers and relays external requests to them.

3. Architecture Evolution

3.1 Single‑Machine Architecture

Single‑machine diagram

Initially, Tomcat and the database are deployed on the same server.

As the number of users grows, Tomcat and the database compete for the same machine's resources, and a single server can no longer carry the load.

3.2 First Evolution: Separate Tomcat and Database

Separate Tomcat and DB diagram

Tomcat and the database each occupy dedicated servers, significantly improving their individual performance.

Database read/write becomes the new bottleneck as user count rises.

3.3 Second Evolution: Introduce Local and Distributed Caches

Cache architecture diagram

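A local cache is added on the Tomcat server or inside its JVM, and a distributed cache (e.g., Redis) is added alongside it. Both hold hot product data or rendered HTML pages, so most requests are answered before they reach the database.

Technologies involved include Memcached as a local cache and Redis as a distributed cache. Challenges include cache consistency, cache penetration (lookups for keys that exist nowhere), cache breakdown (a hot key expiring under load), and cache avalanche (many keys expiring at once).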
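The lookup order described above (local cache, then distributed cache, then database) can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the distributed cache, and `TwoLevelCache`, `load_product`, and the 5-second local TTL are hypothetical names and values, not part of the original architecture.

```python
import time


class TwoLevelCache:
    """Cache-aside lookup: local in-process cache, then a shared cache, then the database."""

    def __init__(self, shared_cache, load_from_db, local_ttl_seconds=5.0):
        self.local = {}                   # key -> (value, expiry timestamp)
        self.shared = shared_cache        # stand-in for a distributed cache such as Redis
        self.load_from_db = load_from_db  # hypothetical loader, called only on a full miss
        self.local_ttl = local_ttl_seconds

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]               # local hit: no network call at all
        value = self.shared.get(key)
        if value is None:
            value = self.load_from_db(key)  # missed both caches: go to the database
            self.shared[key] = value
        # A short local TTL bounds how long this node may serve stale data.
        self.local[key] = (value, time.monotonic() + self.local_ttl)
        return value


db_calls = []


def load_product(product_id):
    db_calls.append(product_id)           # record how often the database is reached
    return {"id": product_id, "name": f"product-{product_id}"}


cache = TwoLevelCache(shared_cache={}, load_from_db=load_product)
first = cache.get(42)    # full miss: loads from the database and fills both caches
second = cache.get(42)   # served from the local cache
```

The sketch deliberately ignores the challenges listed above: it does not cache "not found" results (penetration), does not guard a hot key against concurrent reloads (breakdown), and does not stagger expiry times (avalanche).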

Caches handle most traffic, but the remaining load eventually slows down the single Tomcat.

3.4 Third Evolution: Reverse Proxy for Load Balancing

Nginx reverse proxy diagram

Multiple Tomcat instances are deployed, and a reverse proxy such as Nginx or HAProxy distributes requests evenly among them. Assuming each Tomcat handles 100 concurrent requests and Nginx handles 50,000, one Nginx in front of 500 Tomcats can in theory support 50,000 concurrent requests.

Technologies include Nginx, HAProxy, session sharing, and file upload/download handling.
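The capacity arithmetic and the even distribution can be illustrated with a small sketch. It is not how Nginx is implemented; `backends_needed` and `RoundRobinBalancer` are hypothetical names, and plain round-robin is assumed as the distribution policy.

```python
import itertools
import math


def backends_needed(target_concurrency, per_backend_concurrency):
    """Number of backends required to carry the target concurrency."""
    return math.ceil(target_concurrency / per_backend_concurrency)


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


# 50,000 concurrent requests at 100 per Tomcat
tomcats = backends_needed(50_000, 100)

balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
picks = [balancer.pick() for _ in range(6)]
```

With these inputs `tomcats` is 500, matching the figure in the text, and `picks` cycles through the three backends twice in order.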

Reverse proxy greatly raises application concurrency, but the database eventually becomes the bottleneck.

3.5 Fourth Evolution: Database Read‑Write Separation

Read‑write split diagram

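The database is split into one write master and multiple read replicas, and data is synchronized from the master to the replicas. Because replication lags slightly, queries that must see the most recent writes can be served from a copy written to the cache at write time.

Technologies involved include Mycat, a database middleware that handles read/write separation and sharding. Data synchronization and consistency between the master and the replicas must also be considered.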
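The routing decision that such middleware makes can be sketched in a few lines. This is a simplified illustration, not Mycat's actual logic: `ReadWriteRouter`, the `require_fresh` flag, and the server names are assumptions introduced here.

```python
import random


class ReadWriteRouter:
    """Sends writes to the master and spreads plain reads across replicas."""

    def __init__(self, primary, replicas, rng=None):
        self.primary = primary
        self.replicas = list(replicas)
        self.rng = rng or random.Random()

    def route(self, sql, require_fresh=False):
        # Classify the statement by its first keyword (assumes non-empty SQL).
        verb = sql.lstrip().split(None, 1)[0].upper()
        is_read = verb == "SELECT"
        if is_read and not require_fresh and self.replicas:
            return self.rng.choice(self.replicas)
        # Writes, and reads that cannot tolerate replication lag, use the master.
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"], rng=random.Random(0))
write_target = router.route("INSERT INTO orders VALUES (1)")
read_target = router.route("SELECT * FROM orders")
fresh_target = router.route("SELECT * FROM orders WHERE id = 1", require_fresh=True)
```

Classifying by the first keyword is a shortcut: a statement such as `SELECT ... FOR UPDATE` takes locks and would also need to go to the master, which a real router has to detect.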

Different business workloads compete for the same database, affecting performance.

3.6 Fifth Evolution: Business‑Based Database Sharding

Business sharding diagram

Data for different business domains is stored in separate databases, reducing resource contention. Cross‑business queries require additional solutions.

As users grow, the single write database eventually hits a performance ceiling.

3.7 Sixth Evolution: Split Large Tables into Small Tables

Table splitting diagram

For example, comment data is hashed by product ID, and payment records are partitioned by hour, then further sharded by user ID. Small tables enable horizontal scaling.
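A minimal sketch of the two splitting rules in this example follows. The table naming scheme, the shard counts (16 and 8), and the helper names are assumptions for illustration only. CRC32 is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same key to different tables after a restart.

```python
import zlib
from datetime import datetime


def shard_for(key, shard_count):
    """Map a key to a shard index with a hash that is stable across processes."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % shard_count


def comment_table(product_id, shard_count=16):
    """Comments: hashed by product ID."""
    return f"comment_{shard_for(product_id, shard_count):02d}"


def payment_table(paid_at, user_id, shard_count=8):
    """Payment records: one group of tables per hour, then split by user ID."""
    return f"payment_{paid_at:%Y%m%d%H}_{shard_for(user_id, shard_count)}"


comments = comment_table(12345)
payments = payment_table(datetime(2024, 11, 11, 0, 5), user_id=7)
```

Changing `shard_count` later remaps most keys with this simple modulo scheme, so the number of small tables is usually chosen with future growth in mind.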

This approach effectively turns the database into a distributed database: its tables are spread across servers and coordinated by middleware such as Mycat. It is one concrete form of an MPP (massively parallel processing) architecture.

Popular MPP databases include Greenplum, TiDB, PostgreSQL-XC, HAWQ, GBase, SnowballDB, and Huawei LibrA. Their emphases differ: TiDB, for example, leans toward distributed OLTP workloads, while Greenplum leans toward distributed OLAP.

Both Tomcat and the database can scale horizontally, but eventually the single Nginx becomes the bottleneck.

3.8 Seventh Evolution: LVS or F5 for Multi‑Nginx Load Balancing

LVS/F5 diagram

LVS (software) and F5 (hardware) operate at layer 4, offering higher performance than Nginx. LVS can handle hundreds of thousands of concurrent connections; F5 provides even higher throughput at higher cost.

Because a single LVS instance is itself a single point of failure, keepalived is used to run a standby LVS behind a shared virtual IP; if the primary fails, the virtual IP moves to the standby.

When concurrency reaches hundreds of thousands, LVS itself becomes a bottleneck, and geographic latency becomes noticeable.

3.9 Eighth Evolution: DNS Round‑Robin for Inter‑Data‑Center Load Balancing

DNS round‑robin diagram

A DNS server maps a domain to multiple IPs, each pointing to a different data‑center. Clients receive one IP based on a round‑robin or other policy, achieving data‑center‑level load balancing.
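The rotation can be simulated in a few lines. This is a toy model rather than a DNS server: `RoundRobinDns` is a hypothetical class, the domain is a placeholder, and the addresses come from the ranges reserved for documentation.

```python
from collections import deque


class RoundRobinDns:
    """Toy resolver that rotates the order of A records on every query."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)   # clients commonly try the first address returned
        ips.rotate(-1)       # the next query sees a different address first
        return answer


dns = RoundRobinDns({
    "shop.example.com": ["192.0.2.10", "198.51.100.10", "203.0.113.10"],
})
first_answer = dns.resolve("shop.example.com")
second_answer = dns.resolve("shop.example.com")
```

Each address here would be the virtual IP of one data center. A policy other than round-robin, such as choosing by client location, would replace the `rotate` step.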

As data volume and business complexity grow, databases alone cannot satisfy all query needs.

3.10 Ninth Evolution: Introduce NoSQL and Search Engines

NoSQL and search engine diagram

When a relational database can no longer serve every workload, specialized components are added for specific scenarios: HDFS for large-file storage, HBase or Redis for key-value data, Elasticsearch for full-text search, and Kylin or Druid for multidimensional analysis.

Adding components expands the system's capabilities but also increases its complexity: data must be kept consistent across components, operations become harder, and the application code becomes harder to maintain and upgrade.

3.11 Tenth Evolution: Split Monolithic Application into Smaller Applications

Application splitting diagram

The application code is divided by business domain so that each application can be developed and deployed independently. Configuration shared among the applications can be managed through a distributed configuration center such as ZooKeeper.

Shared modules duplicated across applications increase upgrade effort.

3.12 Eleventh Evolution: Extract Reusable Functions as Microservices

Microservice extraction diagram

Common functionalities like user management, order processing, and authentication become independent services accessed via HTTP, TCP, or RPC. Frameworks such as Dubbo or Spring Cloud provide service governance, rate limiting, circuit breaking, and degradation.
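Circuit breaking and degradation, two of the governance features mentioned above, can be illustrated with a small hand-written breaker. This is a sketch of the general pattern and not the Dubbo or Spring Cloud implementation; the class name, thresholds, and the `fallback` parameter are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock            # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None         # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: skip the remote call entirely and degrade.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: half-open, let this one call through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote invocation, for example `breaker.call(fetch_user, user_id, fallback=lambda: None)`, where `fetch_user` is whatever client function reaches the user service. While the circuit is open the failing service receives no traffic, which gives it room to recover.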

Different services use different access protocols, increasing integration complexity.

3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access

ESB diagram

The ESB abstracts protocol differences so that applications and services communicate through one unified interface, which reduces coupling. This resembles a service-oriented architecture (SOA), a concept that overlaps with microservices.

Growing numbers of applications and services make deployment and environment management increasingly difficult.

3.14 Thirteenth Evolution: Containerization for Isolation and Dynamic Management

Docker and Kubernetes diagram

Docker packages applications into images; Kubernetes orchestrates dynamic deployment, scaling, and resource isolation, simplifying operations especially during traffic spikes.
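How an orchestrator might decide the number of instances during a traffic spike can be sketched with a proportional rule. The formula below is modeled on the one described in the Kubernetes Horizontal Pod Autoscaler documentation; the function name, the CPU percentages, and the replica limits are illustrative assumptions, and the real autoscaler applies further tolerances and stabilization rules that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """Scale the replica count in proportion to how far the metric is from its target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so a spike cannot exhaust the cluster.
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas averaging 90% CPU with a 60% target: scale out
scale_out = desired_replicas(4, current_metric=90, target_metric=60)

# 10 replicas averaging 20% CPU with a 60% target: scale in
scale_in = desired_replicas(10, current_metric=20, target_metric=60)
```

Here `scale_out` is 6 and `scale_in` is 4, so instances are added during the spike and removed afterward.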

Even with containers, hardware resources still need to be owned and managed, leading to under‑utilization outside peak periods.

3.15 Fourteenth Evolution: Move to Cloud Platforms

Cloud platform diagram

Deploying to public cloud provides elastic resources; combined with Docker and Kubernetes, services can be provisioned on demand during promotions and released afterward, achieving cost efficiency.

Cloud service models include IaaS (infrastructure), PaaS (platform), and SaaS (software).

4. Architecture Design Summary

Must the architecture follow the exact evolution path? No. The sequence is illustrative; real scenarios may require addressing multiple bottlenecks simultaneously or prioritizing different concerns.

How detailed should the design be? For a one‑off system with clear performance targets, design to meet those targets while leaving room for future expansion. For continuously evolving platforms like e‑commerce, design for the next growth stage and iterate.

What is the difference between backend and big-data architectures? Big-data architecture focuses on data ingestion, storage, processing, and analysis (e.g., HDFS, Spark, HBase). Backend architecture deals with how applications are organized and how services are delivered, and it often relies on big-data components for underlying capabilities.

Key design principles:

Design for N+1 redundancy to avoid single points of failure.

Provide rollback mechanisms for safe upgrades.

Provide feature toggles so that problematic components can be disabled quickly.

Integrate monitoring from the start.

Deploy across multiple active data centers for high availability.

Prefer mature, battle‑tested technologies.

Isolate resources to prevent one business from monopolizing them.

Ensure horizontal scalability.

Buy non‑core solutions when development cost is high.

Use commercial‑grade hardware.

Adopt rapid iteration with small, testable features.

Design stateless service interfaces.
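The feature-toggle principle in the list above can be sketched as follows. This is a minimal in-memory illustration; `FeatureToggles`, the `recommendations` feature, and its return values are hypothetical, and a real system would typically load the flags from a shared configuration center so they can be changed without redeploying.

```python
class FeatureToggles:
    """In-memory flag store; unknown flags are treated as disabled."""

    def __init__(self, defaults):
        self._flags = dict(defaults)

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


def recommendations(user_id, toggles):
    # When the feature is switched off, degrade to an empty result
    # instead of calling the problematic component.
    if not toggles.is_enabled("recommendations"):
        return []
    return [f"item-for-{user_id}"]


toggles = FeatureToggles({"recommendations": True})
before = recommendations(1001, toggles)
toggles.set("recommendations", False)   # operator disables the feature
after = recommendations(1001, toggles)
```

The page that calls `recommendations` keeps working in both states, which is what makes the toggle safe to flip during an incident.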

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

21CTO