How Taobao Scaled from 100 to Millions of Concurrent Users: A Step‑by‑Step Architecture Evolution

This article uses Taobao as a case study to illustrate how a web service evolves from a single‑machine setup to a cloud‑native, micro‑service architecture capable of handling tens of millions of concurrent requests, detailing each technical milestone and the principles behind the design choices.

21CTO

1. Overview

This article uses Taobao as an example to describe the evolution of server‑side architecture from a hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing architectural design principles.

2. Basic Concepts

Distributed: Multiple modules of a system are deployed on different servers, e.g., Tomcat and the database on separate machines.

High Availability: The system continues to provide service when some nodes fail.

Cluster: A group of servers that runs the same software and provides one service as a whole; when a node fails, the others take over its requests (automatic failover).

Load Balancing: Distributing incoming requests evenly across multiple nodes.

Forward and Reverse Proxy: A forward proxy acts on behalf of internal systems that access external networks; a reverse proxy receives external requests and forwards them to internal servers.

3. Architecture Evolution

3.1 Single‑Machine Architecture

Initially, Tomcat and the database are deployed on the same server. As user numbers grow, the two compete for the same CPU, memory, and disk, and a single machine can no longer sustain the load.

Single‑Machine Architecture Diagram

3.2 First Evolution: Separate Tomcat and Database

Tomcat and the database each occupy dedicated servers, significantly improving performance, but database read/write becomes the bottleneck as traffic increases.

Separate Tomcat and Database Diagram

3.3 Second Evolution: Local and Distributed Caching

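Add a local cache on the Tomcat server or inside the same JVM (e.g., a memcached instance on the same host) and a distributed cache outside it (e.g., Redis) to hold hot items and rendered HTML pages. Most requests are then served from cache, which reduces database load. Caching brings its own problems that must be handled: cache consistency, cache penetration (repeated lookups for keys that do not exist), and cache avalanche (many entries expiring at once).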
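
The lookup order described above can be sketched as follows. This is a minimal illustration, not Taobao's implementation: `TTLCache`, `read_through`, and `load_from_db` are hypothetical names, and in‑process dictionaries stand in for both the local tier and the distributed tier. Absent keys are cached as well, which is one common way to limit cache penetration; cache avalanche is not handled here.

```python
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel meaning "the database has no such row"


class TTLCache:
    """Tiny TTL cache; stands in for either cache tier in this sketch."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._data[key] = (time.monotonic() + self.ttl, value)


def read_through(
    key: str,
    local: TTLCache,
    shared: TTLCache,
    load_from_db: Callable[[str], Optional[object]],
) -> Optional[object]:
    """Check the local tier, then the shared tier, then the database."""
    for tier in (local, shared):
        hit = tier.get(key)
        if hit is not None:
            if tier is shared:
                local.set(key, hit)  # promote to the faster tier
            return None if hit is _MISSING else hit
    value = load_from_db(key)
    # Cache "not found" too, so repeated lookups for absent keys
    # do not reach the database every time.
    cached = _MISSING if value is None else value
    shared.set(key, cached)
    local.set(key, cached)
    return value
```

With a real distributed cache the sentinel would have to be a serializable marker value, and writes would need an invalidation step to keep the tiers consistent with the database.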

Caching Architecture Diagram

3.4 Third Evolution: Reverse Proxy Load Balancing

Deploy multiple Tomcat instances behind a reverse proxy (Nginx or HAProxy). This raises the concurrent capacity dramatically, but the database becomes the new bottleneck.
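
As a rough sketch of what the reverse proxy does when it spreads requests over several Tomcat instances, the class below implements plain round‑robin selection and skips backends marked as down. `RoundRobinBalancer` and the backend names are assumptions made for illustration; Nginx and HAProxy offer this and other strategies through configuration rather than code.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Pick backends in rotation, skipping any that are marked down."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._backends: List[str] = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._down: Set[str] = set()
        self._index = 0

    def mark_down(self, backend: str) -> None:
        self._down.add(backend)

    def mark_up(self, backend: str) -> None:
        self._down.discard(backend)

    def next_backend(self) -> str:
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            backend = self._backends[self._index]
            self._index = (self._index + 1) % len(self._backends)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy backend available")
```

Round‑robin works best when the application servers are stateless; if sessions live in Tomcat memory, requests from one user must either stick to one backend or the session data must move to a shared store.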

Reverse Proxy Load Balancing Diagram

3.5 Fourth Evolution: Database Read/Write Separation

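Split the database into a single write master and one or more read replicas. The database's own replication keeps the replicas in sync with the master, while middleware such as Mycat sits between the application and the databases, routing writes to the master and reads to the replicas (and later handling sharding). Because replication is not instantaneous, reads from a replica may briefly return stale data, so data synchronization and consistency need attention.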
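
The routing decision that such middleware makes can be sketched in a few lines. This is an assumption‑laden illustration, not Mycat's actual logic: `ReadWriteRouter` and the node names are hypothetical, and the statement type is detected by a naive look at the first SQL keyword.

```python
import itertools
from typing import Iterator, Optional, Sequence


class ReadWriteRouter:
    """Send SELECTs to replicas in rotation and everything else to the master."""

    def __init__(self, master: str, replicas: Sequence[str]) -> None:
        self.master = master
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(list(replicas)) if replicas else None
        )

    def route(self, sql: str, in_transaction: bool = False) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Reads inside a transaction go to the master so that they
        # see the transaction's own uncommitted writes.
        if in_transaction or verb != "SELECT" or self._replicas is None:
            return self.master
        return next(self._replicas)
```

A production router would also need a way to force a read to the master right after a write, since a replica may lag behind.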

Read/Write Separation Diagram

3.6 Fifth Evolution: Business‑Based Database Sharding

Store different business data in separate databases to reduce contention; high‑traffic services can be allocated more servers.

Business Sharding Diagram

3.7 Sixth Evolution: Splitting Large Tables

Hash‑based routing splits large tables (e.g., comments, payments) into many smaller tables, enabling horizontal scaling. This leads to a distributed database architecture often implemented with Mycat.
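
A minimal sketch of hash‑based routing, assuming a fixed number of physical tables named with a numeric suffix. The function name `table_for`, the naming scheme, and the choice of CRC‑32 are illustrative assumptions; the source does not say which hash function or naming convention is used.

```python
import zlib


def table_for(base_name: str, shard_key: str, table_count: int) -> str:
    """Map a shard key (e.g., a product or order ID) to one physical table."""
    if table_count <= 0:
        raise ValueError("table_count must be positive")
    # CRC-32 is stable across processes and machines; Python's built-in
    # hash() is randomized per process for strings, so it is unsuitable.
    index = zlib.crc32(shard_key.encode("utf-8")) % table_count
    return f"{base_name}_{index:04d}"
```

With simple modulo routing, changing `table_count` remaps most keys to different tables, so growing the number of tables later means migrating data. Choosing a generous count up front, or using consistent hashing, is a common way to soften that cost.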

Table Splitting Diagram

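Instead of assembling this from middleware, a distributed or MPP (massively parallel processing) database can be used. Open‑source options include Greenplum, TiDB, and PostgreSQL‑XC; commercial ones include GBase. These systems expose a SQL interface and execute queries across many nodes.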

3.8 Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing

When Nginx becomes a bottleneck, layer‑4 load balancers like LVS (software) or F5 (hardware) distribute traffic across many Nginx instances, with keepalived providing high availability.

LVS/F5 Load Balancing Diagram

3.9 Eighth Evolution: DNS Round‑Robin Across Data Centers

Configure DNS to return multiple IPs, each pointing to a different data‑center, achieving data‑center‑level load balancing and horizontal scaling to tens of millions of concurrent users.
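
The effect of DNS round‑robin can be imitated with a toy resolver that rotates the address list on every query, so successive clients tend to start with a different data center. `RoundRobinDNS`, the domain name, and the addresses (taken from the reserved documentation ranges) are all assumptions made for this sketch; real DNS servers and resolver caches behave in more complicated ways.

```python
from collections import deque
from typing import Deque, Dict, List


class RoundRobinDNS:
    """Toy resolver that rotates the A records it returns for a name."""

    def __init__(self, records: Dict[str, List[str]]) -> None:
        self._records: Dict[str, Deque[str]] = {
            name: deque(ips) for name, ips in records.items()
        }

    def resolve(self, name: str) -> List[str]:
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query sees a different first address
        return answer


dns = RoundRobinDNS(
    {"www.example.com": ["192.0.2.10", "198.51.100.10", "203.0.113.10"]}
)
first = dns.resolve("www.example.com")[0]
second = dns.resolve("www.example.com")[0]
```

Because clients and intermediate resolvers cache answers for the record's TTL, the distribution is only approximately even, and a failed data center keeps receiving traffic until cached answers expire.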

DNS Round‑Robin Diagram

3.10 Ninth Evolution: NoSQL and Search Engines

Introduce HDFS for file storage, HBase/Redis for key‑value data, Elasticsearch for full‑text search, and Kylin/Druid for multidimensional analysis to handle massive data and complex queries.

NoSQL and Search Engine Diagram

3.11 Tenth Evolution: Splitting Monolith into Small Applications

Divide the system by business domain so that each application can be deployed and scaled independently; shared configuration can be managed through a coordination service such as ZooKeeper.

Small Applications Diagram

3.12 Eleventh Evolution: Extracting Reusable Functions as Microservices

Common functionalities (user management, order, payment, authentication) become independent services accessed via HTTP, TCP, or RPC, using frameworks like Dubbo or Spring Cloud for governance.

Microservices Diagram

3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access

An ESB hides protocol differences so that applications and services communicate through one uniform interface. This is a service‑oriented architecture (SOA), a concept that overlaps with microservices.

ESB Diagram

3.14 Thirteenth Evolution: Containerization

Docker packages applications into images; Kubernetes orchestrates dynamic deployment, enabling rapid scaling for peak events and isolation of runtime environments.
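
To make "rapid scaling" concrete, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The source does not say that autoscaling was used; the function name, the bounds, and the example numbers are assumptions, and the real autoscaler adds tolerances and stabilization windows that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Scale the replica count in proportion to metric / target, then clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60) == 6`, while the same replicas at 30% CPU would shrink to 2.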

Docker & Kubernetes Diagram

3.15 Fourteenth Evolution: Cloud Platform Adoption

Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reducing hardware costs and simplifying operations.

Cloud Platform Diagram

4. Architecture Design Summary

Architecture adjustments need not follow a fixed order; they should address the most pressing bottlenecks first.

Design depth depends on system goals: meet current performance targets while leaving room for future growth.

Server‑side architecture differs from big‑data architecture, which focuses on data ingestion, storage, and analysis.

Key design principles include N+1 redundancy, rollback capability, feature toggles, monitoring, multi‑active data centers, adopting mature technology, resource isolation, horizontal scalability, buying rather than building non‑core components, using commodity hardware, rapid iteration, and stateless services.
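
Of these principles, feature toggles are the easiest to show in code. The sketch below is a minimal in‑memory version: `FeatureToggles`, the flag name `new_recommender`, and the `recommend` function are hypothetical, and a real system would load flags from a configuration service so they can be changed without redeploying.

```python
from typing import Dict, Optional


class FeatureToggles:
    """In-memory feature flags; unknown flags default to off."""

    def __init__(self, flags: Optional[Dict[str, bool]] = None) -> None:
        self._flags: Dict[str, bool] = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = bool(enabled)


def recommend(user_id: str, toggles: FeatureToggles) -> str:
    # The new code path ships dark and is switched on separately;
    # switching the flag off again restores the old behaviour at once.
    if toggles.is_enabled("new_recommender"):
        return f"personalized items for {user_id}"
    return "best sellers"
```

Defaulting unknown flags to off means a missing or misspelled flag falls back to the established code path, which also supports the rollback principle.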

