How Taobao Scaled to Millions of Concurrent Users: Architecture Evolution

This article walks through Taobao’s journey from a single‑server setup to a cloud‑native, micro‑service architecture capable of handling tens of millions of concurrent requests, explaining each scaling step, the technologies involved, and key design principles for high‑availability systems.

ITFLY8 Architecture Home

Overview

This article uses Taobao as a case study to illustrate the evolution of server‑side architecture from handling a few hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing architectural design principles.

Basic Concepts

Distributed: Multiple modules of a system deployed on different servers, e.g., Tomcat and the database on separate machines.

High Availability: The system continues to provide service when some of its nodes fail.

Cluster: A group of servers that run the same software and present a single, unified service, typically with automatic failover when a node goes down.

Load Balancing: Distributing incoming requests evenly across multiple nodes so that each handles a similar share of the load.

Forward and Reverse Proxy: A forward proxy acts on behalf of internal systems, relaying their outbound requests to external networks; a reverse proxy receives inbound requests from outside and forwards them to internal servers.

Architecture Evolution

1. Single‑Machine Architecture

Initially, Tomcat and the database run on the same server; as user count grows, resource contention appears.

Single‑machine architecture diagram

2. First Evolution: Separate Tomcat and Database

Deploy Tomcat and the database on separate servers, improving performance, but database read/write becomes the new bottleneck.

Separate Tomcat and DB diagram

3. Second Evolution: Local and Distributed Caching

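Introduce a local cache (in the Tomcat JVM or on the same host, e.g., a memcached instance) and a distributed cache (e.g., Redis) so that most read traffic is served before it reaches the database. Caching brings its own problems that must be handled: cache consistency, cache penetration, cache breakdown, and cache avalanche.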
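
The two-tier lookup can be sketched as follows. This is a minimal illustration, not Taobao's implementation: `TTLCache`, `read_through`, and the `load_from_db` callback are hypothetical names, and both tiers are simulated in memory. A real shared tier such as Redis would need a serialized marker in place of the `_MISSING` sentinel.

```python
import time

_MISSING = object()  # cached in place of rows the database does not have


class TTLCache:
    """Minimal in-memory cache with per-entry expiry (stand-in for either tier)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def read_through(key, local, shared, load_from_db):
    """Check the local tier, then the shared tier, then fall back to the database."""
    value = local.get(key)
    if value is None:
        value = shared.get(key)
        if value is None:
            row = load_from_db(key)
            # Cache absent rows too, so repeated lookups of keys that do not
            # exist (cache penetration) do not hit the database every time.
            value = _MISSING if row is None else row
            shared.set(key, value)
        local.set(key, value)
    return None if value is _MISSING else value
```

Giving the local tier a shorter TTL than the shared tier is one common way to limit how long a node can serve stale data; the right values depend on how much inconsistency the business can tolerate.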

Caching architecture diagram

4. Third Evolution: Reverse Proxy Load Balancing

Deploy multiple Tomcat instances behind a reverse proxy (Nginx or HAProxy) to distribute traffic, increasing concurrency but pushing the bottleneck to the database.
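
Round-robin is one of the simplest distribution strategies a reverse proxy can apply. The sketch below only illustrates the selection logic; the class name and the backend labels are hypothetical, and Nginx or HAProxy implement this (plus weights and health checks) internally.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any that are marked down."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._down = set()
        self._index = 0

    def mark_down(self, backend):
        self._down.add(backend)

    def mark_up(self, backend):
        self._down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._index]
            self._index = (self._index + 1) % len(self._backends)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy backend available")
```

For example, with backends `tomcat-1`, `tomcat-2`, and `tomcat-3`, successive calls to `next_backend()` cycle through the three in order, and a backend passed to `mark_down()` is skipped until `mark_up()` is called for it.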

Reverse proxy load balancing diagram

5. Fourth Evolution: Database Read/Write Splitting

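Split the database into a write database and multiple read replicas. Middleware such as Mycat routes each statement to the appropriate node and synchronizes writes to the replicas; because replicas can lag behind the write database, data consistency has to be handled explicitly.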
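
The routing decision itself is simple and can be sketched as below. This is an assumption-laden illustration rather than how Mycat works internally: `ReadWriteRouter` and the node labels are hypothetical, and statements are classified by their first keyword only.

```python
import itertools


class ReadWriteRouter:
    """Sends plain SELECTs to replicas in rotation and everything else to the primary."""

    def __init__(self, primary, replicas):
        self._primary = primary
        replicas = list(replicas)
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, in_transaction=False):
        is_read = sql.lstrip().lower().startswith("select")
        # Reads inside a transaction go to the primary so they see the
        # transaction's own writes instead of a possibly lagging replica.
        if not is_read or in_transaction or self._replicas is None:
            return self._primary
        return next(self._replicas)
```

Sending in-transaction reads to the primary is one simple way to keep read-your-own-writes behavior; a production router would also need to handle cases such as `SELECT ... FOR UPDATE`.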

Read/write split diagram

6. Fifth Evolution: Business‑Based Database Sharding

Allocate different business data to separate databases, reducing contention; large tables are split into smaller ones, often using Mycat for routing.
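
At its core, business-based splitting is a mapping from tables to databases. The sketch below uses hypothetical table and database names to show that mapping, and why queries that span business domains become harder once the data lives in different databases.

```python
# Hypothetical assignment of tables to per-business databases.
TABLE_TO_DATABASE = {
    "users": "user_db",
    "user_addresses": "user_db",
    "orders": "order_db",
    "order_items": "order_db",
    "products": "product_db",
}


def database_for(table):
    """Return the database that owns the given table."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise LookupError(f"no database configured for table {table!r}") from None


def can_join_locally(left_table, right_table):
    """A plain SQL join only works when both tables live in the same database."""
    return database_for(left_table) == database_for(right_table)
```

Here `orders` and `order_items` can still be joined in one query, while combining `orders` with `users` has to happen in the application or in a separate analytics pipeline.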

Business sharding diagram

7. Sixth Evolution: Table Partitioning

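Split massive tables (e.g., comments, payment logs) into smaller tables by hash or by time range so that data and load can scale horizontally. Distributed databases such as TiDB, Greenplum, and PostgreSQL‑XC (often grouped under MPP, massively parallel processing) can provide this capability at the engine level.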
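
Both splitting rules reduce to computing a physical table name from a row attribute. The following is a minimal sketch with hypothetical function and table names; CRC32 is used only because it gives the same result in every process, unlike Python's built-in `hash()` for strings.

```python
import zlib
from datetime import date


def hash_partition_table(base_name, key, partition_count):
    """Pick a table by hashing the key, e.g. comments by product ID."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    index = zlib.crc32(str(key).encode("utf-8")) % partition_count
    return f"{base_name}_{index:02d}"


def monthly_partition_table(base_name, day):
    """Pick a table by calendar month, e.g. payment logs by creation date."""
    return f"{base_name}_{day:%Y%m}"


print(hash_partition_table("comment", 10086, 16))
print(monthly_partition_table("payment_log", date(2019, 11, 11)))  # payment_log_201911
```

Hash partitioning spreads hot keys evenly but makes range scans touch every table, whereas time partitioning keeps recent data together and makes archiving old tables easy. Changing `partition_count` later remaps most keys, so it is usually chosen with growth in mind.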

Table partitioning diagram

8. Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing

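Introduce Layer‑4 load balancers (LVS in software or F5 in hardware) to balance traffic across multiple Nginx instances, and use keepalived to move a virtual IP to a standby balancer when the active one fails, which keeps the entry point highly available.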
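
The failover idea behind keepalived can be illustrated with a small model: among the balancer nodes that are still healthy, the one with the highest priority holds the virtual IP. The names below are hypothetical, and the sketch leaves out the actual VRRP advertisements and timers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalancerNode:
    name: str
    priority: int
    healthy: bool = True


def elect_vip_holder(nodes):
    """Return the name of the healthy node with the highest priority, or None."""
    candidates = [node for node in nodes if node.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority).name
```

With an active node `lvs-a` at priority 100 and a standby `lvs-b` at priority 90, `lvs-a` holds the virtual IP; once `lvs-a` is marked unhealthy, the same call returns `lvs-b`, and clients keep using the same address.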

LVS/F5 load balancing diagram

9. Eighth Evolution: DNS Round‑Robin Across Data Centers

Configure DNS to return multiple IPs for a domain, directing users to different data centers for global load balancing.
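
A toy model of the server side of DNS round-robin is shown below. The class name, the domain, and the IP addresses (taken from documentation-only ranges) are all placeholders; real resolvers and clients also cache and may reorder answers, so the distribution is only approximate.

```python
from collections import deque


class RoundRobinDNS:
    """Returns all addresses for a name, rotating their order on every query."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query sees the following address first
        return answer


dns = RoundRobinDNS({"shop.example.com": ["203.0.113.10", "198.51.100.10"]})
print(dns.resolve("shop.example.com"))  # ['203.0.113.10', '198.51.100.10']
print(dns.resolve("shop.example.com"))  # ['198.51.100.10', '203.0.113.10']
```

Clients that use the first address in the answer are spread across the data centers, each of which can run its own LVS and Nginx tier behind its IP.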

DNS round‑robin diagram

10. Ninth Evolution: NoSQL and Search Engines

Adopt HDFS for file storage, HBase or Redis for key‑value data, Elasticsearch for full‑text search, and analytical engines such as Kylin or Druid for multidimensional analysis, so that each kind of data and query pattern is served by a store designed for it.

NoSQL and search engine diagram

11. Tenth Evolution: Split Large Application into Smaller Services

Divide the monolith into smaller applications by business domain, and manage the configuration they share through a distributed configuration center such as ZooKeeper.

Application splitting diagram

12. Eleventh Evolution: Extract Reusable Functions as Micro‑services

Isolate common functionalities (user management, order, payment, authentication) into independent services using Dubbo, Spring Cloud, etc., with service governance features.
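
Service registration and discovery is one of the governance features such frameworks rely on. The sketch below is a simplified in-memory registry, not the Dubbo or Spring Cloud API: `ServiceRegistry`, its methods, and the 30-second TTL are assumptions made for illustration.

```python
import time


class ServiceRegistry:
    """Tracks service instances by heartbeat and hides those that went silent."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._instances = {}  # service name -> {address: time of last heartbeat}

    def heartbeat(self, service, address):
        """Register an instance, or refresh it if it is already known."""
        self._instances.setdefault(service, {})[address] = self._clock()

    def discover(self, service):
        """Return addresses whose last heartbeat is newer than the TTL."""
        now = self._clock()
        known = self._instances.get(service, {})
        return sorted(addr for addr, seen in known.items() if now - seen < self._ttl)
```

A caller asks the registry for, say, `payment` instances and then applies client-side load balancing over the returned addresses; an instance that crashes simply stops sending heartbeats and drops out of the results after the TTL.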

Micro‑service extraction diagram

13. Twelfth Evolution: Enterprise Service Bus (ESB)

Introduce an ESB to unify protocol conversion and reduce coupling, forming a Service‑Oriented Architecture (SOA) that overlaps with micro‑service concepts.

ESB architecture diagram

14. Thirteenth Evolution: Containerization

Adopt Docker for packaging services and Kubernetes for orchestration, enabling dynamic scaling and isolation of runtime environments.
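
To make "dynamic scaling" concrete: the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function below is a simplified rendering of that rule; the function name and the default bounds are assumptions, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """Scale the replica count in proportion to how far the metric is from target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds, as an autoscaler would.
    return max(min_replicas, min(max_replicas, raw))


# 4 pods at 90% average CPU with a 60% target scale out to 6 pods.
print(desired_replicas(4, 90, 60))  # 6
```

During a sales peak the metric rises and the count grows toward `max_replicas`; when traffic subsides the same rule shrinks the deployment again.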

Docker/Kubernetes diagram

15. Fourteenth Evolution: Cloud Platform Adoption

Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reducing operational cost and enabling on‑demand scaling during peak events.

Cloud platform diagram

Architecture Design Summary

Architecture evolution does not have to follow a strict linear path; teams should address the most pressing bottlenecks first. Design should meet current performance goals while leaving room for future expansion. Key principles include N+1 redundancy, rollback capability, feature toggles, monitoring, multi‑data‑center active‑active setups, mature technology adoption, resource isolation, horizontal scalability, buying non‑core components, using commercial hardware, rapid iteration, and stateless service design.
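
Of these principles, feature toggles are the easiest to show in code. The sketch below is one possible design, assuming a percentage-based rollout keyed by user ID; `FeatureToggles` and the feature names are hypothetical.

```python
import zlib


class FeatureToggles:
    """Enables each feature for a fixed percentage of users, chosen deterministically."""

    def __init__(self, rollout_percent):
        self._rollout = dict(rollout_percent)  # feature name -> 0..100

    def is_enabled(self, feature, user_id):
        percent = self._rollout.get(feature, 0)  # unknown features stay off
        # A stable hash keeps each user in the same bucket across requests
        # and across servers, which fits a stateless service design.
        bucket = zlib.crc32(f"{feature}:{user_id}".encode("utf-8")) % 100
        return bucket < percent

    def set_percent(self, feature, percent):
        self._rollout[feature] = percent
```

Setting a feature to 0 turns it off for everyone without a redeploy, which also serves as a fast rollback path when a new code path misbehaves under load.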
