How Taobao Scaled to Millions of Concurrent Users: Architecture Evolution
This article walks through Taobao’s journey from a single‑server setup to a cloud‑native, micro‑service architecture capable of handling tens of millions of concurrent requests, explaining each scaling step, the technologies involved, and key design principles for high‑availability systems.
Overview
This article uses Taobao as a case study to illustrate the evolution of server‑side architecture from handling a few hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing architectural design principles.
Basic Concepts
Distributed: Multiple modules are deployed on different servers, e.g., Tomcat and the database on separate machines.
High Availability: The system continues to provide service when some nodes fail.
Cluster: A group of servers offering a unified service, with automatic failover when a node goes down.
Load Balancing: Distributing requests evenly across multiple nodes.
Forward and Reverse Proxy: A forward proxy handles outbound traffic from internal systems; a reverse proxy forwards inbound traffic to internal servers.
Architecture Evolution
1. Single‑Machine Architecture
Initially, Tomcat and the database run on the same server; as user count grows, resource contention appears.
2. First Evolution: Separate Tomcat and Database
Deploy Tomcat and the database on separate servers, improving performance, but database read/write becomes the new bottleneck.
3. Second Evolution: Local and Distributed Caching
Introduce a local cache (e.g., memcached running on the application server) and a distributed cache (e.g., Redis) to absorb most read traffic before it reaches the database; this stage must also deal with cache consistency, cache penetration, cache breakdown (hot-key expiry), and cache avalanche.
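As a sketch of the cache-aside read path implied above, the following shows two of the defenses just mentioned: caching null results to blunt penetration, and jittering TTLs so entries do not all expire at once (avalanche). The class and field names are illustrative, not from any particular library:

```python
import random
import time

class CacheAsideStore:
    """Cache-aside reads with two common defenses:
    - null results are cached, so repeated lookups of missing keys
      do not hammer the database (cache penetration)
    - TTLs get random jitter, so entries expire at staggered times
      (cache avalanche)."""

    NULL_SENTINEL = object()  # distinguishes "cached None" from "not cached"

    def __init__(self, db, base_ttl=300):
        self.db = db            # any mapping-like backing store
        self.cache = {}         # {key: (value, expires_at)}
        self.base_ttl = base_ttl

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.time() < expires_at:
                return None if value is self.NULL_SENTINEL else value
        # Cache miss or expired entry: fall through to the database.
        value = self.db.get(key)
        ttl = self.base_ttl + random.uniform(0, 60)  # jitter against avalanche
        cached = self.NULL_SENTINEL if value is None else value
        self.cache[key] = (cached, time.time() + ttl)
        return value
```

After the first lookup of a missing key, subsequent lookups within the TTL are served from the cached null and never touch the database.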
4. Third Evolution: Reverse Proxy Load Balancing
Deploy multiple Tomcat instances behind a reverse proxy (Nginx or HAProxy) to distribute traffic, increasing concurrency but pushing the bottleneck to the database.
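An Nginx configuration for this stage might look roughly like the fragment below (placed inside the `http` block); the upstream name and backend addresses are placeholders, not values from any real deployment:

```nginx
upstream tomcat_pool {
    # Default round-robin across application instances;
    # least_conn or ip_hash are common alternatives.
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```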
5. Fourth Evolution: Database Read/Write Splitting
Split reads and writes across databases using middleware such as Mycat: writes go to the primary and are synchronized to multiple read replicas, and the application must cope with the consistency issues introduced by replication lag.
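The routing rule such middleware applies can be sketched in a few lines. This is a toy illustration of the idea, not Mycat's actual API; the class and server names are hypothetical:

```python
import itertools

class ReadWriteRouter:
    """Routes write statements to the primary and spreads reads
    round-robin across replicas, mimicking what SQL-layer middleware
    such as Mycat does."""

    WRITE_VERBS = ("insert", "update", "delete", "replace")

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql):
        # Inspect the leading SQL verb to classify the statement.
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replica_cycle)
```

A real router also has to pin reads-after-writes within a transaction to the primary; that detail is omitted here.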
6. Fifth Evolution: Business‑Based Database Sharding
Allocate different business data to separate databases, reducing contention; large tables are split into smaller ones, often using Mycat for routing.
7. Sixth Evolution: Table Partitioning
Split massive tables (e.g., comments, payment logs) into partitions by hash or by time range, enabling horizontal scaling; distributed and MPP databases such as Greenplum, TiDB, Postgres-XC, etc., can provide the underlying engine.
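Hash-based partitioning of, say, a comment table reduces to a routing function like the sketch below. The table-name scheme and shard count are illustrative assumptions:

```python
import hashlib

def comment_table_for(user_id, shard_count=16):
    """Map a user id to one of `shard_count` physical comment tables.
    A stable hash (not Python's per-process salted hash()) keeps the
    routing consistent across processes and restarts."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"comment_{shard:02d}"
```

Every query for a user's comments is then rewritten to hit only that user's shard, so each physical table stays a manageable size.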
8. Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing
Introduce Layer‑4 load balancers (LVS software or F5 hardware) to balance traffic across multiple Nginx clusters, adding keepalived for high availability.
9. Eighth Evolution: DNS Round‑Robin Across Data Centers
Configure DNS to return multiple IPs for a domain, directing users to different data centers for global load balancing.
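The effect of DNS round-robin can be illustrated with a toy resolver that rotates the answer list on every query, so successive clients land on different data centers. The hostnames and IPs below are made up:

```python
from collections import deque

class RoundRobinDNS:
    """Toy authoritative resolver: each query for a name returns the
    IP list rotated by one position, spreading clients across the
    data centers behind those IPs."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        ips.rotate(-1)  # next query starts from the next IP
        return list(ips)
```

Real DNS adds TTL-driven caching at resolvers, so the rotation is much coarser in practice than this sketch suggests.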
10. Ninth Evolution: NoSQL and Search Engines
Adopt storage suited to each workload: HDFS for files, HBase and Redis for key-value data, Elasticsearch for full-text search, and analytical engines such as Kylin or Druid, to handle massive data volumes and diverse query patterns.
11. Tenth Evolution: Split Large Application into Smaller Services
Divide the monolith by business domain, using Zookeeper for distributed configuration.
12. Eleventh Evolution: Extract Reusable Functions as Micro‑services
Isolate common functionalities (user management, order, payment, authentication) into independent services using Dubbo, Spring Cloud, etc., with service governance features.
13. Twelfth Evolution: Enterprise Service Bus (ESB)
Introduce an ESB to unify protocol conversion and reduce coupling, forming a Service‑Oriented Architecture (SOA) that overlaps with micro‑service concepts.
14. Thirteenth Evolution: Containerization
Adopt Docker for packaging services and Kubernetes for orchestration, enabling dynamic scaling and isolation of runtime environments.
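A minimal Kubernetes Deployment for one such containerized service might look like the following sketch; the service name, image, replica count, and resource limits are all illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3                     # scaled up or down on demand
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:1.0.0
          ports:
            - containerPort: 8080
          resources:
            limits:              # isolates this service's resource usage
              cpu: "1"
              memory: 1Gi
```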
15. Fourteenth Evolution: Cloud Platform Adoption
Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reducing operational cost and enabling on‑demand scaling during peak events.
Architecture Design Summary
Architecture evolution does not have to follow a strict linear path; teams should address the most pressing bottleneck first, and a design should meet current performance goals while leaving room for future expansion. Key principles include:
- N+1 redundancy for every component
- the ability to roll back any release
- feature toggles to switch off problematic functionality
- comprehensive monitoring
- multi-data-center active-active deployment
- preferring mature technology over the newest
- resource isolation
- horizontal scalability
- buying rather than building non-core components
- using commodity hardware
- rapid iteration
- stateless service design
ITFLY8 Architecture Home
