How Taobao Scaled from 100 Users to Millions: A Step‑by‑Step Architecture Evolution
This article uses Taobao as a case study to trace the evolution of server‑side architecture from a single‑machine setup to a cloud‑native, micro‑service ecosystem, detailing each scaling stage, the associated technologies, and design principles needed to handle tens of millions of concurrent users.
Overview
This article illustrates a typical server‑side architecture evolution from a single‑machine deployment serving a few hundred users to a multi‑data‑center system handling tens of millions of concurrent requests. Each stage introduces specific technologies that address the bottleneck of the stage before it.
Basic Concepts
Distributed – Modules run on different servers (e.g., Tomcat and database on separate machines).
High Availability – Failed nodes are automatically taken over by healthy ones.
Cluster – Multiple servers act as a single logical service.
Load Balancing – Requests are evenly distributed across nodes.
Forward/Reverse Proxy – Forward proxy enables internal systems to access external networks; reverse proxy forwards external requests to internal servers.
Architecture Evolution
1. Single‑Machine Architecture
Tomcat and the database run on the same server, which is adequate while the user base is small.
As traffic grows, Tomcat and the database compete for CPU, memory, and I/O, and a single machine cannot sustain high concurrency.
2. Separate Tomcat and Database
Tomcat and the database each occupy dedicated servers, eliminating resource contention between them.
Database read/write becomes the new bottleneck as concurrency increases.
3. Add Local and Distributed Cache
A local cache is added inside the JVM or on the same server as Tomcat (e.g., memcached), and an external distributed cache (e.g., Redis) stores hot product data or rendered HTML pages. Most read requests are then served from cache, which sharply reduces database load.
Key concerns: cache consistency, cache penetration, cache avalanche, hot‑data expiration.
Cache absorbs most traffic, but Tomcat becomes the next performance bottleneck.
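The read path and two of the concerns above (penetration and avalanche) can be sketched as a cache‑aside wrapper. This is a minimal illustration, not the article's implementation: an in‑process dictionary stands in for the local or distributed cache, and `CacheAside`, `load_from_db`, `base_ttl`, and `jitter` are hypothetical names.

```python
import random
import time

_MISSING = object()  # sentinel cached for keys that do not exist in the database


class CacheAside:
    """Cache-aside reads over a slower backing store (illustrative names only)."""

    def __init__(self, load_from_db, base_ttl=60.0, jitter=10.0, clock=time.monotonic):
        self._load = load_from_db      # callable: key -> value, or None if absent
        self._base_ttl = base_ttl
        self._jitter = jitter
        self._clock = clock
        self._entries = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            value = entry[0]
            return None if value is _MISSING else value
        # Cache miss or expired entry: fall through to the database.
        value = self._load(key)
        # Caching "not found" guards against cache penetration: repeated
        # lookups of a nonexistent key stop reaching the database.
        stored = _MISSING if value is None else value
        # Random TTL jitter spreads expirations out, so entries written together
        # do not all expire at the same instant (cache avalanche).
        ttl = self._base_ttl + random.uniform(0, self._jitter)
        self._entries[key] = (stored, self._clock() + ttl)
        return value

    def invalidate(self, key):
        """Drop a key after the database row changes, so the next read reloads it."""
        self._entries.pop(key, None)
```

Invalidating on write rather than updating the cached value in place keeps the write path simple; the trade‑off is one extra database read after each change.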
4. Reverse Proxy for Load Balancing
Multiple Tomcat instances are deployed behind an Nginx reverse proxy (or HAProxy). Assuming each Tomcat handles ~100 concurrent connections and Nginx can handle 50 000, one Nginx fronting roughly 500 Tomcat instances (50 000 / 100) can theoretically support 50 000 concurrent users.
Technologies: Nginx, HAProxy, session sharing, file upload handling.
Reverse proxy raises application‑server concurrency, but the database becomes the next bottleneck.
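As a rough illustration of what the proxy does, the sketch below rotates requests across backends and works out how many Tomcat instances the figures above imply (100 connections per Tomcat, 50 000 at Nginx). It is a toy model in Python, not Nginx or HAProxy configuration, and the class and function names are hypothetical.

```python
import itertools
import math


class RoundRobinBalancer:
    """Hands out backends in strict rotation, one per incoming request."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


def backends_needed(target_concurrency, per_backend_concurrency):
    """Smallest number of backends whose combined capacity covers the target."""
    return math.ceil(target_concurrency / per_backend_concurrency)


balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
first_four = [balancer.pick() for _ in range(4)]  # wraps back to tomcat-1
instances = backends_needed(50_000, 100)          # 500 Tomcat instances
```

Plain rotation assumes the backends are interchangeable, which is why session sharing appears in the technology list: a user's next request may land on a different Tomcat.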
5. Database Read/Write Separation
The database is split into a single write master and multiple read replicas. Writes go to the master; reads are served by the replicas. Replication from the master keeps the replicas in sync, although a read may briefly lag behind a recent write.
Middleware: Mycat for read/write splitting, sharding, and data synchronization.
Different business workloads compete for the database, affecting overall performance.
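The routing decision that read/write‑splitting middleware makes can be sketched as follows. This is a simplified assumption about the general technique, not a description of how Mycat works internally; `ReadWriteRouter` and the node names are hypothetical.

```python
class ReadWriteRouter:
    """Sends plain SELECTs to replicas in rotation and everything else to the master."""

    def __init__(self, master, replicas):
        self._master = master
        self._replicas = list(replicas)
        self._next = 0

    def route(self, sql, in_transaction=False):
        words = sql.split()
        if not words:
            raise ValueError("empty SQL statement")
        is_read = words[0].upper() == "SELECT"
        # Reads inside a transaction stay on the master so they see the
        # transaction's own writes instead of a possibly lagging replica.
        # Limitation of this sketch: "SELECT ... FOR UPDATE" is treated as a
        # read unless the caller passes in_transaction=True.
        if is_read and not in_transaction and self._replicas:
            node = self._replicas[self._next % len(self._replicas)]
            self._next += 1
            return node
        return self._master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM item WHERE id = 1"),
    router.route("select name FROM item"),
    router.route("UPDATE item SET stock = stock - 1 WHERE id = 1"),
    router.route("SELECT stock FROM item WHERE id = 1", in_transaction=True),
]
```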
6. Split Databases by Business Domain
Data for each business domain is stored in separate databases, reducing cross‑business contention. Large‑scale cross‑business queries require additional solutions.
Write database eventually hits its performance ceiling as user count grows.
7. Split Large Tables into Small Tables
Large tables are hash‑ or time‑partitioned (e.g., per hour) and routed to many small tables across multiple servers. Mycat supports table splitting, creating an MPP‑style architecture.
Typical MPP databases: Greenplum, TiDB, PostgreSQL‑XC, HAWQ (open source); GBase, SnowballDB, LibrA (commercial).
Both database and Tomcat can scale horizontally, but Nginx eventually becomes the bottleneck.
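The two routing schemes mentioned for table splitting, by hash and by time, can be sketched as small functions that map a row to a physical table name. The naming patterns (`orders_07`, `orders_2024111100`) and function names are assumptions made for illustration; real middleware is configured rather than coded this way.

```python
import zlib
from datetime import datetime


def hash_shard(table, key, shard_count):
    """Map a key to one of shard_count tables, e.g. orders_07."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 is stable across processes and machines; Python's built-in hash()
    # is randomized per process for strings, so it is unsuitable for routing.
    index = zlib.crc32(str(key).encode("utf-8")) % shard_count
    return f"{table}_{index:02d}"


def hourly_shard(table, ts):
    """Map a timestamp to a per-hour table, e.g. orders_2024111100."""
    return f"{table}_{ts:%Y%m%d%H}"


order_table = hash_shard("orders", 12345, 16)
hour_table = hourly_shard("orders", datetime(2024, 11, 11, 0, 30))
```

Hash routing spreads load evenly but scatters range queries across tables, while time routing keeps recent data together at the cost of concentrating writes on the newest table.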
8. LVS/F5 for Multi‑Nginx Load Balancing
LVS (software) or F5 (hardware) operates at layer 4 to balance traffic among multiple Nginx instances. LVS can handle hundreds of thousands of concurrent connections; F5 offers higher performance at a higher cost. Keepalived provides high availability for LVS.
When concurrency reaches hundreds of thousands, LVS itself becomes a bottleneck, and latency varies across regions.
9. DNS Round‑Robin Across Data Centers
DNS is configured with multiple A records, each pointing to a virtual IP in a different data center. Users are directed to a data center via round‑robin or other policies, enabling horizontal scaling at the data‑center level.
A single database can no longer satisfy all business needs as data and complexity grow.
10. Introduce NoSQL and Search Engines
When relational databases struggle with massive data, specialized solutions are added:
HDFS for distributed file storage.
HBase/Redis for key‑value storage.
Elasticsearch for full‑text search.
Kylin/Druid for multidimensional analytics.
These components increase system complexity: data synchronization, consistency handling, and additional operational tooling are required.
More components expand capabilities but also raise operational complexity.
11. Split Monolith into Small Applications
Business domains are separated into independent applications with clear responsibilities. Shared configuration can be managed via ZooKeeper.
Duplicated shared modules across applications make upgrades painful.
12. Extract Reusable Functions as Microservices
Common functionalities (user management, order, payment, authentication) are isolated into independent services accessed via HTTP, TCP, or RPC. Frameworks such as Dubbo and Spring Cloud provide service governance, rate limiting, circuit breaking, and degradation.
Different access protocols and inter‑service calls increase complexity and risk of tangled call chains.
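Circuit breaking and degradation, two of the governance features named above, can be illustrated with a small hand‑written breaker. This is a sketch of the general pattern only; it is not the Dubbo or Spring Cloud API, and `CircuitBreaker`, `failure_threshold`, and `reset_timeout` are hypothetical names.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: skip the remote call entirely and degrade.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller might wrap a payment lookup as `breaker.call(fetch_payment, order_id, fallback=lambda: cached_status)`, where both the function and the cached value are placeholders; once the breaker opens, the failing service stops receiving traffic until the timeout passes.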
13. Enterprise Service Bus (ESB) to Hide Interface Differences
ESB performs protocol conversion and unified access, allowing applications and services to communicate through a common bus, reducing coupling. This resembles SOA, which emphasizes unified interface handling.
SOA and microservices overlap; SOA focuses on unified access via a bus.
14. Containerization (Docker & Kubernetes)
Docker packages applications/services into images; Kubernetes orchestrates dynamic deployment, scaling, and management. Containers simplify environment isolation and enable rapid scaling for peak events.
Containers solve dynamic scaling but still require owned hardware, leading to idle resources outside peak periods.
15. Move to Cloud Platforms
The system is deployed on a public cloud, leveraging elastic resources for peak loads and releasing them afterward, achieving cost‑effective, on‑demand scaling.
IaaS – raw compute, storage, network.
PaaS – ready‑made technology stacks (e.g., Hadoop, MPP databases).
SaaS – fully built applications offered on a subscription basis.
All discussed solutions address high‑concurrency challenges, yet cross‑region data sync and distributed transactions remain open problems.
Architecture Design Summary
Adjustments need not follow a fixed order; solve the most pressing bottleneck first.
Design depth should meet current performance targets while leaving room for future growth.
Server‑side architecture differs from big‑data architecture, which focuses on data collection, storage, analysis, and data services.
Key design principles:
N+1 redundancy – no single point of failure.
Rollback capability – ability to revert versions safely.
Feature toggles – configurable enable/disable of functions.
Monitoring – embed observability from the design phase.
Multi‑active data centers – ensure availability across locations.
Adopt mature, well‑supported technologies.
Resource isolation – prevent a single business from monopolizing resources.
Horizontal scalability – design for scale‑out rather than scale‑up.
Buy non‑core solutions when appropriate.
Use commercial‑grade hardware for reliability.
Rapid iteration – develop small features, validate early.
Stateless services – avoid reliance on previous request state.
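Of the principles above, feature toggles are the easiest to show in code. The sketch below is a minimal in‑memory version with hypothetical names (`FeatureToggles`, `new_pricing`, and the 10% discount are invented for illustration); a production system would typically read flags from a configuration service instead.

```python
class FeatureToggles:
    """In-memory flag store; unknown flags fall back to the given default."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)

    def is_enabled(self, name, default=False):
        return self._flags.get(name, default)


def checkout_total(toggles, cart_total):
    # The new code path ships dark and is switched on (or back off) by
    # configuration, without a redeploy.
    if toggles.is_enabled("new_pricing"):
        return round(cart_total * 0.9, 2)
    return cart_total


toggles = FeatureToggles()
before = checkout_total(toggles, 100.0)
toggles.set("new_pricing", True)
after = checkout_total(toggles, 100.0)
```

Defaulting unknown flags to off means a missing or unreachable configuration degrades to the old behavior, which also supports the rollback principle in the same list.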
Senior Brother's Insights
A public account focused on workplace, career growth, team management, and self-improvement. The author is the writer of books including 'SpringBoot Technology Insider' and 'Drools 8 Rule Engine: Core Technology and Practice'.
