From Single Server to Cloud‑Native: How Taobao Scaled to Millions of Concurrent Users

This article walks through Taobao's architectural evolution, from a single‑server setup through distributed clusters, caching, load balancing, microservices, and containerization to cloud platforms, and illustrates the technologies and design principles needed to scale from about a hundred to tens of millions of concurrent requests.

dbaplus Community

Overview

The article uses Taobao as a case study to illustrate how a server‑side architecture evolves from handling about a hundred concurrent users to tens of millions, highlighting the technical challenges and solutions at each stage.

Basic Concepts

Distributed: The modules of a system are deployed on different servers, e.g., Tomcat and the database on separate machines.

High Availability: The system remains operational when some of its nodes fail.

Cluster: A group of servers that run the same software and provide a unified service, with other nodes taking over automatically when one fails.

Load Balancing: Incoming requests are distributed evenly across multiple nodes.

Forward and Reverse Proxy: A forward proxy lets internal systems access external networks on their behalf; a reverse proxy receives external requests and forwards them to internal servers.

Architecture Evolution

1) Single‑Machine Architecture

Initially, Tomcat and the database run on the same server. When a user visits www.taobao.com, DNS resolves the domain to that server's single IP address, and the request reaches the Tomcat instance running there.

2) First Evolution – Separate Tomcat and Database

Tomcat and the database are deployed on separate servers, so they no longer compete for the same machine's resources and each performs better.

3) Second Evolution – Add Local and Distributed Caches

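A local cache (e.g., memcached) is added on the Tomcat server or within the same JVM, and a distributed cache (Redis) is introduced to store hot product data and HTML pages. Most reads are then answered from a cache before they reach the database, which dramatically reduces database load.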
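The lookup order described above can be sketched as follows. This is a minimal illustration, not the article's implementation: plain dictionaries stand in for the local and distributed caches, and the class name `TwoLevelCache`, the `products` data, and the `db_reads` counter are assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class TwoLevelCache:
    """Read path: local cache, then distributed cache, then database."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self.local: Dict[str, str] = {}        # stands in for a same-host or in-JVM cache
        self.distributed: Dict[str, str] = {}  # stands in for a shared cache such as Redis
        self.load_from_db = load_from_db
        self.db_reads = 0                      # how many lookups reached the database

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            return self.local[key]
        if key in self.distributed:
            value = self.distributed[key]
            self.local[key] = value            # promote to the faster tier
            return value
        self.db_reads += 1
        value = self.load_from_db(key)
        if value is not None:                  # only cache values that exist
            self.distributed[key] = value
            self.local[key] = value
        return value


# Hypothetical product table used only for this example.
products = {"p1": "Red T-shirt"}
cache = TwoLevelCache(products.get)
first = cache.get("p1")    # misses both caches and reads the database
second = cache.get("p1")   # served from the local cache
```

The sketch leaves out expiry and invalidation, which a real deployment would need in order to keep cached entries consistent with the database.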

4) Third Evolution – Reverse Proxy for Load Balancing

Multiple Tomcat instances are deployed and a reverse proxy such as Nginx (or HAProxy) distributes requests across them. Assuming each Tomcat handles at most 100 concurrent connections and Nginx at most 50,000, Nginx can in theory support 50,000 concurrent users by spreading requests across 500 Tomcat instances.
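The capacity estimate can be checked with a few lines of arithmetic. The sketch below uses the two figures from the text and a simple round‑robin policy; the backend names are hypothetical, and round‑robin is only one of several policies a reverse proxy may use.

```python
import itertools
import math
from collections import Counter

TOMCAT_MAX_CONCURRENCY = 100     # per-instance figure from the text
NGINX_MAX_CONCURRENCY = 50_000   # reverse-proxy figure from the text

# Number of Tomcat instances needed to absorb everything the proxy can accept.
tomcats_needed = math.ceil(NGINX_MAX_CONCURRENCY / TOMCAT_MAX_CONCURRENCY)

# Hypothetical backend names, handed out in round-robin order.
backends = [f"tomcat-{i}" for i in range(tomcats_needed)]
next_backend = itertools.cycle(backends)

# Assign one request per concurrent connection and count the load per backend.
assignments = Counter(next(next_backend) for _ in range(NGINX_MAX_CONCURRENCY))
```

With these numbers every backend ends up with exactly its assumed limit of 100 connections, which is why the figure is a theoretical ceiling rather than a safe operating point.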

5) Fourth Evolution – Database Read/Write Separation

Writes go to a primary database, while reads are served by multiple replicas that synchronize data from the primary. Database middleware such as Mycat can manage the read/write splitting, as well as the sharding introduced in later stages.
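The routing decision that such middleware makes can be illustrated with a small sketch. It is not how Mycat is implemented: the `ReadWriteRouter` class and the server names are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send SELECT statements to replicas in turn, everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)   # reads rotate across replicas
        return self.primary               # writes (and anything unknown) hit the primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
routes = [
    router.route("SELECT * FROM product WHERE id = 1"),
    router.route("UPDATE product SET stock = 9 WHERE id = 1"),
    router.route("select name from product"),
]
```

Because replicas lag behind the primary, a read issued immediately after a write may return stale data; real routers usually offer a way to force such reads, and locking reads such as SELECT ... FOR UPDATE, onto the primary.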

6) Fifth Evolution – Business‑Level Database Sharding

Different business domains store data in separate databases, reducing contention and allowing independent scaling.

7) Sixth Evolution – Split Large Tables

Large tables are partitioned (e.g., by product ID hash or hourly tables) and accessed via Mycat, enabling horizontal scaling of the database layer.
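The two partitioning schemes mentioned above, hashing by product ID and one table per hour, might be sketched like this. The table name prefixes, the default of four tables, and the choice of CRC32 as the hash are assumptions for illustration; CRC32 is used because Python's built‑in `hash()` for strings varies between processes.

```python
import zlib
from datetime import datetime


def table_for_product(product_id: str, table_count: int = 4) -> str:
    """Pick a table by hashing the product ID (stable across processes)."""
    bucket = zlib.crc32(product_id.encode("utf-8")) % table_count
    return f"product_{bucket}"


def table_for_hour(timestamp: datetime) -> str:
    """Pick a table by the hour in which the record was created."""
    return f"record_{timestamp:%Y%m%d%H}"


product_table = table_for_product("sku-10086")
hourly_table = table_for_hour(datetime(2024, 1, 2, 3, 45))
```

A plain modulo scheme like this forces most rows to move when `table_count` changes, so the number of tables is usually chosen generously up front or replaced by a scheme designed for resizing.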

8) Seventh Evolution – LVS/F5 for Multi‑Level Load Balancing

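LVS (software that runs in the Linux kernel and balances at layer 4) or F5 (a hardware load balancer) distributes traffic across multiple Nginx instances. Because a single LVS node would itself be a single point of failure, Keepalived provides a virtual IP that fails over to a standby node for high availability.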
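Virtual IP failover can be pictured as an election among the healthy nodes. The sketch below models only that idea, with the highest‑priority healthy node owning the address; it does not use Keepalived or VRRP, and the `Node` type, node names, and priorities are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int
    healthy: bool = True


def vip_owner(nodes: List[Node]) -> Optional[str]:
    """Return the healthy node with the highest priority, or None if all are down."""
    candidates = [node for node in nodes if node.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority).name


lvs_pair = [Node("lvs-a", priority=100), Node("lvs-b", priority=90)]
owner_before = vip_owner(lvs_pair)   # the higher-priority node holds the virtual IP

lvs_pair[0].healthy = False          # simulate a failure of the active node
owner_after = vip_owner(lvs_pair)    # the standby takes over the same address
```

Clients keep using the same virtual IP throughout, so the failover is invisible to them apart from connections that were in flight.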

9) Eighth Evolution – DNS Round‑Robin Across Data Centers

DNS maps a domain to multiple IPs, each pointing to a different data‑center, achieving inter‑data‑center load balancing.

10) Ninth Evolution – NoSQL and Search Engines

When relational databases become a bottleneck for large‑scale storage, retrieval, and analytics, purpose‑built systems are introduced: HDFS for massive file storage, HBase for key‑value access, MongoDB for document data, Elasticsearch for full‑text search, and Kylin or Druid for multidimensional analysis.

11) Tenth Evolution – Split Monolith into Small Applications

Code is divided by business domain into smaller applications that can be deployed and scaled independently. Configuration shared between them can be managed through a distributed configuration center such as ZooKeeper.

12) Eleventh Evolution – Extract Reusable Functions as Microservices

Common functionalities (user management, order, payment, authentication) are isolated into independent services accessed via HTTP, TCP, or RPC. Frameworks such as Dubbo or Spring Cloud provide service governance, rate limiting, circuit breaking, and degradation.
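Circuit breaking and degradation can be shown in miniature. This is a generic sketch of the pattern, not the API of Dubbo or Spring Cloud: the `CircuitBreaker` class, its thresholds, and `call_payment_service` are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing service for a while and return a fallback instead."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after        # seconds to stay open
        self.clock = clock                    # injectable so the timing can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()             # open: degrade without calling the service
            self.opened_at = None             # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock() # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0                     # a success closes the breaker
        return result


def call_payment_service() -> str:
    """Hypothetical downstream call that is currently failing."""
    raise ConnectionError("payment service unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_after=30.0)
responses = [breaker.call(call_payment_service, lambda: "payment temporarily unavailable")
             for _ in range(3)]
```

After the second failure the breaker opens, so the third call returns the degraded response without touching the failing service, which protects both the caller's threads and the struggling dependency.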

13) Twelfth Evolution – Enterprise Service Bus (ESB)

ESB unifies protocol conversion and service invocation, reducing coupling. This architecture resembles SOA and overlaps with microservices concepts.

14) Thirteenth Evolution – Containerization

Docker packages applications into images; Kubernetes orchestrates them, enabling dynamic scaling and isolated runtime environments.

15) Fourteenth Evolution – Cloud Platform

The system is deployed on a public cloud, which offers resources at the IaaS, PaaS, and SaaS layers. Resources are requested on demand: combined with Docker and Kubernetes, services can be scaled out quickly during traffic spikes and the resources released afterward, so capacity is paid for only while it is needed.
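How many instances to run during a spike can be estimated with a proportional rule. The sketch below is modeled on the formula described in the Kubernetes Horizontal Pod Autoscaler documentation, with tolerance and stabilization windows left out; the function name, the replica bounds, and the sample numbers are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Scale the replica count in proportion to how far the metric is from its target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 10 replicas at 90% average CPU with a 60% target: scale out.
during_spike = desired_replicas(10, current_metric=90, target_metric=60)

# The same 10 replicas at 30% CPU after the spike: scale in and release resources.
after_spike = desired_replicas(10, current_metric=30, target_metric=60)
```

The upper bound matters as much as the formula: it caps the bill if a metric misbehaves, while the lower bound keeps enough instances running to survive a node failure.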

Architecture Design Summary

The evolution path is not mandatory; real‑world systems may address multiple bottlenecks simultaneously or follow a different order based on business needs.

For a one‑off system with clear performance targets, design enough to meet those targets while leaving hooks for future scaling. For continuously evolving platforms like e‑commerce sites, design for the next growth stage and iterate.

Server‑side architecture focuses on how applications are organized, while big‑data architecture provides the underlying storage, processing, and analytics capabilities that those applications rely on.

Design Principles

N+1 design – eliminate single points of failure.

Rollback design – ensure forward compatibility and ability to revert versions.

Feature toggle – configurable enable/disable of functions for rapid fault isolation.

Monitoring – embed observability from the start.

Active‑active data centers – achieve high availability across locations.

Use mature technologies – avoid untested or unsupported components.

Resource isolation – prevent one business from monopolizing resources.

Horizontal scalability – design for scale‑out to avoid bottlenecks.

Buy non‑core solutions – leverage commercial products for peripheral functions.

Commercial hardware – use commercial‑grade hardware to reduce the likelihood of hardware failures.

Rapid iteration – develop small features quickly for early feedback.

Stateless services – keep service interfaces independent of prior requests.
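Of the principles above, the feature toggle is the easiest to show in code. The sketch below is one minimal way to do it, assuming an in‑memory flag store; the `FeatureToggles` class, the `recommendations` flag, and `render_product_page` are hypothetical, and a production system would typically load flags from shared configuration so they can be flipped without a redeploy.

```python
from typing import Callable, Dict, List


class FeatureToggles:
    """In-memory flag store used to switch features on and off at runtime."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a missing entry never enables a feature.
        return self._flags.get(name, False)

    def set_enabled(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def render_product_page(toggles: FeatureToggles,
                        load_recommendations: Callable[[], List[str]]) -> List[str]:
    """Build the page, skipping the optional section when its flag is off."""
    sections = ["product details"]
    if toggles.is_enabled("recommendations"):
        sections.extend(load_recommendations())
    return sections


toggles = FeatureToggles({"recommendations": True})
page_with_feature = render_product_page(toggles, lambda: ["recommended items"])

# If the recommendation service misbehaves, switch it off and keep the core page working.
toggles.set_enabled("recommendations", False)
page_without_feature = render_product_page(toggles, lambda: ["recommended items"])
```

When the flag is off the recommendation loader is never called, which is what isolates the fault from the rest of the page.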
