Backend Development 23 min read

How Did Taobao’s Backend Architecture Evolve to Support Millions of Users?

Using Taobao as a case study, this article traces the step‑by‑step evolution of a high‑traffic backend—from a single‑machine setup through database sharding, caching, load balancing, microservices, containerization, and finally cloud deployment—highlighting the technologies and design principles that enable scaling to millions of concurrent users.

Open Source Linux

Apr 21, 2022

How Did Taobao’s Backend Architecture Evolve to Support Millions of Users?

Overall Architecture Evolution Path

Single‑machine architecture

First evolution: Separate Tomcat and database deployment

Second evolution: Introduce local and distributed caching

Third evolution: Add reverse proxy for load balancing

Fourth evolution: Database read/write separation

Fifth evolution: Business‑based database sharding

Sixth evolution: Split large tables into smaller ones

Seventh evolution: Use LVS or F5 for multi‑Nginx load balancing

Eighth evolution: DNS round‑robin for inter‑data‑center load balancing

Ninth evolution: Introduce NoSQL databases and search engines

Tenth evolution: Split a large application into smaller applications

Eleventh evolution: Extract reusable functions into microservices

Twelfth evolution: Introduce Enterprise Service Bus (ESB) to mask service interface differences

Thirteenth evolution: Adopt containerization for environment isolation and dynamic service management

Fourteenth evolution: Host the system on a cloud platform

1. Overview

This article uses Taobao as an example to illustrate the evolution of a server‑side architecture from hundreds to tens of millions of concurrent users, listing the relevant technologies at each stage and summarizing architectural design principles at the end.

Note: The example is for illustration only and does not reflect Taobao’s actual technical evolution.

2. Basic Concepts

Before discussing the architecture, the following fundamental concepts are introduced:

Distributed: Multiple modules deployed on different servers, e.g., Tomcat and database on separate machines.

High Availability: The system continues to provide service when some nodes fail.

Cluster: A group of servers providing a unified service, with automatic failover.

Load Balancing: Distributing requests evenly across multiple nodes.

Forward and Reverse Proxy: Forward proxy handles outbound requests from internal systems; reverse proxy forwards inbound external requests to internal servers.

3. Architecture Evolution

3.1 Single‑machine Architecture

Initially, Tomcat and the database are deployed on the same server. As user numbers grow, resource contention between Tomcat and the database becomes a bottleneck.

3.2 First Evolution: Separate Tomcat and Database

Tomcat and the database each occupy dedicated servers, significantly improving performance. However, concurrent database reads/writes soon become the new bottleneck.

3.3 Second Evolution: Local and Distributed Caching

Introduce local cache (e.g., memcached) and distributed cache (e.g., Redis) to store hot product data or HTML pages, intercepting most requests before they hit the database. Issues such as cache consistency, penetration, breakdown, and hot‑data expiration are addressed.

3.4 Third Evolution: Reverse Proxy for Load Balancing

Deploy multiple Tomcat instances and use a reverse‑proxy (Nginx or HAProxy) to distribute requests. This greatly increases the supported concurrency, but the database soon becomes the limiting factor.

3.5 Fourth Evolution: Database Read/Write Separation

Separate the database into read and write instances; multiple read replicas synchronize from the primary write database. Middleware such as Mycat manages read/write routing and sharding.

3.6 Fifth Evolution: Business‑Based Database Sharding

Store different business data in separate databases to reduce contention. This introduces cross‑business query challenges.

3.7 Sixth Evolution: Split Large Tables into Small Tables

Hash‑based routing splits tables (e.g., comments by product ID, payments by hour). Horizontal scaling is achieved, but operational complexity rises. MPP databases such as Greenplum, TiDB, and PostgreSQL‑XC are introduced.

3.8 Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing

LVS (software) or F5 (hardware) operate at layer 4, providing higher performance and protocol support than Nginx. High availability is achieved with keepalived and virtual IPs.

3.9 Eighth Evolution: DNS Round‑Robin for Inter‑Data‑Center Load Balancing

Configure a domain to resolve to multiple IPs, each pointing to a different data‑center, achieving geographic load balancing and horizontal scaling to tens of millions of concurrent users.

3.10 Ninth Evolution: Introduce NoSQL and Search Engines

Adopt HDFS for file storage, HBase/Redis for key‑value, ElasticSearch for full‑text search, and Kylin/Druid for multidimensional analysis, increasing system complexity and consistency challenges.

3.11 Tenth Evolution: Split Large Application into Smaller Applications

Divide code by business modules, allowing independent upgrades. Shared configuration can be managed via Zookeeper.

3.12 Eleventh Evolution: Extract Reusable Functions as Microservices

Common functionalities (user management, order, payment, authentication) become independent services accessed via HTTP, TCP, or RPC. Frameworks such as Dubbo or Spring Cloud provide governance, rate limiting, circuit breaking, and degradation.

3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Interface Unification

ESB abstracts protocol differences, allowing applications to access backend services uniformly and reducing coupling, similar to SOA architecture.

3.14 Thirteenth Evolution: Containerization (Docker & Kubernetes)

Package services as Docker images and deploy them dynamically with Kubernetes, simplifying scaling and resource isolation.

3.15 Fourteenth Evolution: Cloud Platform Hosting

Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reduce operational costs, and achieve on‑demand scaling.

4. Architecture Design Summary

Must the architecture follow the exact evolution path? No; real scenarios may require addressing multiple bottlenecks simultaneously.

How detailed should the design be? Design to meet current performance targets while leaving room for future expansion.

Difference between backend and big‑data architectures? Big‑data architecture focuses on data collection, storage, and analysis (e.g., HDFS, Spark), whereas backend architecture concerns application organization and service delivery.

Design Principles

N+1 design: No single point of failure.

Rollback design: Ensure forward compatibility and version rollback.

Feature toggle design: Ability to disable features quickly.

Monitoring design: Incorporate observability from the start.

Multi‑active data‑center design for high availability.

Adopt mature technologies to avoid hidden bugs.

Resource isolation to prevent one business from monopolizing resources.

Horizontal scalability as a core requirement.

Buy non‑core components when development cost is high.

Use commercial hardware for reliability.

Rapid iteration: Deploy small features quickly for validation.

Stateless service design.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

architecture microservices scalability Cloud

Written by

Open Source Linux

Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.