From Single Server to Cloud‑Native: Taobao’s 14‑Step Architecture Evolution

This article traces the evolution of Taobao's backend architecture from a single-server setup through distributed clusters, caching, load balancing, database sharding, microservices, and containerization to cloud-native deployment, highlighting the technologies and design principles that enable scaling from about a hundred to tens of millions of concurrent users.

21CTO

Jan 28, 2021

1. Overview

This article uses Taobao as an example to illustrate the evolution of server‑side architecture from a hundred concurrent users to tens of millions, listing the technologies encountered at each stage and summarizing key design principles.

2. Basic Concepts

Distributed: Multiple modules of a system are deployed on different servers, e.g., Tomcat and the database on separate machines.

High Availability: When some nodes fail, other nodes take over and continue providing service.

Cluster: A set of servers that together offer one unified service, such as a ZooKeeper ensemble made up of one leader and several followers.

Load Balancing: Requests are distributed evenly across multiple nodes so that each node carries a similar load.

Forward and Reverse Proxy: A forward proxy sends requests from internal systems out to external networks on their behalf; a reverse proxy receives external requests and forwards them to internal servers.

3. Architecture Evolution

3.1 Single‑Machine Architecture

Initially, Tomcat and the database are deployed on the same server. As the number of users grows, the two compete for the machine's resources, and a single machine can no longer support the business.

3.2 First Evolution: Separate Tomcat and Database

Deploy Tomcat and the database on separate servers, significantly improving their individual performance.

As user numbers increase, concurrent reads/writes to the database become a bottleneck.

3.3 Second Evolution: Introduce Local and Distributed Caching

Add local cache in Tomcat/JVM and a distributed cache (e.g., Redis) to store hot product data or HTML pages, intercepting most requests before they hit the database.

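Technologies involved: memcached as a cache local to the application host (it runs as a separate process, so caching inside the JVM itself needs an in-process cache instead) and Redis as the distributed cache. Issues to handle include cache consistency, cache penetration, cache avalanche, and hot data expiring all at once.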
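The lookup order described above (local cache, then distributed cache, then database) can be sketched as follows. This is a minimal illustration, not Taobao's implementation: the class name `TwoLevelCache`, the dict standing in for Redis, and the TTL values are all assumptions. It also shows two common mitigations, caching misses to limit penetration and randomizing TTLs to limit avalanche.

```python
import random
import time


class TwoLevelCache:
    """Cache-aside lookup: local dict first, then a shared cache, then the database."""

    def __init__(self, shared_cache, load_from_db, base_ttl=60.0, jitter=10.0):
        self.local = {}                  # key -> (value, expires_at); stands in for a local cache
        self.shared = shared_cache       # dict-like stand-in for a distributed cache such as Redis
        self.load_from_db = load_from_db
        self.base_ttl = base_ttl
        self.jitter = jitter

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]              # local hit, no network round trip
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.load_from_db(key)
            # Store misses (None) as well, so repeated lookups of absent
            # keys do not reach the database (cache penetration).
            self.shared[key] = value
        # A randomized TTL keeps many keys from expiring at the same
        # moment (cache avalanche).
        ttl = self.base_ttl + random.uniform(0, self.jitter)
        self.local[key] = (value, time.monotonic() + ttl)
        return value
```

In this sketch the shared cache never expires entries; a real deployment would set expirations there too and would need an invalidation strategy to keep both levels consistent with the database.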

The cache absorbs most requests, but as users grow, the remaining load falls on a single Tomcat instance and responses slow down.

3.4 Third Evolution: Reverse Proxy for Load Balancing

Deploy multiple Tomcat instances and use Nginx (or HAProxy) as a reverse proxy to distribute requests evenly.

Technologies: Nginx, HAProxy, session sharing, file upload/download.
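To make "distribute requests evenly" concrete, here is a minimal round-robin selector of the kind a reverse proxy applies to its upstream list (round-robin is the default method in Nginx). The class name and backend addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["tomcat-1:8080", "tomcat-2:8080", "tomcat-3:8080"])
picks = [balancer.next_backend() for _ in range(4)]
# picks wraps around to the first backend on the fourth request
```

Because consecutive requests from one user can land on different Tomcat instances, session state has to live somewhere all instances can reach, which is why session sharing appears in the technology list.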

Reverse proxy greatly increases the concurrent capacity of application servers, but the database eventually becomes the bottleneck.

3.5 Fourth Evolution: Database Read‑Write Separation

Separate the database into read and write instances; multiple read replicas synchronize from the primary write node.

Technology: Mycat middleware for read/write separation and sharding.
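The routing decision that such middleware makes can be sketched in a few lines. This is an illustrative simplification, not Mycat's logic: the class name and node labels are assumptions, and the statement check only looks at the first keyword.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to replicas in rotation; send everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica is configured, go to the primary.
        return self.primary
```

A production router also has to account for replication lag (a read issued right after a write may not see it on a replica) and for reads inside transactions, which generally belong on the primary.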

Different services compete for the same database, causing performance interference.

3.6 Fifth Evolution: Business‑Based Database Sharding

Store data for different business domains in separate databases, reducing resource contention.

As users grow, the single write database eventually hits a performance ceiling.

3.7 Sixth Evolution: Split Large Tables into Small Tables

Hash‑based routing (e.g., by product ID) or time‑based partitioning (e.g., hourly tables) distributes data across many small tables, enabling horizontal scaling.
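Both routing schemes can be expressed as small functions that map a record to a table name. The table names, the table count of 16, and the use of CRC32 as the hash are assumptions made for this sketch.

```python
import zlib
from datetime import datetime


def table_for_product(product_id: str, table_count: int = 16) -> str:
    """Hash-based routing: the same product ID always maps to the same table."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    shard = zlib.crc32(product_id.encode("utf-8")) % table_count
    return f"product_{shard:02d}"


def table_for_event(ts: datetime) -> str:
    """Time-based routing: one table per hour."""
    return f"event_log_{ts:%Y%m%d_%H}"
```

With plain modulo hashing, changing `table_count` remaps most keys, so the number of tables is usually chosen generously up front or paired with a migration plan.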

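This approach makes database operations considerably harder for DBAs. Once the sharded tables are managed as one logical whole, the result is effectively a distributed database; MPP (massively parallel processing) architectures are one class of such systems.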

Popular MPP solutions include Greenplum, TiDB, PostgreSQL‑XC, HAWQ, GBase, SnowballDB, and Huawei LibrA.

Both database and Tomcat can scale horizontally, but eventually Nginx becomes the bottleneck.

3.8 Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing

Use LVS (software) or F5 (hardware) at layer 4 to balance traffic across multiple Nginx instances, supporting tens of thousands to hundreds of thousands of concurrent connections.

High availability is achieved with keepalived and virtual IPs.
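keepalived implements VRRP, in which the healthy node with the highest priority holds the virtual IP. The election rule can be sketched as below; the node names and priorities are hypothetical, and real VRRP relies on periodic advertisements and timers rather than a shared health table.

```python
def elect_vip_holder(nodes):
    """Return the name of the healthy node with the highest priority, or None."""
    healthy = [node for node in nodes if node["healthy"]]
    if not healthy:
        return None
    return max(healthy, key=lambda node: node["priority"])["name"]


nodes = [
    {"name": "lvs-primary", "priority": 150, "healthy": True},
    {"name": "lvs-backup", "priority": 100, "healthy": True},
]
before_failure = elect_vip_holder(nodes)   # the primary holds the virtual IP
nodes[0]["healthy"] = False                # simulate the primary going down
after_failure = elect_vip_holder(nodes)    # the backup takes over the virtual IP
```

Clients keep connecting to the same virtual IP throughout, so the failover is invisible to them apart from connections that were in flight.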

When concurrency reaches hundreds of thousands, LVS itself becomes a bottleneck, and geographic latency differences emerge.

3.9 Eighth Evolution: DNS Round‑Robin for Inter‑Data‑Center Balancing

Configure DNS to return multiple IPs for a domain, each pointing to a different data‑center, achieving load balancing across regions.
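As a rough model of DNS round-robin, the toy resolver below rotates the order of a domain's address records on every query, so successive clients tend to start with different data centers. The domain name is hypothetical and the addresses come from the IP ranges reserved for documentation.

```python
from collections import deque


class RoundRobinDns:
    """Toy resolver that rotates the record order on each query."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)   # clients typically try the first address
        ips.rotate(-1)       # the next query sees a different first address
        return answer


dns = RoundRobinDns({"shop.example": ["203.0.113.10", "198.51.100.10"]})
first = dns.resolve("shop.example")
second = dns.resolve("shop.example")
```

Real resolvers and clients cache answers for the record's TTL, so the distribution is coarser than this sketch suggests, and a failed data center keeps receiving traffic until cached records expire.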

As data volume and business complexity grow, databases alone cannot satisfy rich query and analysis needs.

3.10 Ninth Evolution: Introduce NoSQL and Search Engines

Adopt HDFS for large file storage, HBase/Redis for key‑value stores, and Elasticsearch for full‑text search; use Kylin or Druid for multidimensional analysis.

Adding more components increases system complexity and operational overhead.

3.11 Tenth Evolution: Split Monolithic Application into Smaller Services

Divide the codebase by business modules, allowing independent development and deployment.

Shared modules duplicated across applications make coordinated upgrades difficult.

3.12 Eleventh Evolution: Extract Reusable Functions into Microservices

Common functionalities (e.g., user management, order processing) become independent services accessed via HTTP, TCP, or RPC.

Frameworks such as Dubbo and Spring Cloud provide service governance, rate limiting, circuit breaking, and degradation.
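Circuit breaking and degradation can be illustrated with a small state machine. This sketch is not the Dubbo or Spring Cloud API; the class, its parameters, and the thresholds are assumptions chosen to show the idea: after repeated failures the breaker opens and returns a fallback immediately, then allows a trial call once a timeout has passed.

```python
import time


class CircuitBreaker:
    """Fail fast with a fallback after repeated failures; retry after a cool-down."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()        # open: skip the call, return the degraded result
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None            # success closes the circuit
        return result
```

The fallback is where degradation happens: for example, returning a cached or default response instead of calling a struggling downstream service.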

Different services use different access protocols, making integration complex.

3.13 Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access

ESB abstracts protocol differences, allowing applications and services to communicate uniformly.
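The idea of hiding protocol differences behind one call interface can be sketched as a registry of adapters. Everything here is hypothetical: the `ServiceBus` class, the service names, and the two adapter functions, which only simulate an HTTP/JSON service and an RPC service rather than making network calls.

```python
import json


class ServiceBus:
    """Callers use one interface; each adapter handles its own protocol."""

    def __init__(self):
        self._adapters = {}

    def register(self, service_name, adapter):
        self._adapters[service_name] = adapter

    def call(self, service_name, request):
        if service_name not in self._adapters:
            raise KeyError(f"unknown service: {service_name}")
        return self._adapters[service_name](request)


def user_service_http(request):
    # Stand-in for an HTTP call: serialize to JSON as an HTTP body would be.
    body = json.dumps(request)
    return {"user": json.loads(body)["user_id"], "via": "http"}


def order_service_rpc(request):
    # Stand-in for a binary RPC call.
    return {"orders": [], "user": request["user_id"], "via": "rpc"}


bus = ServiceBus()
bus.register("user", user_service_http)
bus.register("order", order_service_rpc)
```

A caller writes `bus.call("user", {"user_id": 7})` and `bus.call("order", {"user_id": 7})` the same way, without knowing which protocol sits behind each name.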

As services proliferate, deployment and environment conflicts increase operational difficulty.

3.14 Thirteenth Evolution: Containerization (Docker & Kubernetes)

Package applications as Docker images and orchestrate them with Kubernetes for dynamic scaling and isolated runtime environments.
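For a sense of how dynamic scaling is decided, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The function below sketches just that rule; the parameter names and default bounds are assumptions, and the real controller also applies a tolerance and stabilization behavior that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to how far the metric is from its target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas running at 90% CPU against a 60% target scale out to 6, while the same 4 replicas at 30% scale in to 2.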

Containers solve scaling but still require on‑premise hardware, leading to underutilized resources outside peak periods.

3.15 Fourteenth Evolution: Move to Cloud Platforms

Deploy the system on public cloud (IaaS, PaaS, SaaS) to leverage elastic resources, reducing operational cost and enabling on‑demand scaling.

IaaS: Elastic compute, storage, network.

PaaS: Managed middleware and development platforms.

SaaS: Ready‑to‑use applications.

While cloud solves hardware elasticity, challenges like cross‑region data sync and distributed transactions remain.

4. Architecture Design Summary

Key design principles include N+1 redundancy, rollback capability, feature toggles, built-in monitoring, multi-active data centers, adopting mature technology, resource isolation, horizontal scalability, buying rather than building non-core components, using commodity hardware, rapid iteration, and stateless service design.
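Of these principles, feature toggles are the easiest to show in code. The sketch below uses a hypothetical `FeatureToggles` class and flag name; in practice the flags would usually come from a configuration service so they can be changed without redeploying.

```python
class FeatureToggles:
    """In-memory flag store; unknown features are treated as disabled."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)


def product_page(toggles, product_id):
    # The new code path ships dark and is switched on (or off again) at runtime.
    if toggles.is_enabled("new_recommendations"):
        return f"product {product_id} with new recommendations"
    return f"product {product_id}"
```

Turning the flag off restores the old behavior immediately, which complements rollback capability when a release misbehaves.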
