
From Single Server to Cloud Native: How Taobao Scaled to Millions of Concurrent Users

This article uses Taobao as a case study to trace the evolution of a high‑performance backend architecture from a single‑machine setup to a cloud‑native, micro‑service ecosystem, highlighting the technical challenges and design principles at each scaling stage.

Java High-Performance Architecture

Jul 8, 2022


1. Overview

This article takes Taobao as an example to illustrate the evolution of server‑side architecture from a hundred concurrent requests to tens of millions, listing the technologies encountered at each stage and summarizing key design principles at the end.

2. Basic Concepts

Before discussing architecture, the following fundamental concepts are introduced:

Distributed: The modules of a system are deployed on different servers, e.g., Tomcat and the database run on separate machines.

High Availability: When some nodes fail, other nodes take over so the service keeps running.

Cluster: A group of servers that together provide one service, such as the leader and follower nodes of a Zookeeper ensemble.

Load Balancing: Requests are distributed evenly across multiple nodes.

Forward and Reverse Proxy: A forward proxy accesses external networks on behalf of internal systems; a reverse proxy receives external requests and forwards them to internal servers.
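To make the load-balancing idea concrete, here is a minimal round-robin sketch in Java. It is not how Nginx or any particular balancer is implemented; the backend addresses are hypothetical, and real balancers also take health checks, weights, and connection counts into account.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin balancer: each call returns the next backend in turn.
class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("no backends");
        this.backends = List.copyOf(backends);
    }

    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(i);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        // Hypothetical Tomcat addresses
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick()); // .1, .2, .3, then .1 again
        }
    }
}
```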

3. Architecture Evolution

Single‑Machine Architecture

In the early days, Tomcat and the database were deployed on the same server. A browser request to www.taobao.com first resolves the domain via DNS to an IP (e.g., 10.102.4.1) and then reaches the Tomcat instance.

Architecture bottleneck: As user count grows, Tomcat and the database compete for resources, and a single machine cannot sustain the load.

First Evolution: Separate Tomcat and Database

Tomcat and the database each get a dedicated server, which significantly improves the performance of both.

Architecture bottleneck: Database read/write becomes the new limiting factor as concurrency rises.

Second Evolution: Introduce Local and Distributed Caches

A local cache is added on the Tomcat server or inside the JVM (for example, memcached running on the same host), and a distributed cache (Redis) is deployed externally to store hot product data or the HTML of popular product pages. Most requests are then answered from cache before they reach the database, which reduces database load dramatically.
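The request path through the two cache tiers can be sketched as a cache-aside lookup. This is a simplified illustration under stated assumptions: `RemoteCache` is a hypothetical interface standing in for a Redis client, the database is modeled as a plain function, and the local tier is an in-process LRU map rather than memcached.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a distributed cache such as Redis; a real client
// (e.g., Jedis or Lettuce) would talk to a Redis server over the network.
interface RemoteCache {
    String get(String key);
    void put(String key, String value);
}

// In-memory implementation so the sketch runs without a Redis server.
class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, String> store = new HashMap<>();
    public String get(String key) { return store.get(key); }
    public void put(String key, String value) { store.put(key, value); }
}

// Cache-aside lookup: local LRU first, then the remote cache, then the database.
// Not thread-safe; a real Tomcat deployment would need a concurrent local cache.
class TwoLevelCache {
    private final Map<String, String> local;
    private final RemoteCache remote;
    private final Function<String, String> database; // hypothetical row loader
    int dbHits = 0;                                  // lookups that reached the database

    TwoLevelCache(int localCapacity, RemoteCache remote, Function<String, String> database) {
        // accessOrder = true makes LinkedHashMap track recency; evict the eldest beyond capacity
        this.local = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > localCapacity;
            }
        };
        this.remote = remote;
        this.database = database;
    }

    String get(String key) {
        String value = local.get(key);          // tier 1: in-process cache
        if (value != null) return value;
        value = remote.get(key);                // tier 2: distributed cache
        if (value == null) {
            dbHits++;                           // tier 3: fall through to the database
            value = database.apply(key);
            if (value == null) return null;     // missing rows are not cached in this sketch
            remote.put(key, value);
        }
        local.put(key, value);
        return value;
    }
}
```

Two `TwoLevelCache` instances sharing one `InMemoryRemoteCache` mimic two Tomcat nodes sharing Redis: once one node has loaded a product, the other finds it in the distributed tier without querying the database. Expiry, invalidation, and cache penetration are deliberately left out.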

Architecture bottleneck: The caches absorb most requests, but as the user base grows, concurrent load concentrates on the single Tomcat server and response times degrade.

Third Evolution: Reverse Proxy for Load Balancing

Multiple Tomcat instances are deployed behind a reverse proxy (Nginx or HAProxy) that distributes requests evenly among them. Assuming each Tomcat handles at most 100 concurrent requests and Nginx at most 50,000, Nginx could in theory spread traffic across 500 Tomcat instances and sustain 50,000 concurrent requests.
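The arithmetic behind these figures can be checked with a few lines of Java. This is a back-of-the-envelope model using only the illustrative limits quoted above; `CapacityPlan` and its methods are hypothetical helpers, and real capacity depends on hardware, request cost, and tuning.

```java
// Back-of-the-envelope capacity model for a reverse proxy in front of N app servers.
class CapacityPlan {
    // Smallest number of backend instances whose combined capacity covers the target.
    static int instancesNeeded(int targetConcurrency, int perInstance) {
        return (targetConcurrency + perInstance - 1) / perInstance; // ceiling division
    }

    // The system can serve no more than its narrowest layer allows.
    static int systemCapacity(int proxyLimit, int instances, int perInstance) {
        return Math.min(proxyLimit, instances * perInstance);
    }

    public static void main(String[] args) {
        int tomcats = instancesNeeded(50_000, 100);
        System.out.println(tomcats);                              // 500
        System.out.println(systemCapacity(50_000, tomcats, 100)); // 50000
        System.out.println(systemCapacity(50_000, 600, 100));     // 50000: Nginx is now the cap
    }
}
```

Adding Tomcat instances beyond 500 does not raise the ceiling in this model, which is exactly why the proxy layer later becomes the bottleneck.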

Architecture bottleneck: Scaling out the application tier raises overall throughput, but more requests now reach the database, and the single database server becomes the next limiting factor.

Fourth Evolution: Database Read/Write Separation

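The database is split into a master that handles writes and multiple read replicas. Middleware such as Mycat routes reads and writes to the appropriate servers (and also supports sharding), while replication propagates the master's changes to the replicas. Because replication usually lags slightly, reads that must see the latest write may need to be served from the master or from a cache.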
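The routing decision at the heart of read/write separation can be sketched as follows. This is not Mycat's API; `ReadWriteRouter` and the data-source names are hypothetical, and the SQL check is deliberately naive.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

class ReadWriteRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    // Returns the name of the data source that should execute this statement.
    // Plain SELECTs go to replicas in round-robin order; everything else goes to the master.
    // forceMaster lets callers read their own writes despite replication lag.
    // Simplification: SELECT ... FOR UPDATE and reads inside write transactions are NOT detected.
    String route(String sql, boolean forceMaster) {
        boolean isRead = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead || forceMaster || replicas.isEmpty()) {
            return master;
        }
        int i = Math.floorMod(counter.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```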

Architecture bottleneck: Different business modules compete for database resources, affecting performance.

Fifth Evolution: Business‑Level Database Sharding

Data for each business line is stored in separate databases, reducing contention. Cross‑business queries require additional solutions, which are beyond the scope of this article.

Architecture bottleneck: The single write database eventually reaches its performance ceiling.

Sixth Evolution: Split Large Tables into Small Tables

Tables are hashed or time‑partitioned, routing rows to many small tables across multiple servers. This enables horizontal scaling of the database. MPP (Massively Parallel Processing) databases such as Greenplum, TiDB, PostgreSQL‑XC, and commercial solutions like GBase or LibrA provide the necessary capabilities.
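A rough sketch of the two routing schemes mentioned above, hash-based and time-based, might look like this. The `order_db_N.t_order_M` and `t_order_yyyyMM` naming conventions are assumptions for illustration, not those of any particular product.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical routing helpers that split one logical order table into many physical tables.
class TableSharding {
    // Hash sharding: the same key always lands in the same database and table.
    static String hashShard(long userId, int dbCount, int tablesPerDb) {
        int totalTables = dbCount * tablesPerDb;
        int slot = Math.floorMod(Long.hashCode(userId), totalTables);
        int db = slot / tablesPerDb;    // which database server
        int table = slot % tablesPerDb; // which table on that server
        return "order_db_" + db + ".t_order_" + table;
    }

    // Time partitioning: one table per month, e.g. t_order_202207.
    static String monthShard(LocalDate orderDate) {
        return "t_order_" + orderDate.format(DateTimeFormatter.ofPattern("yyyyMM"));
    }
}
```

With plain modulo hashing, changing the number of databases or tables remaps most keys, which is why shard counts tend to be chosen generously up front or consistent hashing is used instead.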

Architecture bottleneck: After both application servers and databases scale horizontally, the Nginx layer becomes the next limiting factor.

Seventh Evolution: LVS/F5 for Multi‑Nginx Load Balancing

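An LVS (software) or F5 (hardware) load balancer operating at layer 4 distributes traffic across multiple Nginx instances and delivers much higher throughput than Nginx. Keepalived can provide virtual IP failover between load-balancer nodes for high availability.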
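Keepalived implements failover with VRRP, where the highest-priority live node holds the virtual IP. The toy model below captures only that selection rule; it does not model VRRP advertisements, timers, or preemption settings, and the node names and priorities are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// One load-balancer node as seen by the failover logic (hypothetical fields).
final class LbNode {
    final String name;
    final int priority;
    final boolean healthy;

    LbNode(String name, int priority, boolean healthy) {
        this.name = name;
        this.priority = priority;
        this.healthy = healthy;
    }
}

class VipFailover {
    // The healthy node with the highest priority owns the virtual IP;
    // the result is empty if every node is down.
    static Optional<String> vipOwner(List<LbNode> nodes) {
        return nodes.stream()
                .filter(n -> n.healthy)
                .max(Comparator.comparingInt((LbNode n) -> n.priority))
                .map(n -> n.name);
    }
}
```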

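Architecture bottleneck: Because LVS itself runs on a single machine, it eventually reaches its limit once concurrency climbs into the hundreds of thousands; meanwhile, users spread across regions see noticeably different latency depending on their distance from the data center.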

Eighth Evolution: DNS Round‑Robin Across Data Centers

Multiple IPs are associated with a domain; DNS returns different IPs (each pointing to a different data center) using round‑robin or other policies, achieving data‑center‑level load balancing.
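The rotation a round-robin DNS server performs can be sketched like this. `RoundRobinDns` is a hypothetical toy, not a real DNS server, and the addresses come from the 203.0.113.0/24 documentation range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RoundRobinDns {
    private final List<String> records;
    private int offset = 0;

    RoundRobinDns(List<String> records) {
        if (records.isEmpty()) throw new IllegalArgumentException("no A records");
        this.records = new ArrayList<>(records);
    }

    // Each query gets the full record list, rotated one position further than the last,
    // so successive clients try a different data center's IP first.
    synchronized List<String> resolve() {
        List<String> answer = new ArrayList<>(records);
        Collections.rotate(answer, -offset); // negative distance rotates left
        offset = (offset + 1) % records.size();
        return answer;
    }
}

class DnsDemo {
    public static void main(String[] args) {
        // Stand-ins for the virtual IPs of three data centers
        RoundRobinDns dns = new RoundRobinDns(List.of("203.0.113.10", "203.0.113.20", "203.0.113.30"));
        for (int i = 0; i < 3; i++) System.out.println(dns.resolve());
    }
}
```

In practice resolvers and clients cache answers for the record's TTL, so traffic is balanced only coarsely; geography-aware policies are a common alternative to plain rotation.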

Architecture bottleneck: Richer data and business demands eventually outgrow pure relational databases.

Ninth Evolution: Introduce NoSQL and Search Engines

For massive data volumes and richer query needs, specialized systems are adopted: HDFS for file storage, HBase and Redis for key-value data, Elasticsearch for full-text search, and Kylin or Druid for multidimensional analytics.
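Full-text search engines such as Elasticsearch are built around an inverted index that maps each term to the documents containing it. The sketch below shows only that core idea, with a naive tokenizer and AND-only queries; it says nothing about how Elasticsearch itself stores or ranks data, and `InvertedIndex` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    // term -> sorted set of IDs of documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive analyzer: lowercase, then split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND semantics: return the documents that contain every query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```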

Architecture bottleneck: Adding many components increases system complexity and operational overhead.

Tenth Evolution: Split Large Application into Smaller Services

Applications are divided by business domain, allowing independent development and deployment. Shared configuration can be managed via Zookeeper.
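The publish-and-notify flow behind a shared configuration center can be sketched in memory. This does not use the Zookeeper client API; `ConfigCenter`, its methods, and the key names are hypothetical, and a real deployment would replicate the data across a Zookeeper ensemble.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// In-memory stand-in for a configuration center. With Zookeeper, each key would be a
// znode and applications would register watches; classic Zookeeper watches fire once and
// must be re-registered, whereas listeners here simply stay registered.
class ConfigCenter {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final Map<String, List<BiConsumer<String, String>>> watchers = new ConcurrentHashMap<>();

    void watch(String key, BiConsumer<String, String> listener) {
        watchers.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    void publish(String key, String value) {
        values.put(key, value);
        for (BiConsumer<String, String> l : watchers.getOrDefault(key, List.of())) {
            l.accept(key, value); // push the new value to every subscribed application
        }
    }

    String get(String key) {
        return values.get(key);
    }
}
```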

Architecture bottleneck: Duplicate code across applications makes coordinated upgrades difficult.

Eleventh Evolution: Extract Common Functions as Micro‑services

Functions such as user management, order processing, and authentication become independent services accessed via HTTP, TCP, or RPC. Frameworks like Dubbo or Spring Cloud provide service governance, rate limiting, circuit breaking, etc.
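Circuit breaking, one of the governance features mentioned above, can be illustrated with a small state machine: after repeated failures the breaker opens and fails fast, then lets a trial call through after a cool-down. This is a hedged sketch, not the implementation used by Dubbo or Spring Cloud; `CircuitBreaker`, its thresholds, and the injected clock are assumptions that keep it testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to fail fast before a trial call
    private final LongSupplier clock;   // injected time source, in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // synchronized keeps the sketch simple but serializes all calls; production
    // breakers avoid holding a lock while the remote call is in flight.
    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();  // fail fast without touching the remote service
            }
            state = State.HALF_OPEN;    // cool-down elapsed: allow one trial call
        }
        try {
            T result = remote.get();
            state = State.CLOSED;       // success closes the circuit again
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```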

Architecture bottleneck: Diverse access protocols and inter‑service calls increase coupling and complexity.

Twelfth Evolution: Enterprise Service Bus (ESB) for Unified Access

An ESB performs protocol conversion centrally, so applications call backend services through one uniform interface and coupling is reduced. This approach resembles a service-oriented architecture (SOA).

Architecture bottleneck: Rapid growth of services and components makes deployment and scaling increasingly difficult.

Thirteenth Evolution: Containerization

Docker packages each application into an image that runs in an isolated container; Kubernetes orchestrates those containers, handling dynamic deployment and scaling. This simplifies operations, especially during traffic spikes.
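Scaling in Kubernetes is often driven by the Horizontal Pod Autoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The helper below applies that rule with the default 10% tolerance; the min/max bounds and the `AutoscaleMath.desiredReplicas` name are assumptions, and the real controller also handles stabilization windows, missing metrics, and multiple metrics.

```java
class AutoscaleMath {
    // desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    // skipped when the ratio is within 10% of 1.0, then clamped to [minReplicas, maxReplicas].
    static int desiredReplicas(int current, double currentMetric, double targetMetric,
                               int minReplicas, int maxReplicas) {
        double ratio = currentMetric / targetMetric;
        if (Math.abs(ratio - 1.0) <= 0.1) {
            return current; // within tolerance: no scaling action
        }
        int desired = (int) Math.ceil(current * ratio);
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }
}
```

For example, 4 pods averaging 90% CPU against a 60% target give ceil(4 * 1.5) = 6 replicas.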

Architecture bottleneck: Even with containers, the underlying hardware must still be provisioned, leading to under‑utilized resources outside peak periods.

Fourteenth Evolution: Cloud Platform Adoption

The system is migrated to a public cloud, leveraging IaaS for elastic compute, PaaS for common components, and SaaS for ready‑made services, achieving on‑demand resource allocation and cost efficiency.

4. Architecture Design Summary

Architecture adjustments need not follow a strict linear path; multiple bottlenecks may be addressed simultaneously.

Design depth should match system goals: a fixed‑scope project needs only enough architecture to meet performance targets, while a continuously evolving platform should anticipate future growth.

Server-side architecture differs from big-data architecture: the former focuses on how applications are organized, the latter on data processing pipelines.

Key design principles include N+1 redundancy, rollback capability, feature toggles, built-in monitoring, multi-active data centers, adopting mature technologies, horizontal scalability, buying rather than building non-core components, using commodity hardware, rapid iteration, and stateless service interfaces.
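Two of these principles, feature toggles and rollback capability, can be combined in a small sketch: new code ships behind a flag that defaults to off and can be switched back at runtime without redeploying. `FeatureToggles`, `CheckoutService`, and the `new-pricing` flag are hypothetical; real systems usually load flags from a configuration service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical feature-toggle registry; unknown flags are treated as disabled.
class FeatureToggles {
    private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

    void set(String feature, boolean enabled) {
        flags.put(feature, enabled);
    }

    boolean isEnabled(String feature) {
        return flags.getOrDefault(feature, false);
    }
}

class CheckoutService {
    private final FeatureToggles toggles;

    CheckoutService(FeatureToggles toggles) {
        this.toggles = toggles;
    }

    // Turning "new-pricing" off again is the rollback path: no redeploy needed.
    String priceQuote(long itemId) {
        return toggles.isEnabled("new-pricing") ? "new:" + itemId : "legacy:" + itemId;
    }
}
```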


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
