Evolution of Taobao Backend Architecture from Hundreds to Tens of Millions of Concurrent Users

This article uses Taobao’s backend as a case study to illustrate how server architecture evolves through fifteen stages, from a single‑machine setup handling hundreds of requests to a distributed, containerized, cloud‑native system supporting tens of millions of concurrent users, detailing the technologies and design principles at each step.

Architecture Digest

The article presents a step‑by‑step evolution of Taobao’s backend architecture, starting from a simple single‑machine deployment and progressing to a highly scalable, cloud‑native system capable of handling tens of millions of concurrent users.

Basic Concepts : It first defines key terms such as distributed systems, high availability, clusters, load balancing, and forward/reverse proxies to ensure readers understand the fundamentals of large‑scale architecture.

Stage 1 – Single‑Machine Architecture : Initially, Tomcat and the database run on the same server, which quickly becomes a bottleneck as user traffic grows.

Stage 2 – Separate Tomcat and Database : Tomcat and the database are deployed on separate machines, improving resource isolation but exposing database read/write limits.

Stage 3 – Local and Distributed Caching : Introduces in‑process caching and external distributed caches (e.g., Memcached, Redis) to offload most read traffic from the database.
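
The lookup order this stage describes can be sketched as a cache-aside read path. This is a minimal illustration, not Taobao's implementation: the class name `TwoLevelCache`, the `products` table, and the plain dict standing in for a Redis or Memcached client are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TwoLevelCache:
    """Cache-aside read path: in-process dict, then shared cache, then database."""

    def __init__(self, shared: Dict[str, str],
                 load_from_db: Callable[[str], Optional[str]],
                 local_ttl_seconds: float = 5.0) -> None:
        self._local: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self._shared = shared            # stand-in for a Redis/Memcached client
        self._load_from_db = load_from_db
        self._ttl = local_ttl_seconds
        self.db_reads = 0                # how many lookups reached the database

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # level 1 hit: no network call at all
        value = self._shared.get(key)    # level 2: shared (distributed) cache
        if value is None:
            self.db_reads += 1
            value = self._load_from_db(key)   # miss: fall through to the database
            if value is None:
                return None
            self._shared[key] = value
        self._local[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so stale copies are not served."""
        self._local.pop(key, None)
        self._shared.pop(key, None)


products = {"sku-1": "Red T-shirt"}      # hypothetical database table
cache = TwoLevelCache(shared={}, load_from_db=products.get)
first = cache.get("sku-1")               # reaches the database
second = cache.get("sku-1")              # served from the in-process cache
```

The short TTL on the local level is one way to bound staleness when several application servers each hold their own copy; a real deployment would also need to consider cache penetration and consistency after writes.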

Stage 4 – Reverse Proxy Load Balancing : Deploys multiple Tomcat instances behind a reverse‑proxy layer (Nginx or HAProxy), distributing requests and dramatically increasing concurrent capacity.
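
As a rough illustration of how a reverse proxy might spread requests over several Tomcat instances, the sketch below implements plain round-robin selection. The class name and backend addresses are hypothetical, and real proxies such as Nginx or HAProxy also offer weighting and health checks that are omitted here.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per incoming request."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle: Iterator[str] = itertools.cycle(backends)

    def next_backend(self) -> str:
        return next(self._cycle)


# Hypothetical Tomcat instances behind the proxy.
balancer = RoundRobinBalancer(["tomcat-1:8080", "tomcat-2:8080", "tomcat-3:8080"])
picks = [balancer.next_backend() for _ in range(6)]
```

Because any request can land on any instance, session state generally has to live outside the Tomcat process (for example in the distributed cache) for this scheme to work.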

Stage 5 – Database Read/Write Separation : Uses middleware such as Mycat to split write and read workloads across separate database instances, reducing contention.
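
The routing decision behind read/write separation can be sketched in a few lines. This is not how Mycat is implemented; it is a simplified, assumed rule set (plain `SELECT` outside a transaction goes to a replica, everything else to the primary), and the names `ReadWriteRouter`, `primary-db`, and `replica-N` are hypothetical.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Sends plain reads to replicas and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self._primary = primary
        # Rotate over replicas; with no replicas, all traffic uses the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        words = sql.split()
        is_select = bool(words) and words[0].upper() == "SELECT"
        takes_lock = "FOR UPDATE" in sql.upper()   # locking reads need the primary
        if is_select and not takes_lock and not in_transaction and self._replicas:
            return next(self._replicas)
        return self._primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
read_target = router.route("SELECT name FROM items WHERE id = 1")
write_target = router.route("UPDATE items SET stock = stock - 1 WHERE id = 1")
locked_read = router.route("SELECT stock FROM items WHERE id = 1 FOR UPDATE")
```

Reads inside a transaction stay on the primary because replicas may lag behind it; a read that must see a just-committed write has the same concern.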

Stage 6 – Business‑Level Database Sharding : Routes different business domains to dedicated databases, lowering cross‑business contention and enabling independent scaling.

Stage 7 – Table Splitting (Horizontal Partitioning) : Large tables are partitioned into smaller shards (e.g., by product ID or time), allowing horizontal scaling of storage and query processing.
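
The two partitioning schemes mentioned above (by product ID and by time) can be sketched as functions that map a row to a physical table name. The table names, the shard count of 16, and the naming pattern are assumptions made for illustration only.

```python
from datetime import date


def table_for_product(base_name: str, product_id: int, shard_count: int) -> str:
    """Hash-style split: the product ID modulo the shard count picks the table."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return f"{base_name}_{product_id % shard_count:02d}"


def table_for_day(base_name: str, day: date) -> str:
    """Time-based split: one table per calendar month."""
    return f"{base_name}_{day:%Y_%m}"


product_table = table_for_product("product", 1234567, 16)   # 1234567 % 16 == 7
order_table = table_for_day("orders", date(2024, 5, 17))
```

Modulo routing spreads rows evenly but makes changing the shard count expensive, since most rows would map to a different table; monthly tables make old data easy to archive but concentrate current writes in one table.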

Stage 8 – MPP Databases : Introduces massively parallel processing databases (Greenplum, TiDB, PostgreSQL‑XC, etc.) to handle analytical workloads at scale.

Stage 9 – Layer‑4 Load Balancing (LVS/F5) : Adds hardware/software load balancers operating at the transport layer to support hundreds of thousands of concurrent connections.
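
A transport-layer balancer decides where to send a connection using only address information, without parsing the HTTP request. One scheduling choice of this kind is source hashing, sketched below under stated assumptions: the function name and IP addresses are hypothetical, and CRC32 is used only as a convenient stable hash, not because any particular product uses it.

```python
import zlib
from typing import List


def pick_by_source_ip(client_ip: str, backends: List[str]) -> str:
    """Maps a client address to a backend using a hash that is stable across runs."""
    if not backends:
        raise ValueError("at least one backend is required")
    index = zlib.crc32(client_ip.encode("ascii")) % len(backends)
    return backends[index]


# Hypothetical layer-7 proxies sitting behind the layer-4 balancer.
real_servers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
chosen = pick_by_source_ip("203.0.113.7", real_servers)
```

The same client address always maps to the same server as long as the server list is unchanged, which keeps a client's connections together without the balancer storing per-client state.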

Stage 10 – DNS Round‑Robin Across Data Centers : Uses DNS to distribute traffic among multiple data‑center IPs, achieving geographic load balancing.

Stage 11 – NoSQL and Search Engines : Incorporates specialized stores (HBase, Redis) and search solutions (Elasticsearch) for high‑volume, schema‑flexible, and full‑text queries.
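
Full-text search engines are built around an inverted index, which maps each term to the documents containing it. The toy version below shows why such a lookup avoids scanning every row; it is a sketch with whitespace tokenization only, and the class name and sample product titles are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    """Maps each lowercase term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Returns IDs of documents containing every query term (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add(1, "Red cotton shirt")
index.add(2, "Blue cotton shirt")
index.add(3, "Red leather shoes")
red_shirts = index.search("red shirt")
cotton = index.search("cotton")
```

Production search engines add analysis steps such as stemming and language-specific tokenization, relevance scoring, and distribution of the index across nodes, none of which is shown here.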

Stage 12 – Microservices Extraction : Extracts common functionalities (user management, order, payment, authentication) into independent services using frameworks like Dubbo or Spring Cloud, enabling independent development and deployment.

Stage 13 – Enterprise Service Bus (ESB) / SOA : Introduces an ESB to unify protocol conversion and service interaction, reducing coupling between services.

Stage 14 – Containerization : Packages services as Docker images and orchestrates them with Kubernetes, providing isolated runtime environments and dynamic scaling.

Stage 15 – Cloud Platform Adoption : Migrates the system to public cloud (IaaS, PaaS, SaaS), leveraging elastic resources, managed middleware, and on‑demand scaling for peak events like large sales.

The article concludes with a set of architectural design principles: N+1 redundancy, rollback capability, feature toggles, monitoring, multi‑active data centers, use of mature technologies, resource isolation, horizontal scalability, purchasing non‑core components, commodity hardware, rapid iteration, and stateless services. It emphasizes that architecture should evolve iteratively in response to real bottlenecks rather than follow a fixed roadmap.
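
Of these principles, the feature toggle is the easiest to show in code. The sketch below assumes a simple in-memory flag store; the class name, the `recommendations` flag, and the fallback behavior are hypothetical, and a real system would typically load flags from a configuration service so they can be changed without a redeploy.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class FeatureToggles:
    """In-memory flags; unknown flags count as disabled."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._flags = dict(flags)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def run(self, name: str, feature: Callable[[], T], fallback: Callable[[], T]) -> T:
        """Runs the feature when its flag is on, otherwise the cheaper fallback."""
        return feature() if self.is_enabled(name) else fallback()


def personalized() -> List[str]:
    return ["sku-7", "sku-9"]        # stands in for an expensive recommendation call


def bestsellers() -> List[str]:
    return ["sku-1", "sku-2"]        # cheap static fallback


toggles = FeatureToggles({"recommendations": True})
normal = toggles.run("recommendations", personalized, bestsellers)
toggles.set("recommendations", False)   # switched off, e.g. under peak load
degraded = toggles.run("recommendations", personalized, bestsellers)
```

Turning a flag off gives operators a way to shed a non-essential feature during an incident or a sales peak without rolling back the whole release.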

