How Taobao Scaled Its Backend Architecture Over Time

This article states its learning objectives, traces the evolution of Taobao's backend architecture from V1.0 to V3.0, highlights the technical challenges at each stage, and explains the architectural decisions (modularization, service-oriented frameworks, distributed storage, and large-scale monitoring) that let the platform scale while improving reliability and performance.

ITFLY8 Architecture Home

Learning Objectives

Understand the requirements behind Taobao's architecture.

Learn the evolution of Taobao's technology stack.

Grasp basic architectural principles.

Taobao Related Data

Figure: Taobao data

Taobao Architecture Version Evolution

V1.0 Architecture (2003.05‑2004.01)

Database capacity limits.

Data stability issues.

Database performance bottlenecks.

V1.1 Architecture (2004.01‑2004.05)

Low development efficiency.

Lack of technical accumulation.

Inability to support parallel development.

Difficulty maintaining long‑term code.

Insufficient scalability for rapid business growth.

V2.0 Architecture (2004.02‑2005.03) – EJB Core

Based on Apache Turbine.

Modular (car) design.

Pipeline‑based architecture.

Page layout control.

Template rendering (JSP, Velocity, Freemarker).

Project management tool AntX (similar to Maven, Ant++, AutoConfig).
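The pipeline-based design listed above can be pictured as a request passing through an ordered chain of small steps. The sketch below is only an illustration of that general idea, not Taobao's framework code: the `Pipeline` class, the term "valve", and the three step functions are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
# A valve inspects or mutates the context; returning False stops the pipeline.
Valve = Callable[[Context], bool]


class Pipeline:
    """Runs a fixed sequence of valves over one shared request context."""

    def __init__(self, valves: List[Valve]) -> None:
        self._valves = list(valves)

    def invoke(self, context: Context) -> Context:
        for valve in self._valves:
            if not valve(context):
                break  # a valve vetoed the request; skip the remaining steps
        return context


def check_login(context: Context) -> bool:
    # Hypothetical rule: anonymous requests are stopped before any page work.
    if context.get("user") is None:
        context["status"] = 401
        return False
    return True


def load_item(context: Context) -> bool:
    # Stand-in for a real data lookup.
    context["item"] = {"id": context["item_id"], "title": "sample item"}
    return True


def render_page(context: Context) -> bool:
    item = context["item"]
    context["status"] = 200
    context["body"] = "{} viewed by {}".format(item["title"], context["user"])
    return True


item_page = Pipeline([check_login, load_item, render_page])

ok = item_page.invoke({"user": "alice", "item_id": 42})
denied = item_page.invoke({"user": None, "item_id": 42})
```

Because each step only sees the shared context, steps can be reordered, reused across pages, or developed by different teams without touching one another.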

V2.1 Architecture (2004.10‑2007.01) – Spring Core

TBStore: high‑speed key‑value cache.

Centralized cache storage.

Multiple cache strategies.

Taobao CDN for static content.

Distributed file storage (TFS) and cache (Tair).

Page cache (ESI).

Search engine upgrade.

Distributed storage (TDFS) and search.

Database connection limits.
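To make the role of a key-value cache in front of a connection-limited database concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. It is one common strategy, not a description of how TBStore or Tair work internally; `LRUCache`, `load_from_database`, and the sample keys are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Read-through cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable, loader: Callable[[Hashable], object]) -> object:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = loader(key)  # only cache misses reach the backing store
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
        return value


database_calls = []


def load_from_database(key):
    # Hypothetical stand-in for a slow query that holds a database connection.
    database_calls.append(key)
    return "row-for-{}".format(key)


cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, load_from_database)
```

In this run the second read of `a` is served from memory, while `b` has to be reloaded because inserting `c` evicted it; every hit is one database connection that was not needed.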

V2.2 Architecture (2006.06‑2007.12)

Improved system performance.

Reduced storage cost.

Support for massive data search.

Distributed storage TFS.

Data compression and deduplication.

Oracle → MySQL migration.

Message system Notify (topic‑based, transactional, 2 billion messages/day, 99.99% delivery).
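The topic-based model mentioned for Notify can be sketched with a small in-process broker. This is an illustration of publish/subscribe by topic only: it does not model Notify's transactional delivery, persistence, or retry behavior, and the `Broker` class and the topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]


class Broker:
    """Delivers each published message to every handler subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message


broker = Broker()
inventory_log = []
points_log = []

# Two downstream systems react to the same event without knowing about each other.
broker.subscribe("trade.paid", lambda m: inventory_log.append(("reserve", m["item_id"])))
broker.subscribe("trade.paid", lambda m: points_log.append((m["buyer"], m["amount"] // 10)))

delivered = broker.publish("trade.paid", {"item_id": 42, "buyer": "alice", "amount": 250})
unheard = broker.publish("trade.refunded", {"item_id": 42})
```

The publisher never calls the inventory or points code directly, so new consumers can be added by subscribing to the topic rather than by changing the trading system.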

V3.0 Architecture (2007.12‑present)

Service‑oriented framework HSF.

Centralized/productized management.

70 billion requests per day.

Same‑city and cross‑region disaster recovery.

HubAgent: lightweight HTTP service for monitoring, supporting incremental data, external scripts, process/thread detection, port probing, etc.

Distributed analysis system using MapReduce (master‑slave, horizontal scaling).

Cassandra for distributed monitoring data storage (350 GB total, 25 GB daily growth).

Monitoring system Hubble with 3500+ agents, handling ~4 TB raw data daily.
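As a rough picture of how a MapReduce-style analysis over monitoring data is structured, the sketch below averages a metric per host in three phases. It runs sequentially in one process and is not the Hubble implementation; the record layout `(host, metric, value)` and all function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, float]  # (host, metric, value), a hypothetical layout
Key = Tuple[str, str]


def map_phase(record: Record) -> Iterator[Tuple[Key, float]]:
    host, metric, value = record
    yield (host, metric), value


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    # Group all values that share a key, as the framework would do between phases.
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[float]) -> Tuple[Key, float]:
    return key, sum(values) / len(values)


def run_job(splits: List[List[Record]]) -> Dict[Key, float]:
    # Each split stands in for the input a separate worker would process.
    pairs: List[Tuple[Key, float]] = []
    for split in splits:
        for record in split:
            pairs.extend(map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


splits = [
    [("web1", "cpu", 40.0), ("web2", "cpu", 10.0)],
    [("web1", "cpu", 60.0)],
]
averages = run_job(splits)
```

Because the map step handles each split independently, adding workers means adding splits, which is the horizontal scaling property the list above refers to.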

Architectural Principles

Upper layers depend on lower layers, not vice‑versa.

No circular dependencies; dependencies between peers are permitted but should be avoided.

Use messaging to decouple system dependencies.

Simplicity is key.
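The layering rules above lend themselves to an automated check. The sketch below flags dependencies that point from a lower layer to an upper one, notes peer dependencies, and detects cycles with a depth-first search. The module names and layer numbers are invented for the example; nothing here comes from Taobao's actual tooling.

```python
from typing import Dict, List, Tuple

Dependency = Tuple[str, str]  # (source module, module it depends on)


def find_violations(layers: Dict[str, int], dependencies: List[Dependency]):
    """Higher layer numbers mean upper layers; upper may depend on lower only."""
    findings = []
    for source, target in dependencies:
        if layers[source] < layers[target]:
            findings.append((source, target, "lower layer depends on upper layer"))
        elif layers[source] == layers[target]:
            findings.append((source, target, "peer dependency (allowed, discouraged)"))
    return findings


def has_cycle(dependencies: List[Dependency]) -> bool:
    graph: Dict[str, List[str]] = {}
    for source, target in dependencies:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                return True  # reached a node still on the current path
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNVISITED and visit(node) for node in graph)


layers = {"web": 3, "trade_service": 2, "user_service": 2, "db": 1}
clean = [("web", "trade_service"), ("trade_service", "db")]
messy = clean + [("trade_service", "user_service"), ("db", "web")]

clean_findings = find_violations(layers, clean)
messy_findings = find_violations(layers, messy)
```

A check like this could run in a build step so that an upward or circular dependency is caught before it reaches shared code.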

Service‑Oriented / Centralized Business System Architecture

Figure: Service-oriented architecture