How Taobao Scaled Its Backend Architecture Over Time
This article sets out its learning objectives and traces the evolution of Taobao's backend architecture from V1.0 to V3.0. It highlights the technical challenges faced at each stage and explains the architectural decisions that enabled large gains in scalability, reliability, and performance, such as modularization, service‑oriented frameworks, distributed storage, and large‑scale monitoring.
Learning Objectives
Understand the requirements behind Taobao's architecture.
Learn the evolution of Taobao's technology stack.
Grasp basic architectural principles.
Taobao Related Data
Taobao Architecture Version Evolution
V1.0 Architecture (2003.05‑2004.01)
Database capacity limits.
Data stability issues.
Database performance bottlenecks.
V1.1 Architecture (2004.01‑2004.05)
Low development efficiency.
Lack of technical accumulation.
Inability to support parallel development.
Difficulty maintaining long‑term code.
Insufficient scalability for rapid business growth.
V2.0 Architecture (2004.02‑2005.03) – EJB Core
Based on Apache Turbine.
Modular (car) design.
Pipeline‑based architecture.
Page layout control.
Template rendering (JSP, Velocity, Freemarker).
Project management tool AntX (comparable to Maven; effectively an "Ant++" that includes AutoConfig).
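In a pipeline-based architecture, each request passes through an ordered chain of processing stages. Turbine-style frameworks often call these stages valves, and any stage can stop the chain early. The following is a minimal Python sketch of that idea. The `Pipeline` class, the valve signature, and the three example valves are hypothetical illustrations, not the actual Turbine or WebX API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical valve interface: a valve receives the request context and a
# callable that continues with the rest of the pipeline.
Context = Dict[str, Any]
Valve = Callable[[Context, Callable[[Context], None]], None]


class Pipeline:
    """Runs valves in order; each valve decides whether to continue."""

    def __init__(self, valves: List[Valve]) -> None:
        self.valves = valves

    def invoke(self, ctx: Context) -> Context:
        def run(index: int, c: Context) -> None:
            if index < len(self.valves):
                # Hand the current valve a continuation that runs the next one.
                self.valves[index](c, lambda nxt: run(index + 1, nxt))

        run(0, ctx)
        return ctx


def auth_valve(ctx: Context, next_valve: Callable[[Context], None]) -> None:
    # Stop the chain early if no user is attached to the request.
    if not ctx.get("user"):
        ctx["status"] = 401
        return
    next_valve(ctx)


def action_valve(ctx: Context, next_valve: Callable[[Context], None]) -> None:
    # Business-logic stage: put data into the context for rendering.
    ctx["model"] = {"greeting": f"Hello, {ctx['user']}"}
    next_valve(ctx)


def render_valve(ctx: Context, next_valve: Callable[[Context], None]) -> None:
    # Template stage: a stand-in for JSP/Velocity/FreeMarker rendering.
    ctx["body"] = ctx["model"]["greeting"]
    ctx["status"] = 200
    next_valve(ctx)


pipeline = Pipeline([auth_valve, action_valve, render_valve])
print(pipeline.invoke({"user": "alice"})["body"])  # Hello, alice
print(pipeline.invoke({})["status"])  # 401
```

With this design, you can add a cross-cutting concern such as logging or rate limiting by inserting one more valve into the list. No existing stage needs to change.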
V2.1 Architecture (2004.10‑2007.01) – Spring Core
TBStore: high‑speed key‑value cache.
Centralized cache storage.
Multiple cache strategies.
Taobao CDN for static content.
Distributed file storage (TFS) and cache (Tair).
Page cache (ESI).
Search engine upgrade.
Distributed storage (TDFS) and search.
Database connection limits.
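A key-value cache such as TBStore or Tair is commonly used in a cache-aside pattern. The application reads the cache first. On a miss, it loads the value from the database and stores it in the cache. Only misses reach the database, which is one way caching eases the database connection limit listed above. The following is a minimal Python sketch of this pattern. It assumes an in-process LRU cache in place of TBStore/Tair and uses a hypothetical `load_from_db` function in place of a real database query.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: K, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


def cache_aside_get(cache: LRUCache, key, load_from_db: Callable):
    """Cache-aside read: try the cache; on a miss, load from the DB and populate."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # only misses consume a database connection
        cache.put(key, value)
    return value


db_calls = []


def load_from_db(item_id: int) -> str:
    # Hypothetical stand-in for a real database query.
    db_calls.append(item_id)
    return f"item-{item_id}"


cache = LRUCache(capacity=2)
for item_id in [1, 2, 1, 3, 2]:
    cache_aside_get(cache, item_id, load_from_db)
print(db_calls)  # [1, 2, 3, 2]
```

Of the five reads, one (the repeated read of item 1) is served from the cache. Item 2 is read from the database twice because fetching item 3 evicted it. Choosing the eviction policy and capacity is one of the "cache strategies" a real system has to tune.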
V2.2 Architecture (2006.06‑2007.12)
Improved system performance.
Reduced storage cost.
Support for massive data search.
Distributed storage TFS.
Data compression and deduplication.
Oracle → MySQL migration.
Message system Notify (topic‑based, transactional, 2 billion messages/day, 99.99% delivery).
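A message can be made transactional with a local database update through a two-phase send. The producer first sends a "prepared" message and then commits its local transaction. Only after that commit does it confirm the message, which then becomes visible to every subscriber of the topic. Because producers publish to a topic rather than calling consumers directly, they stay decoupled from whoever consumes the event. The sketch below illustrates that flow with an in-memory broker. The `Broker` class, its methods, the topic name, and the exact prepare/commit protocol are assumptions for illustration, not Notify's actual API or implementation.

```python
import itertools
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], None]


class Broker:
    """Hypothetical in-memory topic broker with prepare/commit/rollback."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Dict[int, Tuple[str, str]] = {}
        self._ids = itertools.count(1)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def prepare(self, topic: str, body: str) -> int:
        # Phase 1: store the message but do not deliver it yet.
        msg_id = next(self._ids)
        self._pending[msg_id] = (topic, body)
        return msg_id

    def commit(self, msg_id: int) -> None:
        # Phase 2: the local transaction succeeded, so deliver to all subscribers.
        topic, body = self._pending.pop(msg_id)
        for handler in self._subscribers[topic]:
            handler(body)

    def rollback(self, msg_id: int) -> None:
        # The local transaction failed: discard the prepared message.
        self._pending.pop(msg_id, None)


def place_order(broker: Broker, order_id: str, db_ok: bool) -> None:
    msg_id = broker.prepare("trade.created", order_id)
    if db_ok:  # stand-in for committing the local database transaction
        broker.commit(msg_id)
    else:
        broker.rollback(msg_id)


received: List[str] = []
broker = Broker()
broker.subscribe("trade.created", received.append)  # e.g. an inventory service
broker.subscribe("trade.created", lambda o: received.append("mail:" + o))  # e.g. a mail service
place_order(broker, "order-1", db_ok=True)
place_order(broker, "order-2", db_ok=False)
print(received)  # ['order-1', 'mail:order-1']
```

A production system would need more than this sketch shows. If a producer crashes between the two phases, the broker has to check back with it to resolve the prepared message. The broker also has to retry delivery until subscribers acknowledge, which is what figures like 99.99% delivery depend on.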
V3.0 Architecture (2007.12‑present)
Service‑oriented framework HSF.
Centralized/productized management.
70 billion requests per day.
Same‑city and cross‑region disaster recovery.
HubAgent: lightweight HTTP service for monitoring, supporting incremental data, external scripts, process/thread detection, port probing, etc.
Distributed analysis system using MapReduce (master‑slave, horizontal scaling).
Cassandra for distributed monitoring data storage (350 GB total, 25 GB daily growth).
Monitoring system Hubble with 3500+ agents, handling ~4 TB raw data daily.
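The MapReduce-based analysis system can be illustrated in a single process. A map step turns each raw monitoring record into key/value pairs. A shuffle then groups values by key, and a reduce step aggregates each group. The record format (`host metric value`) and the per-metric average are assumptions for illustration. In a master-slave deployment like the one described, a master would split this work across slave nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw records from monitoring agents: "host metric value".
RAW_LINES = [
    "web01 cpu 40",
    "web02 cpu 60",
    "web01 load 2",
    "web02 load 4",
    "web03 cpu 80",
]


def map_phase(line: str) -> List[Tuple[str, float]]:
    # Emit (metric, value); the host is dropped because we aggregate per metric.
    _host, metric, value = line.split()
    return [(metric, float(value))]


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    # Group all values that share a key, as the framework does between phases.
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    # Aggregate one group; here, the average value of the metric.
    return key, sum(values) / len(values)


def run_job(lines: Iterable[str]) -> Dict[str, float]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(run_job(RAW_LINES))  # {'cpu': 60.0, 'load': 3.0}
```

Horizontal scaling follows from the structure. Map calls are independent per record and reduce calls are independent per key, so adding machines lets both phases handle more data.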
Architectural Principles
Upper layers depend on lower layers, not vice‑versa.
No circular dependencies; dependencies between peer modules are allowed but should be avoided.
Use messaging to decouple system dependencies.
Simplicity is key.
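The first two principles can be checked mechanically. The sketch below assumes each module is assigned a numeric layer, with higher numbers closer to the user. It reports upward dependencies as errors and peer (same-layer) dependencies as warnings, and it uses a depth-first search to find any dependency cycle. The module names, the layer numbers, and the error/warning split are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical modules: name -> (layer, dependencies). Higher layer = closer to the user.
MODULES: Dict[str, Tuple[int, List[str]]] = {
    "web": (3, ["trade", "item"]),
    "trade": (2, ["item", "user_dao"]),
    "item": (2, ["item_dao"]),
    "item_dao": (1, []),
    "user_dao": (1, []),
}


def check_layers(modules) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for upward and peer dependencies."""
    errors, warnings = [], []
    for name, (layer, deps) in modules.items():
        for dep in deps:
            dep_layer = modules[dep][0]
            if dep_layer > layer:
                errors.append(f"{name} -> {dep}: depends on a higher layer")
            elif dep_layer == layer:
                warnings.append(f"{name} -> {dep}: peer dependency")
    return errors, warnings


def find_cycle(modules) -> List[str]:
    """Depth-first search; returns one dependency cycle, or [] if none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {name: WHITE for name in modules}
    stack: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GRAY
        stack.append(node)
        for dep in modules[node][1]:
            if color[dep] == GRAY:  # back edge: dep is on the current path
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return []

    for name in modules:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return []


print(check_layers(MODULES))  # ([], ['trade -> item: peer dependency'])
print(find_cycle(MODULES))  # []
```

If every dependency pointed strictly downward, cycles would be impossible. Once peer dependencies are tolerated, though, two peers can end up depending on each other. This is one reason to keep peer dependencies rare and to run a cycle check as well.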
Service‑Oriented / Centralized Business System Architecture
ITFLY8 Architecture Home
