
Taobao’s Scaling Secrets: Stateless Sessions, Caching, Service Splitting & Sharding

This article explains how Taobao achieves horizontal scalability by adopting stateless session handling, efficient client‑side cookie storage, multi‑level caching, service splitting with HSF, database sharding via TDDL, asynchronous messaging, unstructured data storage, and comprehensive monitoring and configuration management.

ITFLY8 Architecture Home

1. Stateless Application (Taobao Session Framework)

How a system manages application state largely determines how well it scales. If a server keeps large amounts of state in its local session, that state is lost when the server fails. Clustering with session replication (as in Tomcat or JBoss) adds failover, but the replication traffic between nodes increases communication overhead and hurts horizontal scaling. Applications should therefore be stateless, so that every node is identical and any node can serve any request.

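Taobao’s session framework stores state in client‑side cookies, eliminating server‑side session data. It uses multi‑value cookies, which pack several key‑value pairs into a single cookie. This keeps the application within the browser's per‑domain cookie count limit (historically about 20) and makes better use of the roughly 4 KB per‑cookie size limit, because less space is spent on per‑cookie metadata.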
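The packing idea can be sketched as follows. This is a minimal illustration, not Taobao's actual framework: the function names, the `&`/`=` separator scheme, and the 4096‑byte constant are assumptions chosen for the example, and a production framework would also sign or encrypt the value so clients cannot tamper with it.

```python
from urllib.parse import quote, unquote

# Assumed per-cookie size budget; the exact limit varies by browser.
MAX_COOKIE_BYTES = 4096


def pack_cookie(values):
    """Pack several key/value pairs into one cookie value (hypothetical helper)."""
    # Percent-encode keys and values so '&' and '=' can act as separators.
    packed = "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in values.items()
    )
    if len(packed.encode("utf-8")) > MAX_COOKIE_BYTES:
        raise ValueError("packed session state exceeds the per-cookie size budget")
    return packed


def unpack_cookie(packed):
    """Reverse pack_cookie: split one cookie value back into a dict."""
    if not packed:
        return {}
    result = {}
    for pair in packed.split("&"):
        key, _, value = pair.partition("=")
        result[unquote(key)] = unquote(value)
    return result
```

Because all values share one cookie name, path, and expiry, the metadata cost is paid once instead of once per value.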

Alternatively, a centralized session server can hold sessions in a cache backed by persistent storage such as a database or file system.

2. Effective Caching (Tair)

Caching is essential for reducing disk I/O and network latency and for bridging the gap between CPU speed and slower storage. Caches can be local or remote; mixing the two complicates consistency. Read caches are the common case, but a write cache can also reduce database load for data with a low read‑to‑write ratio and low safety requirements: writes go to memory first and are persisted later.
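The write‑cache pattern described above is often called write‑behind. The sketch below shows the idea under simplifying assumptions: the class name and `flush_threshold` parameter are hypothetical, a plain dict stands in for the database, and nothing here reflects Tair's actual API. Unflushed writes are lost if the process dies, which is why the pattern only suits data with low safety requirements.

```python
class WriteBehindCache:
    """In-memory cache that defers writes to a backing store (illustrative only)."""

    def __init__(self, store, flush_threshold=3):
        self._store = store              # dict standing in for the database
        self._cache = {}                 # latest values, served to readers
        self._dirty = set()              # keys written but not yet persisted
        self._flush_threshold = flush_threshold

    def get(self, key):
        # Serve from memory when possible; fall back to the store on a miss.
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        # Write to memory first; the store is only touched on flush.
        self._cache[key] = value
        self._dirty.add(key)
        if len(self._dirty) >= self._flush_threshold:
            self.flush()

    def flush(self):
        # Persist each dirty key once, no matter how often it was overwritten.
        for key in self._dirty:
            self._store[key] = self._cache[key]
        self._dirty.clear()
```

Repeated writes to the same key collapse into a single store write, which is where the reduction in database load comes from.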

3. Service Splitting (HSF)

As user numbers grow, monolithic systems become hard to maintain and scale. Splitting based on business relevance creates independent subsystems that can be scaled horizontally without affecting others, improving availability and reducing coupling.

Communication between subsystems can be synchronous or asynchronous. Taobao uses its high‑performance remote‑call framework HSF for synchronous calls, while asynchronous messaging is handled by a Notify system.

Vertical splitting (business, core, infrastructure services) and horizontal splitting (by function) are illustrated in Taobao’s architecture evolution from V2.2 to V3.0.

4. Database Sharding (TDDL)

Initially a single database handles all data, but growing traffic requires master‑slave read/write separation, vertical partitioning (different databases for users, products, orders), and horizontal partitioning (sharding tables) to distribute load.

Sharding introduces challenges such as cross‑shard joins and load balancing. Taobao addresses them with TDDL, a Data Access Layer (DAL) framework that hides the storage layout from application code.
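To make the routing problem concrete, here is a minimal sketch of a sharded table hidden behind a small access layer. It assumes modulo routing on a numeric user ID and uses in‑memory lists in place of real databases; the class and method names are hypothetical and are not TDDL's interface.

```python
class ShardedOrders:
    """Toy data access layer that routes order rows to shards by user ID."""

    def __init__(self, shard_count=4):
        # Each list stands in for one physical database or table.
        self._shards = [[] for _ in range(shard_count)]

    def shard_index(self, user_id):
        # Assumed routing rule: modulo on the sharding key.
        return user_id % len(self._shards)

    def insert(self, user_id, order_id, amount):
        row = {"user_id": user_id, "order_id": order_id, "amount": amount}
        self._shards[self.shard_index(user_id)].append(row)

    def orders_for_user(self, user_id):
        # Queries that include the sharding key touch exactly one shard.
        shard = self._shards[self.shard_index(user_id)]
        return [row for row in shard if row["user_id"] == user_id]

    def total_amount(self):
        # Queries without the sharding key must visit every shard and
        # merge the results, which is the costly cross-shard case.
        return sum(row["amount"] for shard in self._shards for row in shard)
```

Callers never see which shard holds a row. The trade‑off is visible in `total_amount`: any query that cannot be routed by the sharding key fans out to all shards.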

5. Asynchronous Communication (Notify)

Asynchronous messaging decouples services, improves scalability and availability, and reduces response time. Taobao’s Notify middleware enables trade systems to continue operating even when dependent services are unavailable.
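The decoupling can be illustrated with a small in‑memory broker. This is a sketch under stated assumptions: `MessageBroker` and its methods are hypothetical names, delivery is driven manually rather than by background threads, and messages are not persisted, so it shows the shape of the pattern rather than how Notify is built.

```python
from collections import deque


class MessageBroker:
    """Minimal publish/subscribe broker with retry of failed deliveries."""

    def __init__(self):
        self._subscribers = []
        self._pending = deque()   # (handler, message) pairs awaiting delivery

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        # The publisher only enqueues and returns; it never waits on consumers.
        for handler in self._subscribers:
            self._pending.append((handler, message))

    def deliver_pending(self):
        """Attempt each pending delivery once; keep failures for a later retry."""
        retry = deque()
        delivered = 0
        while self._pending:
            handler, message = self._pending.popleft()
            try:
                handler(message)
                delivered += 1
            except Exception:
                # Consumer is unavailable: hold the message instead of
                # failing the publisher.
                retry.append((handler, message))
        self._pending = retry
        return delivered
```

A publisher such as a trade system calls `publish` and moves on. If a downstream handler raises because its service is down, the message stays queued and is delivered on a later `deliver_pending` call once the service recovers.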

6. Unstructured Data Storage (TFS, NoSQL)

Non‑relational data (config files, user snapshots, large static files) is stored in a distributed file system (TFS) or NoSQL stores using BASE consistency to favor availability over strict consistency.

7. Monitoring and Alerting

Comprehensive monitoring of system metrics (CPU, memory, I/O, traffic) at both coarse and fine granularity, combined with alerting, helps detect anomalies quickly and maintain stability.
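At its simplest, alerting compares sampled metrics against configured limits. The sketch below assumes metrics arrive as a name‑to‑value mapping; the function name, metric names, and threshold values are all hypothetical, and real systems typically add time windows and deduplication on top.

```python
def find_alerts(samples, thresholds):
    """Return one alert string per metric whose sample exceeds its threshold."""
    alerts = []
    for name in sorted(samples):
        limit = thresholds.get(name)
        # Metrics without a configured threshold are collected but never alert.
        if limit is not None and samples[name] > limit:
            alerts.append(f"{name}: {samples[name]} > {limit}")
    return alerts
```

For example, `find_alerts({"cpu_pct": 93.0, "mem_pct": 40.0}, {"cpu_pct": 85.0, "mem_pct": 90.0})` flags only the CPU metric.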

8. Unified Configuration Management

A centralized configuration service ensures consistent settings across nodes, simplifying addition or removal of servers and reducing configuration errors.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
