How Taobao’s CDN Architecture Boosts Performance and Cuts Costs
This article explains the fundamentals of CDN technology, describes Taobao’s multi‑level caching and load‑balancing design, and shows how edge nodes, LVS, HAProxy, and tiered storage together improve latency, reduce server load, and lower operational costs.
CDN, short for Content Delivery Network, delivers web content to edge nodes that are geographically close to users, allowing pages, images, or videos to be fetched locally. This reduces server load, balances network traffic, cuts hardware, energy, and management costs, and improves overall network performance.
An “edge node” is the server the CDN provider selects as closest to the user, often just one network hop away. Access is faster because the request does not have to traverse a long chain of routers.
When a domain that uses CDN is resolved, an intelligent load‑balancing system selects an edge node’s IP. The user accesses this IP, the edge node resolves the origin server’s IP via its internal DNS, fetches the required resource, caches it, and serves subsequent requests directly from the cache.
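The scheduling step above can be sketched in a few lines. This is a minimal illustration, not Taobao's actual scheduler: the region names, the `EDGE_NODES` table, the `load` map, and the `resolve_edge_ip` function are all hypothetical, and the IPs come from the reserved documentation ranges.

```python
from typing import Dict, List

# Hypothetical mapping of client regions to edge node IPs (documentation ranges).
EDGE_NODES: Dict[str, List[str]] = {
    "hangzhou": ["192.0.2.10", "192.0.2.11"],
    "beijing": ["198.51.100.10"],
}
DEFAULT_REGION = "hangzhou"  # assumption: fallback when the region is unknown


def resolve_edge_ip(client_region: str, load: Dict[str, int]) -> str:
    """Return the least-loaded edge IP in the client's region."""
    candidates = EDGE_NODES.get(client_region) or EDGE_NODES[DEFAULT_REGION]
    # Nodes with no load entry count as idle; ties go to the first candidate.
    return min(candidates, key=lambda ip: load.get(ip, 0))
```

A real scheduler would typically infer the region from the resolver's address and weigh link quality as well as load; the sketch only shows the "pick a nearby, lightly loaded node" idea.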
CDN Architecture
CDNs can be third‑party services or self‑built platforms. This article uses Taobao’s CDN as an example.
Taobao’s CDN primarily supports massive image traffic during events such as Double 11. Images are stored in a backend TFS cluster, and the CDN caches them on the nearest edge nodes.
The CDN employs a two‑level cache (L1 and L2). When a user requests an image, the global scheduler directs the request to an L1 cache node. If the L1 cache hits, the image is returned immediately; otherwise the request falls back to an L2 cache, whose result is then stored in L1. If both caches miss, the request reaches the origin image server cluster.
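The lookup order described above can be sketched as a small in-memory model. The class and attribute names are illustrative, and storing origin responses in L2 as well as L1 is an assumption; the text only states that L2 results are written back to L1.

```python
from typing import Callable, Dict


class TwoLevelCache:
    """Toy model of an L1 -> L2 -> origin lookup chain."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.l1: Dict[str, bytes] = {}
        self.l2: Dict[str, bytes] = {}
        self.fetch_from_origin = fetch_from_origin

    def get(self, url: str) -> bytes:
        # L1 hit: return immediately.
        if url in self.l1:
            return self.l1[url]
        if url in self.l2:
            # L1 miss, L2 hit.
            data = self.l2[url]
        else:
            # Both levels missed: go back to the origin cluster.
            data = self.fetch_from_origin(url)
            self.l2[url] = data  # assumption: L2 keeps a copy too
        # Whatever the source, the result is stored in L1 for next time.
        self.l1[url] = data
        return data
```

With this shape, an L1 node that loses its contents can refill from L2 without touching the origin, which is the main point of the second level.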
The origin cluster consists of Nginx web servers that also maintain local caches; if those miss, the request is forwarded to the backend TFS cluster. All image servers and the TFS cluster are deployed within the same data center.
Each CDN node uses LVS + HAProxy for load balancing, optionally with Keepalived for high availability.
LVS is a layer‑4 load balancer that can implement flexible balancing strategies. It works together with Squid servers to distribute image requests.
An LVS deployment typically has three parts: a virtual IP (VIP) that receives external traffic, a master node, and a backup node that takes over if the master fails. Health checks ensure that traffic is forwarded only to a healthy HAProxy instance, which then performs the layer‑7 forwarding.
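The health-check behaviour can be illustrated with a small model. This is a sketch of the idea only, not of how LVS or Keepalived are implemented: the `HealthCheckedPool` class is hypothetical, and round-robin is assumed as the scheduling policy.

```python
from typing import Iterable, List, Set


class HealthCheckedPool:
    """Round-robin over backends, skipping any marked unhealthy."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends: List[str] = list(backends)
        self.healthy: Set[str] = set(self.backends)
        self._next = 0

    def mark(self, backend: str, is_up: bool) -> None:
        # Called with the result of a (hypothetical) periodic health probe.
        if is_up:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self) -> str:
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")
```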
HAProxy performs layer‑7 load balancing, with optimizations such as long‑connection support and consistent‑hash routing based on URL.
HAProxy optimization – support for long connections
HAProxy scheduling algorithm – consistent hashing based on request URL
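Each image server essentially runs a Squid cache for binary image data. Consistent hashing on the request URL distributes objects across the Squid servers, so adding or removing one server remaps only about 1/n of the objects (where n is the number of servers) instead of reshuffling nearly all of them.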
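A minimal sketch of URL-based consistent hashing follows. It is not HAProxy's implementation: the `ConsistentHashRing` class, the use of MD5, and the 100 virtual nodes per server are assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as a 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, servers: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server owns several virtual nodes to even out the distribution.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, url: str) -> str:
        # First ring position clockwise from the URL's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(url)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a server is added, the only URLs that change owner are those that now map to the new server; every other object stays where it was, so the existing caches keep their hit rate.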
Summary
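A CDN is fundamentally a distributed caching system that does not require data persistence. If a cache server fails, it can simply be removed from the cluster, because any object it held can be fetched again from the next cache level or from the origin.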
Taobao’s CDN uses tiered storage on Squid servers: SSD for the hottest images, SAS for medium‑hot, and SATA for less‑hot data, balancing cost and performance. As SSD prices drop, many CDN nodes are now equipped with SSDs.
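The tiering rule can be expressed as a simple threshold function. The thresholds and the `choose_tier` name are hypothetical; the source does not say how hotness is measured or where the cut-offs lie.

```python
def choose_tier(hits_per_hour: int, ssd_min: int = 1000, sas_min: int = 100) -> str:
    """Map an object's access rate to a storage tier (illustrative thresholds)."""
    if hits_per_hour >= ssd_min:
        return "SSD"   # hottest images: fastest, most expensive media
    if hits_per_hour >= sas_min:
        return "SAS"   # medium-hot images
    return "SATA"      # less-hot images: cheapest capacity
```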
Because CDN caching is I/O‑intensive rather than CPU‑intensive, low‑power Atom processors are sufficient for cache nodes and help reduce overall power consumption.
Operators must monitor cache consistency: when the origin server updates or deletes content, the changes need to be propagated to CDN edge nodes in near real‑time.
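One common way to propagate such changes is an explicit purge sent to every edge node. The sketch below models that in memory; the `EdgeNode` class and `propagate_purge` function are hypothetical, and a real system would send the purge over the network and retry failed nodes.

```python
from typing import Dict, Iterable, List


class EdgeNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}

    def purge(self, url: str) -> bool:
        """Drop the cached copy; return True if one existed."""
        return self.cache.pop(url, None) is not None


def propagate_purge(url: str, edges: Iterable[EdgeNode]) -> List[str]:
    """Purge a URL on every edge and list the nodes that held a copy."""
    return [edge.name for edge in edges if edge.purge(url)]
```

After a purge, the next request for that URL misses at the edge and is refetched, so users see the updated or deleted state.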
21CTO
