Inside Taobao’s Billion-Request Engine: Load Balancing, CDN & Big Data

This article explains how Taobao scales to billions of daily page views using DNS‑based load balancing, LVS, domain sharding, CDN nodes, a distributed file system, sophisticated search processing, and massive data storage and real‑time log pipelines.

21CTO

When you visit www.taobao.com, your browser first resolves the domain via DNS, which may return different IP addresses based on your region or ISP, implementing the first step of load balancing before any CDN is involved.
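As a rough illustration of region- and ISP-aware resolution, the sketch below maps a client's (region, ISP) pair to an address pool. The pool table, the `resolve` function, and the IP addresses (taken from reserved documentation ranges) are all hypothetical and do not reflect Taobao's actual DNS configuration.

```python
# Hypothetical address pools keyed by (region, ISP). The IPs come from the
# reserved documentation ranges and are NOT Taobao's real addresses.
POOLS = {
    ("east", "telecom"): ["192.0.2.10", "192.0.2.11"],
    ("north", "unicom"): ["198.51.100.10"],
}
# Fallback pool for clients whose region or ISP is not in the table.
DEFAULT_POOL = ["203.0.113.10"]


def resolve(region: str, isp: str) -> list[str]:
    """Return the address list a geo-aware DNS server might answer with."""
    return POOLS.get((region, isp), DEFAULT_POOL)


print(resolve("east", "telecom"))
print(resolve("south", "mobile"))  # unknown pair falls back to DEFAULT_POOL
```

Because the answer depends on who is asking, two users typing the same domain can be sent to different server groups before any HTTP request is made.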

Each page load counts as one Page View (PV), while Unique Visitors (UV) counts the distinct users behind those views; Taobao's daily PV reaches 1.6 to 2.5 billion, far surpassing sites like 12306.cn.
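The difference between the two metrics is easy to see in a small sketch. The log format here, a list of (visitor_id, url) pairs, is an assumption made for illustration; real access logs carry far more fields.

```python
def count_pv_uv(requests):
    """Count page views and unique visitors.

    requests: iterable of (visitor_id, url) pairs for one day (assumed format).
    """
    pv = 0
    visitors = set()
    for visitor_id, _url in requests:
        pv += 1                    # every page load is one PV
        visitors.add(visitor_id)   # a visitor is counted once for UV
    return pv, len(visitors)


log = [("alice", "/"), ("alice", "/item/1"), ("bob", "/")]
print(count_pv_uv(log))  # (3, 2)
```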

Because the traffic is enormous, the homepage is generated by a farm of hundreds of servers, with request distribution handled by LVS (Linux Virtual Server), a widely used load‑balancing system.
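LVS ships several scheduling algorithms, including round-robin and least-connection variants. The article does not say which one Taobao uses, so the following is only a minimal sketch of the least-connection idea; the class name and server labels are hypothetical.

```python
class LeastConnectionBalancer:
    """Toy least-connection scheduler (illustrative, not LVS itself)."""

    def __init__(self, servers):
        # Track the number of active connections per backend server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Mark one connection on the given server as finished."""
        self.active[server] -= 1


lb = LeastConnectionBalancer(["web1", "web2", "web3"])
first_three = [lb.acquire() for _ in range(3)]
lb.release("web2")
print(first_three, lb.acquire())
```

After the first three requests each server holds one connection; once `web2` finishes its request it becomes the least loaded and receives the next one.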

After the HTML is delivered, the browser must load many resources (CSS, JS, images). Since browsers limit concurrent connections per domain, Taobao spreads resources across multiple domains to bypass this limit and prepare for CDN distribution.
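One way to spread resources across domains is to hash each resource path to a fixed shard, so the same file always comes from the same host and stays cacheable. This is a sketch under that assumption; the `example.com` host names and the `shard_url` helper are hypothetical, and the article does not describe how Taobao assigns resources to domains.

```python
import zlib

# Hypothetical shard domains; a real site would use its own host names.
SHARD_DOMAINS = ["img1.example.com", "img2.example.com", "img3.example.com"]


def shard_url(path: str) -> str:
    """Map a resource path to one shard domain, deterministically."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    index = zlib.crc32(path.encode("utf-8")) % len(SHARD_DOMAINS)
    return f"https://{SHARD_DOMAINS[index]}{path}"


for resource in ["/css/main.css", "/js/app.js", "/img/logo.png"]:
    print(shard_url(resource))
```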

During peak events like Double‑11, traffic can hit 871 GB/s, equivalent to about 1.78 million 4 Mbit/s residential broadband lines, so Taobao relies on a nationwide CDN with dozens of nodes to serve static assets from the location nearest each user.
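The residential-line comparison can be checked with a little arithmetic. The calculation below assumes that 1 GB is 1024 MB and that the line speed is 4 megabits per second; the article states neither, but those assumptions reproduce its figure.

```python
def equivalent_lines(peak_gb_per_s: float, line_mbit_per_s: float) -> float:
    """Number of residential lines needed to carry the peak traffic."""
    # Convert gigabytes/s to megabits/s: x1024 (GB -> MB), x8 (bytes -> bits).
    peak_mbit_per_s = peak_gb_per_s * 1024 * 8
    return peak_mbit_per_s / line_mbit_per_s


lines = equivalent_lines(871, 4)
print(f"{lines / 1e6:.2f} million lines")  # 1.78 million lines
```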

When sellers upload new product images, Taobao’s distributed file system TFS (Taobao File System) ensures that all CDN nodes synchronize these files quickly.

The search system first tokenizes the Chinese query using a word‑segmentation library, then analyzes shopping intent (browsing, query, comparison, or decisive) to tailor results.
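Chinese text has no spaces between words, which is why a segmentation step is needed. The article does not name the algorithm behind Taobao's library, so the sketch below uses forward maximum matching, a simple classic technique, with a toy dictionary that is purely illustrative.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


words = {"夏季", "红色", "连衣裙"}  # toy dictionary: summer, red, dress
print(segment("夏季红色连衣裙", words))  # ['夏季', '红色', '连衣裙']
```

Production segmenters typically add statistical models to resolve ambiguous splits, which greedy matching cannot handle.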

Search results are produced by over a thousand search servers, and product detail snapshots are stored in Tair, Taobao’s proprietary distributed key‑value store.
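When an index is split across many servers, a common pattern is scatter-gather: each server returns its best local hits and a coordinator merges them into a global top list. The article does not describe Taobao's merging logic, so this is only a sketch of that general pattern; the `merge_top_k` function and the (score, item_id) result format are assumptions.

```python
import heapq


def merge_top_k(shard_results, k):
    """Merge per-shard hit lists into the k highest-scoring hits overall.

    shard_results: list of lists of (score, item_id) tuples (assumed format).
    """
    all_hits = (hit for hits in shard_results for hit in hits)
    # nlargest compares tuples, so hits are ranked by score first.
    return heapq.nlargest(k, all_hits)


shards = [
    [(0.90, "item-a"), (0.20, "item-b")],
    [(0.70, "item-c")],
    [(0.95, "item-d")],
]
print(merge_top_k(shards, 2))  # [(0.95, 'item-d'), (0.9, 'item-a')]
```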

All user actions generate massive logs (terabytes per day). Taobao uses TimeTunnel to transmit these logs in real time for downstream analytics.

Overall, Taobao stores petabytes of historical data, compressed at a 1:120 ratio, and processes it on a data‑processing cluster called “Cloud Ladder” that consists of more than 2,000 servers.

From this data, Taobao can infer detailed personal preferences as well as broad market trends, illustrating the immense scale and complexity of its backend infrastructure.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.