Inside Taobao’s Billion-Request Engine: Load Balancing, CDN & Big Data
This article explains how Taobao scales to billions of daily page views using DNS‑based load balancing, LVS, domain sharding, CDN nodes, a distributed file system, sophisticated search processing, and massive data storage and real‑time log pipelines.
When you visit www.taobao.com, your browser first resolves the domain via DNS, which may return different IP addresses based on your region or ISP, implementing the first step of load balancing before any CDN is involved.
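As a minimal sketch of region- and ISP-aware resolution, the lookup below returns a different address pool depending on where the request comes from. The region and ISP labels, the `POOLS` table, and the IP addresses (taken from the documentation-only ranges) are all assumptions for illustration, not Taobao's actual configuration.

```python
# Hypothetical zone data: (region, ISP) -> address pool.
# The IPs come from reserved documentation ranges and are placeholders.
POOLS = {
    "www.taobao.com": {
        ("east", "telecom"): ["192.0.2.10", "192.0.2.11"],
        ("north", "unicom"): ["198.51.100.10"],
    }
}
DEFAULT_POOL = ["203.0.113.10"]


def resolve(domain, region, isp):
    """Return the addresses a geo-aware DNS server might hand back."""
    zone = POOLS.get(domain, {})
    # Fall back to a default pool when no region/ISP-specific entry exists.
    return zone.get((region, isp), DEFAULT_POOL)


print(resolve("www.taobao.com", "east", "telecom"))
print(resolve("www.taobao.com", "south", "mobile"))
```

Two users asking for the same name can therefore be sent to different server groups before any HTTP request is made.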
Every page load counts as one Page View (PV), and each distinct visitor within a day counts as one Unique Visitor (UV); Taobao’s daily PV reaches 1.6 to 2.5 billion, far surpassing sites like 12306.cn.
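The difference between the two metrics is easiest to see in code. The sketch below assumes a hypothetical access log reduced to `(visitor_id, url)` pairs: PV is the number of page loads, and UV is the number of distinct visitors.

```python
def count_pv_uv(events):
    """Count page views and unique visitors.

    events: iterable of (visitor_id, url) tuples, one per page load.
    """
    pv = 0
    visitors = set()
    for visitor_id, _url in events:
        pv += 1                   # every page load is one PV
        visitors.add(visitor_id)  # a visitor is counted once, however many pages they load
    return pv, len(visitors)


events = [("u1", "/"), ("u1", "/item/1"), ("u2", "/"), ("u1", "/")]
print(count_pv_uv(events))  # (4, 2)
```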
Because the traffic is enormous, the homepage is generated by a farm of hundreds of servers, with request distribution handled by LVS (Linux Virtual Server), a widely used load‑balancing system.
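LVS offers several scheduling algorithms, including round-robin and least-connection. The article does not say which one Taobao uses, so the following is only an illustrative least-connection sketch in plain Python; the class name and server labels are made up for the example.

```python
class LeastConnectionBalancer:
    """Toy scheduler: send each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        # Track the number of active connections per backend server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by insertion order of the servers.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a connection to the server finishes.
        self.active[server] -= 1


lb = LeastConnectionBalancer(["web-a", "web-b", "web-c"])
first_three = [lb.acquire() for _ in range(3)]
lb.release("web-b")
print(first_three, lb.acquire())
```

After the first three requests each server holds one connection; once `web-b` finishes its request, it becomes the least-loaded server and receives the next one.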
After the HTML is delivered, the browser must load many resources (CSS, JS, images). Since browsers limit concurrent connections per domain, Taobao spreads resources across multiple domains to bypass this limit and prepare for CDN distribution.
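One common way to spread resources across domains is to hash each resource path onto a fixed list of hostnames, so that a given file always maps to the same domain and stays cacheable. The sketch below assumes that approach; the hostnames are placeholders, and the article does not describe Taobao's actual mapping rule.

```python
import zlib

# Hypothetical shard hostnames, not Taobao's real domains.
SHARD_DOMAINS = ["img01.example.com", "img02.example.com", "img03.example.com"]


def shard_url(path):
    """Map a resource path to one shard domain, deterministically."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(path.encode("utf-8")) % len(SHARD_DOMAINS)
    return f"https://{SHARD_DOMAINS[index]}{path}"


for path in ["/css/home.css", "/js/app.js", "/img/banner.jpg"]:
    print(shard_url(path))
```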
During peak events like Double‑11, traffic can hit 871 GB/s, equivalent to about 1.78 million 4 Mb/s residential lines, so Taobao relies on a nationwide CDN with dozens of nodes to serve static assets from the nearest location.
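The line-count comparison can be checked with a few lines of arithmetic. The calculation below assumes binary units (1 GB = 1024 MB) and 8 bits per byte, which is the conversion that reproduces the article's figure.

```python
def equivalent_lines(peak_gb_per_s, line_mbit_per_s):
    """How many residential lines add up to the given peak throughput."""
    # GB/s -> MB/s (binary, assumed) -> megabits per second
    peak_mbit_per_s = peak_gb_per_s * 1024 * 8
    return peak_mbit_per_s / line_mbit_per_s


lines = equivalent_lines(871, 4)
print(f"{lines / 1e6:.2f} million lines")  # 1.78 million lines
```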
When sellers upload new product images, Taobao’s distributed file system TFS (Taobao File System) ensures that all CDN nodes synchronize these files quickly.
The search system first tokenizes the Chinese query using a word‑segmentation library, then classifies the shopper's intent (browsing, querying, comparing, or already decided) to tailor results.
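Chinese text has no spaces between words, so a query must be split before it can be matched against an index. The article does not name the segmentation library or algorithm, so the sketch below uses forward maximum matching, a simple dictionary-based technique, purely as an illustration. The vocabulary and the `max_len` parameter are assumptions.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


vocab = {"夏季", "新款", "连衣裙"}  # "summer", "new style", "dress"
print(segment("夏季新款连衣裙", vocab))  # ['夏季', '新款', '连衣裙']
```

Production segmenters typically combine dictionaries with statistical models to resolve ambiguous splits, which this sketch does not attempt.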
Search results are produced by over a thousand search servers, and product detail snapshots are stored in Tair, Taobao’s proprietary distributed key‑value store.
All user actions generate massive logs (terabytes per day). Taobao uses TimeTunnel to transmit these logs in real time for downstream analytics.
Overall, Taobao stores petabytes of historical data, compressed at a 1:120 ratio, and processes it with a massive data‑processing cluster called “Cloud Ladder” consisting of more than 2,000 servers.
From this data, Taobao can infer detailed personal preferences as well as broad market trends, illustrating the immense scale and complexity of its backend infrastructure.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
