How Youku Scaled to 89 Million Daily Users: Front‑End, DB & Caching Architecture
This article examines how Youku handled its massive scale, covering its front‑end framework, server hardware, MySQL replication, vertical partitioning, sharding, and caching strategy, and shows how the platform delivered high performance and reliability for 1.7 billion daily page views.
1. Website Basic Data Overview
According to 2010 statistics, Youku had a daily average of 89 million unique visitors (UV) and 1.7 billion page views (PV), making it the top-ranked Chinese video site on Google’s list. The hardware primarily consisted of Dell PowerEdge 1950 and 860 servers and Dell MD1000 storage arrays, with over 1,000 servers deployed nationwide by 2007.
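To put these totals in perspective, a quick back-of-envelope calculation converts the daily figures above into per-visitor and per-second averages. It uses only the 2010 numbers quoted here; the source gives no peak-traffic data, so only averages can be derived:

```python
# Back-of-envelope load figures derived from the 2010 numbers quoted above.
DAILY_UV = 89_000_000      # unique visitors per day
DAILY_PV = 1_700_000_000   # page views per day
SECONDS_PER_DAY = 24 * 60 * 60

pages_per_visitor = DAILY_PV / DAILY_UV          # average pages each visitor views
avg_pv_per_second = DAILY_PV / SECONDS_PER_DAY   # mean request rate over the whole day

print(f"{pages_per_visitor:.1f} pages per visitor")                # 19.1 pages per visitor
print(f"{avg_pv_per_second:,.0f} page views per second on average")  # 19,676 page views per second on average
```

Traffic is uneven across the day, so peak load is higher than this average; the source does not state by how much.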
2. Front‑End Framework
Youku built its own CMS to render pages. Splitting pages into independent modules made the system easy to extend and simplified development and maintenance. Modules are invoked through a module‑method‑params calling model, illustrated in the following diagrams.
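The source names the calling convention but does not show its code. As a minimal sketch, assuming a page is assembled from (module, method, params) triples that a dispatcher resolves to registered functions, the model might look like this; every module, method, and function name here is hypothetical:

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping (module, method) to a render function.
# Module/method names are illustrative; the source does not list Youku's.
_REGISTRY: Dict[Tuple[str, str], Callable[..., str]] = {}


def register(module: str, method: str):
    """Decorator that exposes a function as `module.method` to the page renderer."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        _REGISTRY[(module, method)] = fn
        return fn
    return wrap


@register("video", "top_list")
def video_top_list(category: str, limit: int = 3) -> str:
    # A real module would query storage; this one returns placeholder HTML.
    items = "".join(f"<li>{category} #{i + 1}</li>" for i in range(limit))
    return f"<ul>{items}</ul>"


@register("user", "greeting")
def user_greeting(name: str) -> str:
    return f"<p>Welcome back, {name}</p>"


def call(module: str, method: str, params: Dict[str, Any]) -> str:
    """Resolve module + method, then invoke the handler with keyword params."""
    try:
        fn = _REGISTRY[(module, method)]
    except KeyError:
        raise LookupError(f"unknown module method {module}.{method}") from None
    return fn(**params)


# A page is an ordered list of (module, method, params) calls.
page = [
    ("user", "greeting", {"name": "alice"}),
    ("video", "top_list", {"category": "music", "limit": 2}),
]
html = "\n".join(call(m, meth, p) for m, meth, p in page)
print(html)
```

Because modules only meet at the dispatcher, adding a feature means registering one more function; existing modules and pages stay untouched, which is the extensibility the modular CMS is after.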
3. Database Architecture
Youku’s database architecture evolved from a single MySQL instance to master‑slave replication, SSD optimization, vertical partitioning, and horizontal sharding.
3.1 Simple MySQL Master‑Slave Replication
Replication enabled read‑write separation and improved read performance.
The replication process is shown below:
However, this approach brought its own problems: writes could not be scaled out, writes could not be cached, replicas lagged behind the master, the rate of table locking rose, and cache hit rates fell as tables grew.
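A toy model shows both the benefit and two of the drawbacks listed above. The sketch is hypothetical: in-memory dictionaries stand in for MySQL servers, and `replicate()` stands in for replicas applying the master's binary log. Reads are spread round-robin across replicas, while every write still lands on the single master, and a read issued before replication catches up returns stale data:

```python
import itertools
from typing import Dict, List, Optional, Tuple


class Node:
    """An in-memory stand-in for one MySQL server (hypothetical, for illustration)."""
    def __init__(self, name: str):
        self.name = name
        self.rows: Dict[int, str] = {}


class ReplicatedCluster:
    """Sends writes to the master and spreads reads over replicas.

    Replication is asynchronous: changes wait in a log until
    `replicate()` runs, which models replication lag.
    """
    def __init__(self, replica_count: int):
        self.master = Node("master")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self._pending: List[Tuple[int, str]] = []             # binlog-like queue
        self._next_replica = itertools.cycle(self.replicas)   # round-robin reads

    def write(self, key: int, value: str) -> None:
        self.master.rows[key] = value        # every write hits the one master
        self._pending.append((key, value))   # replicas only see it later

    def read(self, key: int) -> Optional[str]:
        replica = next(self._next_replica)
        return replica.rows.get(key)

    def replicate(self) -> None:
        for key, value in self._pending:
            for replica in self.replicas:
                replica.rows[key] = value
        self._pending.clear()


cluster = ReplicatedCluster(replica_count=2)
cluster.write(42, "video title")
print(cluster.read(42))   # None: the replica has not caught up yet
cluster.replicate()
print(cluster.read(42))   # video title
```

Adding replicas raises read capacity, but `write()` still funnels through `self.master`, which is why write throughput does not scale in this design.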
3.2 MySQL Vertical Partitioning
By separating independent business domains onto different database servers, vertical partitioning improved load distribution and fault isolation. The resulting architecture is illustrated below.
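In its simplest form, vertical partitioning is a static routing table from business domain to database server. The sketch below assumes three hypothetical domains and host names; it shows that a failed server takes down only the domain it hosts:

```python
from typing import Dict, Set

# Hypothetical mapping of business domains to dedicated database servers.
# Domain and host names are illustrative; the source does not list Youku's.
DOMAIN_TO_DB: Dict[str, str] = {
    "users": "db-users.internal:3306",
    "videos": "db-videos.internal:3306",
    "comments": "db-comments.internal:3306",
}


def db_for(domain: str) -> str:
    """Each domain's tables live on exactly one server, so routing is a lookup."""
    return DOMAIN_TO_DB[domain]


def available_domains(down_hosts: Set[str]) -> Set[str]:
    """Domains that keep working when the given servers fail (fault isolation)."""
    return {d for d, host in DOMAIN_TO_DB.items() if host not in down_hosts}


print(db_for("comments"))                                          # db-comments.internal:3306
print(sorted(available_domains({"db-comments.internal:3306"})))    # ['users', 'videos']
```

The limit is equally visible: all rows of a domain still sit on one server, so a domain that outgrows its host needs a different kind of split.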
Nevertheless, cross‑domain data (e.g., user information) still required a solution, leading to horizontal sharding.
3.3 MySQL Horizontal Partitioning (Sharding)
Users are hashed by ID and assigned to specific shards, allowing the system to scale by adding servers. The sharding principle diagram is shown below.
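A minimal sketch of ID-hash sharding, assuming a fixed, hypothetical count of four shards and MD5 as the hash function (the source names neither):

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # hypothetical; the source does not give Youku's shard count


def shard_for(user_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Hash a user ID onto one of `shard_count` shards.

    MD5 mixes the bits so that patterns in the IDs (for example, every ID
    being a multiple of the shard count) still spread evenly across shards.
    """
    digest = hashlib.md5(str(user_id).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Same ID, same shard: any application server can compute the route itself.
assert shard_for(1001) == shard_for(1001)

# Spread of 10,000 users whose IDs are all multiples of 4.
print(Counter(shard_for(uid) for uid in range(0, 40_000, 4)))

# Caveat of plain modulo: growing from 4 to 5 shards relocates most users.
moved = sum(shard_for(u, 4) != shard_for(u, 5) for u in range(10_000)) / 10_000
print(f"{moved:.0%} of users would move")  # roughly 80% for a uniform hash
```

The last line shows why pure `hash % N` routing is awkward to grow: adding a fifth server changes the shard of about four users in five, and their rows would have to be migrated. A lookup table, as described next, sidesteps this by recording each user's shard explicitly.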
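To locate a user's shard, a lookup table maps user IDs to shard IDs: each request first reads the user's shard ID from this table, then queries that shard for the data, as depicted here.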
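The two-step lookup can be sketched with Python's built-in `sqlite3` module. In-memory databases serve as hypothetical stand-ins for the directory server and two MySQL shards, and the table and column names are illustrative:

```python
import sqlite3

# Hypothetical directory: maps user_id -> shard_id.
directory = sqlite3.connect(":memory:")
directory.execute(
    "CREATE TABLE user_shard (user_id INTEGER PRIMARY KEY, shard_id INTEGER NOT NULL)"
)

# Two hypothetical shards, each holding its own slice of the users table.
shards = {i: sqlite3.connect(":memory:") for i in range(2)}
for conn in shards.values():
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")


def create_user(user_id: int, name: str, shard_id: int) -> None:
    """Record the placement in the directory, then store the row on that shard."""
    directory.execute("INSERT INTO user_shard VALUES (?, ?)", (user_id, shard_id))
    shards[shard_id].execute("INSERT INTO users VALUES (?, ?)", (user_id, name))


def shard_id_for(user_id: int) -> int:
    row = directory.execute(
        "SELECT shard_id FROM user_shard WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(user_id)
    return row[0]


def get_user_name(user_id: int) -> str:
    # Step 1: ask the directory which shard holds the user.
    shard = shards[shard_id_for(user_id)]
    # Step 2: query only that shard.
    row = shard.execute("SELECT name FROM users WHERE user_id = ?", (user_id,)).fetchone()
    return row[0]


create_user(7, "alice", shard_id=0)
create_user(8, "bob", shard_id=1)  # placement is a free choice, e.g. the emptiest shard
print(shard_id_for(8), get_user_name(8))  # 1 bob
```

Because placement is recorded rather than computed, new users can be directed to a newly added shard without moving anyone already placed; the cost is an extra directory query per request.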
Youku avoids cross‑shard queries wherever possible. When one is unavoidable, it relies on multi‑dimensional shard indexes or distributed search engines, and only as a last resort on distributed database queries, which are complex and costly in performance.
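The difference between an index lookup and a distributed query can be sketched with toy data. Everything below is hypothetical: the secondary index on `name` is one simple form of a second-dimension index, and the scatter-gather function is a minimal stand-in for a distributed database query, not Youku's actual mechanism:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical data: user records spread over two shards by user ID.
SHARDS: List[Dict[int, dict]] = [
    {1: {"name": "alice", "city": "Beijing"}, 3: {"name": "carol", "city": "Shanghai"}},
    {2: {"name": "bob", "city": "Beijing"}, 4: {"name": "dave", "city": "Hangzhou"}},
]

# Option 1: an index on another dimension (here, name) that records where
# each row lives, so a lookup by name touches exactly one shard.
NAME_INDEX: Dict[str, Tuple[int, int]] = {
    rec["name"]: (shard_no, uid)
    for shard_no, shard in enumerate(SHARDS)
    for uid, rec in shard.items()
}


def find_by_name(name: str) -> dict:
    shard_no, uid = NAME_INDEX[name]
    return SHARDS[shard_no][uid]


# Option 2 (last resort): send the query to every shard and merge results.
# Work grows with the shard count, and the slowest shard sets the latency.
def users_in_city(city: str) -> List[str]:
    def scan(shard: Dict[int, dict]) -> List[str]:
        return [rec["name"] for rec in shard.values() if rec["city"] == city]

    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(scan, SHARDS))
    return sorted(name for part in partials for name in part)


print(find_by_name("bob"))        # {'name': 'bob', 'city': 'Beijing'}
print(users_in_city("Beijing"))   # ['alice', 'bob']
```

The index answers one kind of question cheaply but must be kept in sync on every write; the scatter-gather query answers anything but touches every shard, which matches the text's ranking of it as the costly last resort.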
4. Caching Strategy
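Large systems rely heavily on caching, yet Youku avoids in‑memory caches such as Memcached: they add memory copies and memory locks, and removing a video (for example, one ordered to be taken down) is cumbersome once copies sit in memory. Youku also found existing caching servers inefficient: Squid's write() calls consume user‑process memory, and Lighttpd's asynchronous I/O (AIO) reads files into user memory.
Instead, Youku relies on a well‑built Content Delivery Network (CDN). When a user requests a video, Youku returns the address of the video server that is closest to the user's location and in the best service condition, so playback starts quickly; this is credited for Youku's faster video loading compared with competitors.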
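The server-selection step can be sketched as a filter plus a ranking. Everything below is an assumption for illustration (the latency table, the 90% load cutoff, the host names); the text above says only that users are directed to a nearby video server:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VideoServer:
    host: str
    region: str
    healthy: bool
    load: float  # fraction of capacity in use, 0.0 to 1.0


# Hypothetical network "distance" (e.g. measured latency in ms) between regions.
LATENCY_MS: Dict[str, Dict[str, int]] = {
    "beijing": {"beijing": 5, "shanghai": 30, "guangzhou": 45},
    "shanghai": {"beijing": 30, "shanghai": 5, "guangzhou": 25},
}

MAX_LOAD = 0.9  # hypothetical cutoff: skip servers that are nearly saturated


def pick_server(user_region: str, servers: List[VideoServer]) -> Optional[VideoServer]:
    """Return the closest healthy, non-saturated server; ties go to the lower load."""
    candidates = [s for s in servers if s.healthy and s.load < MAX_LOAD]
    if not candidates:
        return None
    latency = LATENCY_MS[user_region]
    return min(candidates, key=lambda s: (latency[s.region], s.load))


servers = [
    VideoServer("bj-1.example", "beijing", healthy=False, load=0.2),
    VideoServer("bj-2.example", "beijing", healthy=True, load=0.7),
    VideoServer("sh-1.example", "shanghai", healthy=True, load=0.1),
]
print(pick_server("beijing", servers).host)   # bj-2.example (bj-1 is down)
print(pick_server("shanghai", servers).host)  # sh-1.example
```

Filtering out unhealthy or overloaded servers before ranking by distance means a nearby but failing node never wins; a production CDN would feed this from live health checks and latency measurements rather than a static table.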
Source: http://www.kuqin.com/system-analysis/20110918/264936.html
ITFLY8 Architecture Home
