Inside Youku’s Massive Architecture: Front‑End, Database & Caching Secrets
This article examines Youku’s large‑scale system architecture: its server fleet, its LAMP‑based front‑end framework, its MySQL database design as it evolved from master‑slave replication to vertical partitioning and sharding, and the caching strategy and CDN deployment that together serve more than 100 million unique visitors a day.
After introducing YouTube’s technical architecture, this article looks at Youku, a leading Chinese video site, and its own architecture.
1. Basic Data
According to traffic statistics, Youku receives over 100 million unique visitors (UV) and nearly 2 billion page views (PV) per day, ranking 217th globally and 49th in China.
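A quick back‑of‑the‑envelope check shows what these figures mean for capacity. It assumes traffic is spread evenly over 24 hours, which real traffic is not, since peaks run well above the average. Under that assumption, the figures imply roughly 23,000 page views per second and at most about 20 pages per visitor:

```latex
\text{average PV rate} \approx \frac{2\times 10^{9}\ \text{PV}}{86\,400\ \text{s}} \approx 2.3\times 10^{4}\ \text{PV/s},
\qquad
\frac{\text{PV}}{\text{UV}} \lesssim \frac{2\times 10^{9}}{1\times 10^{8}} = 20
```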
Initially Youku used Dell PowerEdge 1950/860 servers with Dell MD1000 storage arrays; today it operates more than 6,000 servers across major provincial nodes.
2. Front‑End Framework
From the start, Youku built its CMS on a LAMP architecture to render front‑end pages. The modules are well abstracted, offering good extensibility and UI separation, making development and maintenance simple and flexible.
The call relationships between the front‑end modules are shown below:
Each request is routed using just three values (module, method, and params), which keeps the design concise. The following diagram shows part of Youku’s front‑end architecture:
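Youku's CMS was built on LAMP (PHP), and its actual routing code is not shown in the article. The Python sketch below only illustrates the idea of routing on module, method, and params. The `ROUTES` registry, the `route` decorator, and the `video`/`show` handler are all hypothetical names.

```python
"""Minimal sketch of module/method/params routing (all names hypothetical)."""
from typing import Any, Callable, Dict

# Hypothetical registry: module name -> {method name -> handler}
ROUTES: Dict[str, Dict[str, Callable[..., Any]]] = {}


def route(module: str, method: str) -> Callable:
    """Decorator that registers a handler under (module, method)."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        ROUTES.setdefault(module, {})[method] = func
        return func
    return register


def dispatch(module: str, method: str, params: Dict[str, Any]) -> Any:
    """Find the handler for (module, method) and call it with params."""
    try:
        handler = ROUTES[module][method]
    except KeyError:
        raise LookupError(f"no route for {module}.{method}") from None
    return handler(**params)


@route("video", "show")
def show_video(video_id: str) -> str:
    # A real handler would load data and render a template.
    return f"<h1>video {video_id}</h1>"


if __name__ == "__main__":
    print(dispatch("video", "show", {"video_id": "abc123"}))  # <h1>video abc123</h1>
```

Adding a feature then means registering one more handler. The dispatcher itself does not change, which is the kind of extensibility the paragraph above describes.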
3. Database Structure
Youku’s database architecture has undergone several iterations, starting from a single MySQL server, then simple master‑slave replication, SSD optimization, vertical partitioning, and finally horizontal sharding.
1. MySQL Master‑Slave Replication
Master‑slave replication provides read‑write separation, greatly improving read performance. The process is illustrated below:
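However, this mechanism also introduces several performance bottlenecks:
Write scalability issues: every write still goes to the single master, so adding slaves does not increase write capacity
Write caching unavailable: every write must hit the master and then be replayed on every slave
Replication delay: slaves apply changes asynchronously, so a read from a slave may return stale data
Higher lock contention: each slave must apply replicated writes while also serving reads
Table growth reduces cache efficiency: every slave holds a full copy of every table, so as tables grow, a smaller share of the data fits in memory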
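Read‑write separation means the application (or a proxy in front of MySQL) sends writes to the master and spreads reads across the slaves. Below is a minimal sketch, assuming round‑robin selection of slaves and a naive check that treats any statement starting with `SELECT` as a read. The class and server names are hypothetical.

```python
"""Sketch of read/write splitting over master-slave replication.

Assumptions: server names are hypothetical; a real router would hold
connection pools and classify statements more robustly.
"""
import itertools
from typing import List


class ReadWriteRouter:
    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # Cycle through slaves so reads are spread round-robin.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def pick(self, sql: str) -> str:
        """Return the server that should execute this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        # Writes (and reads when there are no slaves) go to the master.
        return self.master


router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.pick("SELECT * FROM videos WHERE id = 1"))    # db-slave-1
print(router.pick("SELECT * FROM videos WHERE id = 2"))    # db-slave-2
print(router.pick("UPDATE videos SET views = views + 1"))  # db-master
```

The sketch makes two of the listed bottlenecks visible. Every write still lands on one master, and a read routed to a slave right after a write may miss that write because of replication delay. Reads that must see their own writes are therefore often pinned to the master.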
2. MySQL Vertical Partitioning
When business domains are sufficiently independent, placing each domain’s data on its own database server isolates failures and spreads the load, significantly boosting overall throughput. The architecture after vertical partitioning is shown below:
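In code, vertical partitioning reduces to a two‑step lookup: first find the business domain that owns a table, then find that domain's dedicated server. The sketch below assumes a static configuration. The domain, table, and host names are hypothetical examples, not Youku's actual layout.

```python
"""Sketch of vertical partitioning: each business domain gets its own DB server.

All domain, table, and host names are hypothetical.
"""
from typing import Dict

# Hypothetical mapping of business domain -> dedicated database server.
DOMAIN_DB: Dict[str, str] = {
    "user": "db-user.internal",
    "video": "db-video.internal",
    "comment": "db-comment.internal",
}

# Hypothetical table ownership: which domain each table belongs to.
TABLE_DOMAIN: Dict[str, str] = {
    "users": "user",
    "videos": "video",
    "video_tags": "video",
    "comments": "comment",
}


def db_for_table(table: str, table_domain: Dict[str, str]) -> str:
    """Resolve a table to its business domain, then to that domain's server."""
    domain = table_domain[table]
    return DOMAIN_DB[domain]


print(db_for_table("video_tags", TABLE_DOMAIN))  # db-video.internal
```

The trade‑off is that tables in different domains no longer share a server. A join such as comments with their users has to be done in application code rather than in a single SQL query.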
3. MySQL Horizontal Partitioning (Sharding)
Horizontal sharding splits users into groups by a rule (for example, a hash of the user ID) and stores each group’s data on a separate shard. As the user base grows, capacity is added by bringing new shard servers online. The principle is illustrated below:
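A hash rule such as "hash of user ID modulo the number of shards" can be sketched as follows. The use of MD5 and the function names are assumptions for illustration only; the article does not say which hash Youku uses. The second function measures how many users would land on a different shard if the shard count changed.

```python
"""Sketch of hash-based shard assignment (hash choice is an assumption)."""
import hashlib


def shard_for_user(user_id: int, num_shards: int) -> int:
    """Map a user ID to a shard number in the range [0, num_shards)."""
    # MD5 is used only to spread IDs evenly, not for security.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(num_users: int, old: int, new: int) -> float:
    """Share of users whose shard changes when the shard count changes."""
    moved = sum(shard_for_user(u, old) != shard_for_user(u, new) for u in range(num_users))
    return moved / num_users


print(shard_for_user(42, 4))
print(f"{fraction_moved(10_000, 4, 5):.0%} of users move when going from 4 to 5 shards")
```

With a plain modulo rule, going from 4 to 5 shards moves roughly 80% of users. That is one reason to pair the grouping rule with an explicit user‑to‑shard mapping table, as the article describes next.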
To locate a user’s shard, a mapping table stores the relationship between user ID and shard ID; each request first queries this table, then accesses the appropriate shard:
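A mapping table decouples users from a fixed formula. Every request first looks up the user's shard, and a new shard only has to receive new users. The sketch below keeps the table in an in‑memory dict and places each new user on the least‑loaded shard. Both the placement policy and the class name are assumptions. A production mapping table would itself live in a small, replicated, heavily cached store.

```python
"""Sketch of directory-based sharding with a user -> shard mapping table.

All names are hypothetical; the least-loaded placement policy is an assumption.
"""
from typing import Dict, List


class ShardDirectory:
    def __init__(self, shard_ids: List[int]) -> None:
        self.shard_ids = list(shard_ids)
        self.user_to_shard: Dict[int, int] = {}  # the mapping table
        self.load: Dict[int, int] = {s: 0 for s in self.shard_ids}

    def add_shard(self, shard_id: int) -> None:
        """Bring a new shard online; existing users stay where they are."""
        self.shard_ids.append(shard_id)
        self.load[shard_id] = 0

    def assign(self, user_id: int) -> int:
        """Place a new user on the least-loaded shard and record the mapping."""
        if user_id in self.user_to_shard:
            return self.user_to_shard[user_id]
        shard = min(self.shard_ids, key=lambda s: self.load[s])
        self.user_to_shard[user_id] = shard
        self.load[shard] += 1
        return shard

    def lookup(self, user_id: int) -> int:
        """First step of every request: find which shard holds this user."""
        return self.user_to_shard[user_id]


directory = ShardDirectory([0, 1])
for uid in range(4):
    directory.assign(uid)  # users 0..3 alternate between shards 0 and 1
directory.add_shard(2)
print(directory.assign(100))  # 2: the new, empty shard takes new users
print(directory.lookup(1))    # 1: existing users are not moved
```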
Cross‑shard queries are difficult, so Youku avoids them wherever possible. When they are unavoidable, it relies on multi‑dimensional shard indexes or distributed search engines. Distributed database queries are a last resort because they are expensive and hurt performance.
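The last‑resort distributed query is usually a scatter‑gather: send the query to every shard, then merge the partial results. The sketch models each shard as an in‑memory list of rows and runs a hypothetical "top videos by views" query. In a real deployment each per‑shard call would be a network round trip to a separate database, which is where the cost comes from.

```python
"""Sketch of a last-resort cross-shard query: scatter to every shard, gather, merge.

Shards are modeled as in-memory row lists; names and the query are hypothetical.
"""
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, int]


def query_shard(rows: List[Row], min_views: int, limit: int) -> List[Row]:
    """Per-shard query: rows at or above a threshold, top `limit` by views."""
    hits = [r for r in rows if r["views"] >= min_views]
    return heapq.nlargest(limit, hits, key=lambda r: r["views"])


def cross_shard_top(shards: List[List[Row]], min_views: int, limit: int) -> List[Row]:
    # Scatter: every shard must be queried, which is why this path is costly.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, min_views, limit), shards))
    # Gather: the global top `limit` is contained in the union of per-shard tops.
    merged = [row for part in partials for row in part]
    return heapq.nlargest(limit, merged, key=lambda r: r["views"])


shards = [
    [{"video_id": 1, "views": 50}, {"video_id": 2, "views": 900}],
    [{"video_id": 3, "views": 700}],
    [{"video_id": 4, "views": 10}, {"video_id": 5, "views": 300}],
]
print([r["video_id"] for r in cross_shard_top(shards, min_views=100, limit=2)])  # [2, 3]
```

Each shard returns only its own top `limit` rows, which bounds the data moved over the network. Even so, the cost still grows with the number of shards.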
4. Caching Strategy
Large‑scale sites love caching, from HTTP caches to memcached. For video content, however, Youku does not use in‑memory caching, mainly for two reasons:
Avoid memory copying and memory lock overhead
Make it easy to remove a video quickly when it has to be taken down, which is cumbersome once the video sits in an in‑memory cache
In addition, Squid’s write() path consumes memory in user process space, and Lighttpd 1.5’s asynchronous I/O (AIO) reads files into user‑space memory, which also makes it relatively inefficient.
Like YouTube, Youku maintains an extensive CDN to ensure smooth playback. Based on each user’s geographic location, it directs the user to the nearest or best‑performing video or cache server.
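The geographic half of that decision can be sketched as "pick the nearest healthy edge node by great‑circle distance." The sketch covers only that half. Choosing the best‑performing server would also weigh measured latency and load, and real CDNs typically steer users through DNS or HTTP redirects. The node names and coordinates are hypothetical.

```python
"""Sketch of geography-based CDN node selection (node data is hypothetical)."""
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeNode:
    name: str
    lat: float
    lon: float
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pick_node(user_lat: float, user_lon: float, nodes: List[EdgeNode]) -> Optional[EdgeNode]:
    """Return the nearest healthy node, or None if no node is healthy."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda n: haversine_km(user_lat, user_lon, n.lat, n.lon))


nodes = [
    EdgeNode("beijing-edge", 39.90, 116.40),
    EdgeNode("shanghai-edge", 31.23, 121.47),
    EdgeNode("guangzhou-edge", 23.13, 113.26),
]
print(pick_node(31.30, 120.58, nodes).name)  # user near Suzhou -> shanghai-edge
```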
5. Summary
This overview presents Youku’s core infrastructure. As technology continuously evolves, this foundation enables rapid response to new business and product demands.
21CTO
