Inside the Architecture of the World’s Biggest Websites: From Wikipedia to Youku

This article surveys the technical architectures of major web platforms—including Wikipedia, Facebook, Yahoo Mail, Twitter, Google App Engine, Amazon, and Youku—highlighting their load‑balancing, caching, database, and scaling strategies to reveal how they handle massive traffic and data volumes.

ITFLY8 Architecture Home

Recently, the author collected and analyzed architecture diagrams of many large‑scale websites to appreciate both their visual design and the underlying engineering ideas.

Wikipedia Architecture

Wikipedia handles peak loads of 30,000 HTTP requests per second and about 3 Gbps of traffic (quoted here as roughly 350 MB/s; 3 Gbps is 375 MB/s at 8 bits per byte) on roughly 350 PC servers. It uses GeoDNS (a 40-line BIND patch) to direct each user to the nearest server cluster, and LVS to balance load across the servers.
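The two steps can be sketched in a few lines of Python. This is only an illustration of the idea, not Wikipedia's configuration: the region names, the addresses, and the choice of plain round-robin (one of several schedulers LVS offers) are all assumptions made for the example.

```python
from itertools import cycle

# Hypothetical region-to-cluster table; names and addresses are
# illustrative and do not describe Wikipedia's real topology.
CLUSTERS = {
    "eu": ["10.0.1.1", "10.0.1.2"],
    "us": ["10.0.2.1", "10.0.2.2", "10.0.2.3"],
}
DEFAULT_REGION = "us"

# One round-robin iterator per cluster, standing in for an LVS scheduler.
_schedulers = {region: cycle(servers) for region, servers in CLUSTERS.items()}


def pick_cluster(client_region: str) -> str:
    """GeoDNS-style step: map the client's region to its nearest cluster."""
    return client_region if client_region in CLUSTERS else DEFAULT_REGION


def pick_server(client_region: str) -> str:
    """Load-balancer step: rotate through the servers of the chosen cluster."""
    return next(_schedulers[pick_cluster(client_region)])
```

Calling pick_server("eu") repeatedly alternates between the two "eu" addresses, while a region missing from the table falls back to the default cluster.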

Facebook Architecture

The diagram shows Facebook’s search subsystem, illustrating its internal service layout.

Yahoo! Mail Architecture

Yahoo! Mail relies on Oracle RAC to store mail‑related metadata.

Twitter Architecture

Twitter’s overall design consists of the web site, mobile clients, and third‑party applications. Traffic is primarily driven by mobile and third‑party usage.

The cache layer is crucial for large web projects; the diagram shows Twitter’s caching architecture.
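As a general illustration of why a cache layer matters, the sketch below shows the common cache-aside pattern: read from the cache first, and fall back to the backing store only on a miss. It is not taken from Twitter's design; the class name, the in-process dictionary standing in for a cache cluster, and the loader callback are assumptions for the example.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Check the cache first; fall back to the backing store on a miss."""

    def __init__(self, load_from_store: Callable[[str], Optional[str]]):
        self._cache: Dict[str, str] = {}  # stand-in for a cache cluster
        self._load = load_from_store      # stand-in for a database query
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:
            self._cache[key] = value      # populate so the next read is a hit
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying record changes."""
        self._cache.pop(key, None)
```

The hit and miss counters make the cache hit rate visible, which is the number that decides how much load actually reaches the database.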

Google App Engine Architecture

GAE is divided into three parts: the front end (Front End, Static Files, App Server, App Master), the Datastore (a BigTable-based distributed database), and a group of services offering Memcache, image manipulation, user management, URL fetching, and task queues.

Amazon Architecture

Amazon's Dynamo is a highly available, scalable key-value store. It locates data with consistent hashing over a ring of nodes (each node being a group of machines), and 99.9% of reads and writes complete within 300 ms.

The diagram illustrates the distributed storage system.
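Consistent hashing is easier to follow with a small example. The sketch below is a minimal ring in Python, not Dynamo's implementation: the class name, the MD5-based position function, and the number of virtual positions per node are illustrative choices.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes: int = 64):
        self._vnodes = vnodes   # positions per node (illustrative value)
        self._positions = []    # sorted ring positions
        self._owner = {}        # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node position."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]
```

The property that matters is that removing a node only reassigns the keys that node held; every other key keeps its owner, which is what lets such a store add or lose machines without reshuffling all data.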

Amazon’s cloud architecture is shown in the following diagram.

Youku Architecture

Youku built its own CMS for front‑end page rendering, achieving good modular separation and extensibility. The front‑end module call graph is illustrated below.

The partial front‑end architecture diagram follows.

Youku’s database evolved from a single MySQL instance to master‑slave replication, SSD optimization, vertical partitioning, and finally horizontal sharding.

Simple MySQL master‑slave replication enables read/write separation, improving read performance. The replication diagram is shown below.

The replication process diagram follows.
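The read/write separation described above can be sketched as a small router in front of the database connections. This is a simplified illustration under stated assumptions: the class name and the host labels are hypothetical, and classifying a statement by its first keyword is a shortcut that a real proxy or driver would refine.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across the replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master: str, replicas: list):
        self._master = master
        # Fall back to the master when no replica is configured.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        """Return the host label that should execute this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self._master
        return next(self._replicas)
```

Because every write still lands on the single master, this arrangement raises read capacity only, and a read routed to a replica may see slightly stale data until replication catches up.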

Replication alone does not scale writes, since every write still goes to the master. That limit, together with replication lag, lock contention, and a falling cache hit rate, led to further optimizations.

MySQL vertical partitioning isolates independent business domains onto separate database servers, improving load distribution. The resulting architecture diagram is shown below.
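In application code, vertical partitioning often comes down to a lookup from table to business domain to database server. The sketch below assumes hypothetical domain names, table names, and host names; none of them come from Youku's actual schema.

```python
# Hypothetical layout: each business domain owns one database server.
DOMAIN_TO_DB = {
    "user": "db-user.internal:3306",
    "video": "db-video.internal:3306",
    "comment": "db-comment.internal:3306",
}

# Every table belongs to exactly one domain.
TABLE_TO_DOMAIN = {
    "users": "user",
    "profiles": "user",
    "videos": "video",
    "comments": "comment",
}


def db_for_table(table: str) -> str:
    """Return the database server that holds the given table."""
    domain = TABLE_TO_DOMAIN.get(table)
    if domain is None:
        raise KeyError(f"no domain configured for table {table!r}")
    return DOMAIN_TO_DB[domain]
```

A query that needs both "users" and "videos" now touches two servers, so joins across domains have to be done in the application.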

Because some business logic still shares data (e.g., user information), horizontal sharding was explored.

MySQL horizontal partitioning (sharding) distributes users across shards based on hashed IDs. The principle diagram is shown below.

To locate a user’s shard, a mapping table is consulted before querying the appropriate shard. Cross‑shard queries are minimized; when necessary, multi‑dimensional indexes, distributed search engines, or distributed databases are used.
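The lookup step can be sketched as follows. This is a minimal illustration, not Youku's code: the shard count, the CRC32 hash, and the class and function names are assumptions, and the in-memory dictionary stands in for a mapping table that would normally live in its own database.

```python
import zlib

NUM_SHARDS = 4  # illustrative shard count


def default_shard(user_id: int) -> int:
    """Hash-based placement that is stable across processes."""
    return zlib.crc32(str(user_id).encode("ascii")) % NUM_SHARDS


class ShardDirectory:
    """Mapping table consulted before each query."""

    def __init__(self):
        self._mapping = {}  # user_id -> shard number

    def shard_for(self, user_id: int) -> int:
        # First lookup records the hashed placement; later lookups reuse it.
        if user_id not in self._mapping:
            self._mapping[user_id] = default_shard(user_id)
        return self._mapping[user_id]

    def move(self, user_id: int, shard: int) -> None:
        """Reassign a user, for example when rebalancing a hot shard."""
        if not 0 <= shard < NUM_SHARDS:
            raise ValueError(f"shard must be in range 0..{NUM_SHARDS - 1}")
        self._mapping[user_id] = shard
```

Keeping an explicit mapping table costs one extra lookup per query, but it allows individual users to be moved between shards without changing the hash function.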

Cache strategy – Youku relies heavily on a CDN rather than in‑memory caches like Memcached. Squid’s write() overhead and Lighttpd’s AIO file reads affect performance, but the CDN ensures users receive video from the nearest server, providing smoother playback compared to competitors.

Source: http://blog.csdn.net/tiansan/article/details/52825241
