Inside the Architecture of the World’s Biggest Websites: From Wikipedia to Youku
This article surveys the technical architectures of major web platforms—including Wikipedia, Facebook, Yahoo Mail, Twitter, Google App Engine, Amazon, and Youku—highlighting their load‑balancing, caching, database, and scaling strategies to reveal how they handle massive traffic and data volumes.
Recently, the author collected and analyzed architecture diagrams of many large‑scale websites to appreciate both their visual design and the underlying engineering ideas.
Wikipedia Architecture
Wikipedia handles peak loads of 30,000 HTTP requests per second and 3 Gbps traffic with about 350 MB/s, using roughly 350 PC servers. It employs GeoDNS (a 40‑line BIND patch) to direct users to the nearest server and uses LVS for load balancing.
Facebook Architecture
The diagram shows Facebook’s search subsystem, illustrating its internal service layout.
Yahoo! Mail Architecture
Yahoo! Mail relies on Oracle RAC to store mail‑related metadata.
Twitter Architecture
Twitter’s overall design consists of the web site, mobile clients, and third‑party applications. Traffic is primarily driven by mobile and third‑party usage.
The cache layer is crucial for large web projects; the diagram shows Twitter’s caching architecture.
Google App Engine Architecture
GAE is divided into three parts: Front‑end (Front End, Static Files, App Server, App Master), Datastore (a BigTable‑based distributed database), and a service group offering Memcache, graphics, user management, URL fetching, and task queues.
Amazon Architecture
Amazon’s Dynamo is a highly available, scalable key‑value store. It uses consistent hashing with a ring of nodes (each node being a group of machines) to locate data, achieving 99.9% of reads/writes within 300 ms.
The diagram illustrates the distributed storage system.
Amazon’s cloud architecture is shown in the following diagram.
Youku Architecture
Youku built its own CMS for front‑end page rendering, achieving good modular separation and extensibility. The front‑end module call graph is illustrated below.
The partial front‑end architecture diagram follows.
Youku’s database evolved from a single MySQL instance to master‑slave replication, SSD optimization, vertical partitioning, and finally horizontal sharding.
Simple MySQL master‑slave replication enables read/write separation, improving read performance. The replication diagram is shown below.
The replication process diagram follows.
Issues such as write scalability, caching, replication lag, lock contention, and decreasing cache hit rates led to further optimizations.
MySQL vertical partitioning isolates independent business domains onto separate database servers, improving load distribution. The resulting architecture diagram is shown below.
Because some business logic still shares data (e.g., user information), horizontal sharding was explored.
MySQL horizontal sharding (sharding) distributes users across shards based on hashed IDs. The principle diagram is shown below.
To locate a user’s shard, a mapping table is consulted before querying the appropriate shard. Cross‑shard queries are minimized; when necessary, multi‑dimensional indexes, distributed search engines, or distributed databases are used.
Cache strategy – Youku relies heavily on a CDN rather than in‑memory caches like Memcached. Squid’s write() overhead and Lighttpd’s AIO file reads affect performance, but the CDN ensures users receive video from the nearest server, providing smoother playback compared to competitors.
Source: http://blog.csdn.net/tiansan/article/details/52825241
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
