Inside the Architecture of the World’s Biggest Websites: Wikipedia, Facebook, YouTube, and More
This article surveys the technical architectures of major web platforms—including Wikipedia, Facebook, Yahoo! Mail, Twitter, Google App Engine, Amazon, and Youku—highlighting their design patterns, scaling techniques, storage solutions, and caching strategies to reveal how massive online services are built and operated.
Introduction
Recently I have been exploring massive data processing and search‑engine technologies, collecting architecture diagrams of large websites such as Wikipedia, Facebook, Yahoo!, YouTube, Twitter, Google App Engine, Amazon, and Youku. This article presents those diagrams and briefly explains the key ideas behind each design.
1. Wikipedia Architecture
Wikipedia architecture diagram (© Mark Bergsma)
Peak traffic reaches 30,000 HTTP requests per second, about 3 Gbit/s (≈375 MB/s), handled by roughly 350 PC servers.
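A quick back-of-the-envelope check of these figures can be sketched as follows. It assumes decimal units (1 Gbit = 1000 Mbit) and, purely for illustration, that load is spread evenly over all 350 machines, which is unlikely to hold in practice because the servers play different roles.

```python
# Figures quoted in the text
peak_requests_per_second = 30_000
peak_bandwidth_gbit_per_s = 3
server_count = 350

# 8 bits per byte, decimal prefixes assumed
bandwidth_mb_per_s = peak_bandwidth_gbit_per_s * 1000 / 8

# Illustrative averages only: assumes a perfectly even spread of load
requests_per_server = peak_requests_per_second / server_count
avg_response_kb = bandwidth_mb_per_s * 1000 / peak_requests_per_second

print(f"{bandwidth_mb_per_s:.0f} MB/s total")
print(f"{requests_per_server:.1f} requests/s per server")
print(f"{avg_response_kb:.1f} KB per response on average")
```

The averages are rough orientation values, not measurements.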
GeoDNS: a 40‑line patch for BIND adds geographic filtering, directing users to the nearest server, which is crucial for Wikipedia’s globally distributed content.
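The idea behind geographic DNS filtering can be sketched in a few lines. This is not the BIND patch itself, only an illustration of the lookup it implies: the prefix table, region names, and cluster addresses below are all hypothetical, and a real deployment would consult a GeoIP database rather than a hand-written table.

```python
import ipaddress

# Hypothetical prefix-to-region table (documentation IP ranges used as stand-ins)
REGION_BY_PREFIX = {
    ipaddress.ip_network("192.0.2.0/24"): "europe",
    ipaddress.ip_network("198.51.100.0/24"): "north-america",
    ipaddress.ip_network("203.0.113.0/24"): "asia",
}

# Hypothetical cluster addresses returned in the DNS answer for each region
CLUSTER_BY_REGION = {
    "europe": "10.0.1.1",
    "north-america": "10.0.2.1",
    "asia": "10.0.3.1",
}

DEFAULT_CLUSTER = "10.0.2.1"  # fallback when the client's location is unknown


def resolve(client_ip: str) -> str:
    """Return the cluster address to hand out to the resolver at client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for network, region in REGION_BY_PREFIX.items():
        if addr in network:
            return CLUSTER_BY_REGION[region]
    return DEFAULT_CLUSTER
```

Calling `resolve("203.0.113.7")` would return the hypothetical Asian cluster address, while an address outside every listed prefix falls back to the default.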
Load balancing is performed with LVS (Linux Virtual Server; see diagram).
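LVS offers several scheduling algorithms, among them round-robin and weighted least-connection. The source does not say which one Wikipedia uses, so the following is only a simplified sketch of the weighted least-connection idea, with hypothetical server names and weights; the real scheduler also accounts for inactive connections.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    name: str
    weight: int      # relative capacity; higher means more traffic
    active: int = 0  # currently open connections


def pick_wlc(servers):
    """Pick the server with the lowest active-connections-to-weight ratio."""
    return min(servers, key=lambda s: s.active / s.weight)


def dispatch(servers):
    """Assign one new connection and return the chosen server's name."""
    chosen = pick_wlc(servers)
    chosen.active += 1
    return chosen.name
```

With one server of weight 1 and another of weight 2, the heavier server ends up holding about twice as many connections over time.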
2. Facebook Architecture
Facebook search architecture diagram.
Many more diagrams are to follow, offering a deeper look at Facebook's infrastructure.
3. Yahoo! Mail Architecture
Yahoo! Mail architecture.
Yahoo! Mail uses Oracle RAC to store mail‑service metadata.
4. Twitter Architecture
Twitter overall architecture diagram.
Twitter consists of twitter.com, mobile apps, and third‑party applications; most traffic originates from mobile and third‑party sources.
The cache architecture diagram shows how critical caching is in large web projects.
5. Google App Engine Architecture
GAE architecture is divided into three parts: Front‑end, Datastore, and Service Cluster.
Front‑end includes Front End, Static Files, App Server, and App Master.
Datastore is a distributed database built on BigTable and serves as the sole persistent store for GAE.
The Service Cluster provides many services for App Server, such as Memcache, image processing, user management, URL fetching, and task queues.
6. Amazon Architecture
Amazon Dynamo key‑value storage diagram.
Dynamo uses consistent hashing with node groups to achieve high availability and scalability; 99.9% of read/write operations respond within 300 ms.
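Consistent hashing can be illustrated with a minimal ring. This sketch interprets the "node groups" in the text as virtual nodes plus a list of replica holders per key, which is an assumption; the class name, the number of virtual nodes, and the replica count are all illustrative and not taken from Amazon's implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points (virtual nodes) on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self._positions, _hash(key))
        wanted = min(replicas, len({node for _, node in self._ring}))
        result = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == wanted:
                    break
        return result
```

The useful property is that removing a node only reassigns the keys that node was responsible for; every other key keeps its primary holder, which is what lets the cluster grow and shrink without mass data movement.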
Amazon cloud architecture diagram.
7. Youku Architecture
Youku built its own CMS for front‑end page rendering; modules are well separated, giving good extensibility and simplifying maintenance.
The database evolved from a single MySQL server to master‑slave replication, vertical partitioning, and finally horizontal sharding.
Simple MySQL master‑slave replication provides read/write separation.
Vertical partitioning places different business data on separate DB servers, improving load distribution.
Horizontal sharding distributes users across multiple shards based on a hash of the user ID; a mapping table determines which shard holds a given user.
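One way to combine a hash of the user ID with a mapping table is to hash into a fixed number of buckets and let the table say which shard holds each bucket. The sketch below assumes that two-step scheme; the bucket count, shard names, and function names are hypothetical and not taken from Youku's system.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; fixed once chosen so that hashes stay stable

# Hypothetical mapping table: bucket number -> shard (database) name.
# In production this would live in a small, heavily cached lookup store.
BUCKET_TO_SHARD = {b: f"user_db_{b % 4}" for b in range(NUM_BUCKETS)}


def bucket_for(user_id: int) -> int:
    """Hash the user ID into a stable bucket number."""
    digest = hashlib.sha1(str(user_id).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def shard_for(user_id: int) -> str:
    """Look up which shard currently holds this user's bucket."""
    return BUCKET_TO_SHARD[bucket_for(user_id)]
```

The indirection is the point of the design: rebalancing means copying one bucket's rows to a new shard and updating a single table entry, while the hash function and the application code stay unchanged.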
Cache strategy: Youku relies on a CDN rather than in‑memory caches; Squid write overhead and Lighttpd AIO are noted as inefficiencies.
Afterword
The article reflects on the experience of gathering and analyzing hundreds of architecture diagrams, emphasizing that while most readers may still be observers, the insights gained can help anyone grow from a curious newcomer to a seasoned engineer.
ITFLY8 Architecture Home