Inside the Architecture of the World’s Biggest Websites: Wikipedia, Facebook, YouTube, and More

This article surveys the technical architectures of major web platforms—including Wikipedia, Facebook, Yahoo! Mail, Twitter, Google App Engine, Amazon, and Youku—highlighting their design patterns, scaling techniques, storage solutions, and caching strategies to reveal how massive online services are built and operated.

ITFLY8 Architecture Home

Introduction

Recently I have been exploring massive data processing and search‑engine technologies, collecting architecture diagrams of large websites such as Wikipedia, Facebook, Yahoo!, YouTube, Twitter, Google App Engine, Amazon, and Youku. This article presents those diagrams and briefly explains the key ideas behind each design.

1. Wikipedia Architecture

Wikipedia architecture diagram (copyright Mark Bergsma)

Peak traffic reaches 30,000 HTTP requests per second, about 3 Gbit/s (≈375 MB/s), handled by roughly 350 PC servers.
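As a quick sanity check on these figures, the arithmetic can be worked through in a few lines. This is only a back-of-envelope sketch: it assumes decimal units and spreads the load evenly across all 350 machines, which a real deployment (with separate cache, application, and database tiers) would not do.

```python
# Back-of-envelope check of the peak-traffic figures quoted above.
PEAK_REQUESTS_PER_SEC = 30_000
PEAK_BITS_PER_SEC = 3 * 10**9   # 3 Gbit/s, decimal gigabits
SERVER_COUNT = 350

bytes_per_sec = PEAK_BITS_PER_SEC / 8
megabytes_per_sec = bytes_per_sec / 10**6

# Naive averages: assumes every server takes an equal share of the traffic.
requests_per_server = PEAK_REQUESTS_PER_SEC / SERVER_COUNT
avg_bytes_per_request = bytes_per_sec / PEAK_REQUESTS_PER_SEC

print(f"{megabytes_per_sec:.0f} MB/s")                      # 375 MB/s
print(f"{requests_per_server:.1f} requests/s per server")   # 85.7 requests/s per server
print(f"{avg_bytes_per_request / 1000:.1f} KB per request")  # 12.5 KB per request
```

The average of roughly 12.5 KB per request is a derived estimate, not a number from the original article.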

GeoDNS: a 40‑line patch for BIND adds geographic filtering, directing users to the nearest server, which is crucial for Wikipedia’s globally distributed content.
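The idea behind geographic DNS routing can be sketched as a lookup from the querying resolver's address to a nearby cluster. This is not the BIND patch itself; the prefix table, region names, and addresses below are hypothetical placeholders (taken from the IP ranges reserved for documentation).

```python
import ipaddress

# Hypothetical prefix-to-region table; a real setup would load this
# from a GeoIP database rather than hard-coding it.
REGION_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "europe"),
    (ipaddress.ip_network("198.51.100.0/24"), "asia"),
]

# Hypothetical address of the cluster serving each region.
CLUSTER_ADDRESSES = {
    "europe": "203.0.113.10",
    "asia": "203.0.113.20",
    "default": "203.0.113.30",
}


def resolve(resolver_ip: str) -> str:
    """Return the cluster address for the region the resolver falls in."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, region in REGION_TABLE:
        if addr in network:
            return CLUSTER_ADDRESSES[region]
    # Unknown locations fall back to a default cluster.
    return CLUSTER_ADDRESSES["default"]


print(resolve("192.0.2.55"))
print(resolve("8.8.8.8"))
```

A linear scan is enough for a two-entry table; a production table with many prefixes would more likely use a longest-prefix-match structure.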

Load balancing is performed with LVS (see diagram).

2. Facebook Architecture

Facebook search architecture diagram.

The original article notes that many more Facebook diagrams will follow, offering a deeper look at its infrastructure.

3. Yahoo! Mail Architecture

Yahoo! Mail architecture.

Yahoo! Mail uses Oracle RAC to store mail‑service metadata.

4. Twitter Architecture

Twitter overall architecture diagram.

Twitter consists of twitter.com, mobile apps, and third‑party applications; most traffic originates from mobile and third‑party sources.

Cache architecture diagram shows the critical role of caching in large web projects.

5. Google App Engine Architecture

GAE architecture is divided into three parts: Front‑end, Datastore, and Service Cluster.

Front‑end includes Front End, Static Files, App Server, and App Master.

Datastore is a distributed database built on BigTable and serves as the sole persistent store for GAE.

The Service Cluster provides many services for App Server, such as Memcache, image processing, user management, URL fetching, and task queues.

6. Amazon Architecture

Amazon Dynamo key‑value storage diagram.

Dynamo uses consistent hashing with node groups to achieve high availability and scalability; 99.9% of read/write operations respond within 300 ms.
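To make the consistent-hashing idea concrete, here is a minimal sketch of a hash ring with virtual nodes. It is not Dynamo's implementation: the class name `HashRing`, the `vnodes` count, and the choice of MD5 as a stable hash are assumptions made for illustration, and the "node group" is modeled as the first few distinct nodes found walking clockwise from the key.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._tokens = [token for token, _ in self._ring]
        self._node_count = len(set(nodes))

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    def preference_list(self, key: str, replicas: int = 3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        wanted = min(replicas, self._node_count)
        start = bisect.bisect_right(self._tokens, self._hash(key))
        result = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == wanted:
                    break
        return result


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42"))
```

The property that matters for scalability is that removing a node only reassigns the keys that node owned; every other key keeps its primary node.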

Amazon cloud architecture diagram.

7. Youku Architecture

Youku built its own CMS for front‑end page rendering; modules are well separated, giving good extensibility and simplifying maintenance.

The database evolved from a single MySQL server to master‑slave replication, vertical partitioning, and finally horizontal sharding.

Simple MySQL master‑slave replication provides read/write separation.
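Read/write separation can be sketched as a small router that sends writes to the master and spreads reads across the slaves. The class `ReadWriteRouter` and the server names are hypothetical, and the statement classification is deliberately naive (anything that does not start with SELECT is treated as a write).

```python
import itertools


class ReadWriteRouter:
    """Route SELECTs to slaves in round-robin order, everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, all traffic goes to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
print(router.route("SELECT * FROM videos WHERE id = 1"))
print(router.route("UPDATE videos SET views = views + 1 WHERE id = 1"))
```

Because MySQL replication is asynchronous by default, a read routed to a slave may briefly miss a write that just landed on the master; reads that must see the latest data are usually sent to the master instead.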

Vertical partitioning places different business data on separate DB servers, improving load distribution.

Horizontal sharding distributes users across multiple shards based on a hash of the user ID; a mapping table determines which shard holds a given user.
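One way to combine a hash with a mapping table is to hash each user ID into a fixed set of logical buckets and let the table map buckets to physical shards. This is a sketch under that assumption, not Youku's actual scheme; `BUCKET_COUNT`, the shard names, and the use of CRC32 are all illustrative choices.

```python
import zlib

BUCKET_COUNT = 8  # hypothetical number of logical buckets

# Hypothetical mapping table: logical bucket -> physical shard name.
BUCKET_TO_SHARD = {
    bucket: f"user_db_{bucket % 2}" for bucket in range(BUCKET_COUNT)
}


def bucket_for(user_id: int) -> int:
    """Hash the user ID into a stable logical bucket."""
    return zlib.crc32(str(user_id).encode("ascii")) % BUCKET_COUNT


def shard_for(user_id: int) -> str:
    """Look up which physical shard currently holds this user's bucket."""
    return BUCKET_TO_SHARD[bucket_for(user_id)]


print(shard_for(12345))
```

The indirection is the point of the design: moving a bucket to a new server only requires copying its rows and updating one entry in the mapping table, without rehashing every user.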

Cache strategy: Youku relies on a CDN rather than in‑memory caches; Squid write overhead and Lighttpd AIO are noted as inefficiencies.

Afterword

The article reflects on the experience of gathering and analyzing hundreds of architecture diagrams, emphasizing that while most readers may still be observers, the insights gained can help anyone grow from a curious newcomer to a seasoned engineer.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
