Inside Facebook: Architecture Behind News Feed and Chat
The article breaks down Facebook's multi‑layered architecture, explaining how its PHP‑based web tier, Thrift‑driven services, Memcached caching, and specialized Erlang and C++ servers work together to deliver the News Feed and real‑time chat at massive scale.
Architecture Overview
Facebook’s production stack is divided into two logical groups:
PHP front‑end (classic LAMP stack) – Handles HTTP requests, aggregates data from back‑end services, and renders HTML.
Back‑end services – Implement business logic in various languages (C++, Erlang, etc.) and are accessed via Thrift RPC. Supporting components include Memcached for fast key‑value caching and MySQL for durable storage.
Incoming traffic first passes through load balancers (hardware or software) that distribute requests across a pool of PHP web servers. Those web servers forward service calls to the back‑end layer.
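The article does not say which balancing policy is used. As a minimal sketch, here is one common software policy, round‑robin. It assumes a fixed, hypothetical list of PHP web server names and only shows how successive requests are spread across the pool.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical software load balancer.

    Hands each incoming request to the next PHP web server in a fixed rotation.
    Round-robin is an assumption; the source does not name Facebook's policy.
    """

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one web server")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Return the next server in the rotation.
        return next(self._cycle)


lb = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical server names
print([lb.pick() for _ in range(5)])  # ['web1', 'web2', 'web3', 'web1', 'web2']
```

Production balancers usually also weigh server health and load. This sketch leaves that out to keep the distribution step visible.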
Back‑end Component Characteristics
Services – Custom processes written in languages such as C++ or Erlang; fast, but more complex to build and operate than the other components.
Memcached – Simple in‑memory key‑value store that serves read‑heavy workloads with low (typically sub‑millisecond) latency.
MySQL – Relational database providing durable storage; slower I/O is mitigated by Memcached caching.
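The way Memcached shields MySQL is usually described as the cache‑aside pattern. This minimal sketch assumes plain Python dicts as stand‑ins for Memcached and MySQL; the class and key names are hypothetical. Reads try the cache first, go to the database only on a miss, and then populate the cache for later readers.

```python
class CacheAsideStore:
    """Cache-aside read path (illustrative only).

    `cache` stands in for Memcached and `database` for MySQL; both are
    hypothetical in-process dicts, not real client libraries.
    """

    def __init__(self, database):
        self.cache = {}           # plays the role of Memcached
        self.database = database  # plays the role of MySQL
        self.db_reads = 0         # counts slow-path reads, for illustration

    def get(self, key):
        if key in self.cache:     # fast path: cache hit
            return self.cache[key]
        self.db_reads += 1        # slow path: cache miss goes to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later readers
        return value


store = CacheAsideStore({"activity:1": "Bob updated his status"})
store.get("activity:1")  # miss: read from the database, then cached
store.get("activity:1")  # hit: served from the cache
print(store.db_reads)    # 1
```

Only the first read of a key pays the database cost. That is why a high hit ratio lets slower durable storage sit behind a fast cache.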
News Feed Architecture
Write Path
When a user (e.g., Bob) posts a status update:
The PHP code on the web server writes the new content to MySQL.
It also sends the resulting activity ID through Scribe to a Leaf Server. The Leaf Server is chosen from the user's ID so that activities are spread evenly across leaves.
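The leaf selection step can be sketched as follows. The sketch assumes simple modulo sharding on the user ID and a hypothetical cluster of four leaves. The `publish_activity` function is a hypothetical stand‑in for the Scribe hop, not Scribe's API.

```python
from collections import defaultdict

NUM_LEAVES = 4  # hypothetical cluster size

# Each hypothetical leaf keeps, per user, the activity IDs that user produced.
leaves = [defaultdict(list) for _ in range(NUM_LEAVES)]


def leaf_for_user(user_id: int) -> int:
    # Assumed modulo sharding; the source only says selection is by user ID.
    return user_id % NUM_LEAVES


def publish_activity(user_id: int, activity_id: int) -> int:
    """Stand-in for the Scribe hop: append the activity ID to the owning leaf.

    Returns the index of the leaf that received it.
    """
    leaf = leaf_for_user(user_id)
    leaves[leaf][user_id].append(activity_id)
    return leaf


bob = 42  # hypothetical user ID
print(publish_activity(bob, 1001))  # 2  (42 % 4)
```

Because the mapping depends only on the user ID, all of a user's activities land on one leaf. The read path can then find them without a lookup table.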
Read Path
When another user (e.g., Alice) loads her home page:
The PHP front‑end invokes an Aggregator service via Thrift RPC.
The Aggregator queries a set of Leaf Servers to retrieve the most recent 40 activity IDs from Alice’s friends.
It merges and sorts the IDs, then returns the ordered list to the PHP layer.
The PHP code fetches each activity's payload from Memcached, falling back to MySQL on a cache miss, then assembles the HTML fragments and streams the result to the browser.
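The Aggregator's merge step can be sketched as a k‑way merge. The sketch assumes each leaf returns `(timestamp, activity_id)` pairs already sorted newest‑first. Under that assumption the Aggregator never needs a full sort, and it stops after the 40 IDs the article mentions. The function name and data shapes are hypothetical.

```python
import heapq
from itertools import islice

FEED_SIZE = 40  # the article cites the 40 most recent activity IDs


def aggregate(leaf_results, limit=FEED_SIZE):
    """Merge per-leaf results into one newest-first list of activity IDs.

    Each element of leaf_results is a list of (timestamp, activity_id) tuples
    that the (hypothetical) leaf has already sorted newest-first.
    """
    merged = heapq.merge(*leaf_results, key=lambda item: item[0], reverse=True)
    return [activity_id for _, activity_id in islice(merged, limit)]


leaf_a = [(300, "a3"), (100, "a1")]
leaf_b = [(250, "b2"), (200, "b1")]
print(aggregate([leaf_a, leaf_b], limit=3))  # ['a3', 'b2', 'b1']
```

If leaves returned unsorted data, sorting the concatenated results and slicing the first 40 would give the same answer at higher cost.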
Chat Architecture
Chat interactions involve both the PHP web tier and a dedicated real‑time messaging layer.
Web requests (e.g., loading chat history, fetching online user lists) are processed by PHP servers, which call back‑end services via Thrift.
Real‑time message receipt bypasses PHP entirely. An Erlang‑based Channel server maintains long‑polling connections with browsers and pushes incoming messages to the appropriate client.
Message sending flows from the PHP tier to the Channel server, which then distributes the message to the recipient’s long‑polling connection.
Chat history storage is handled by a C++ chatlogger service that writes and reads persisted logs.
Presence information (online/offline status) is aggregated by a separate C++ presence service that consumes updates from the Channel servers.
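The long‑polling delivery described above can be modeled in a few lines. The real Channel server is written in Erlang. This Python sketch is only an illustrative assumption: one blocking queue per user, where a poll request parks until a message arrives or a timeout expires. The class name, method names, and timeouts are hypothetical.

```python
import queue
import threading
from collections import defaultdict


class ChannelServer:
    """Toy model of a Channel server's per-user long-polling mailboxes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._mailboxes = defaultdict(queue.Queue)

    def _mailbox(self, user_id):
        # Guard the dict so concurrent callers share one queue per user.
        with self._lock:
            return self._mailboxes[user_id]

    def send(self, to_user, message):
        # Send path: the PHP tier hands the message to the Channel server.
        self._mailbox(to_user).put(message)

    def poll(self, user_id, timeout=1.0):
        # Long poll: block until a message arrives, or return None on timeout
        # so the browser can reconnect and poll again.
        try:
            return self._mailbox(user_id).get(timeout=timeout)
        except queue.Empty:
            return None


server = ChannelServer()
received = []
waiter = threading.Thread(
    target=lambda: received.append(server.poll("alice", timeout=2.0)))
waiter.start()                      # Alice's browser is parked in a long poll
server.send("alice", "hi from Bob")
waiter.join()
print(received)                     # ['hi from Bob']
```

The timeout matters because intermediaries tend to drop idle HTTP connections. Returning empty and re‑polling keeps each connection short‑lived while still delivering messages promptly.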
All components – the PHP web tier, Channel servers, chatlogger, and presence – run as horizontally scaled clusters. Channel servers partition users by user ID, and each partition is served by its own highly available Channel cluster.
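Partitioning by user ID combined with a highly available cluster per partition can be sketched as follows. The sketch assumes a modulo partitioning function and a simple "first healthy replica" failover rule. Both are hypothetical, as are the cluster and replica names; the source does not describe the actual failover mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelCluster:
    """Hypothetical cluster serving one user-ID partition."""
    replicas: list
    down: set = field(default_factory=set)

    def route(self):
        # Assumed failover rule: use the first replica not marked down.
        for replica in self.replicas:
            if replica not in self.down:
                return replica
        raise RuntimeError("no healthy replica in this partition")


clusters = [
    ChannelCluster(["ch0-a", "ch0-b"]),
    ChannelCluster(["ch1-a", "ch1-b"]),
    ChannelCluster(["ch2-a", "ch2-b"]),
]


def channel_for_user(user_id: int) -> str:
    # Partition by user ID (modulo is an assumed partitioning function).
    return clusters[user_id % len(clusters)].route()


print(channel_for_user(7))  # ch1-a  (7 % 3 == 1)
clusters[1].down.add("ch1-a")
print(channel_for_user(7))  # ch1-b  after failover within the partition
```

A replica failure stays inside one partition. Users in other partitions are unaffected, and users in the failed partition move to a sibling replica rather than to another partition.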
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
