Inside Facebook’s Massive Architecture: How the Social Giant Scales to Billions
This article outlines Facebook’s LAMP‑based architecture: HipHop compiles PHP to C++; Thrift‑based services written in PHP, C++, and Java run on custom application servers; and MySQL, Memcached, Cassandra, HBase, Hadoop, Hive, Scribe, BigPipe, Varnish, Haystack, and other components together support a deployment of more than 60,000 servers, 300 TB of cached data, 1 trillion daily clicks, and petabytes of storage.
Overview
Facebook operates one of the largest dynamic sites built on the LAMP stack, augmenting it with custom components to achieve extreme scale and performance.
Frontend
The web front‑end is written in PHP. Facebook’s HipHop compiler converts PHP source code to C++ and compiles it with g++, providing high‑throughput templating and business‑logic execution.
Service Layer
Business logic is packaged as services that communicate via Thrift. Services can be implemented in PHP, C++, or Java and run on Facebook‑designed lightweight application servers, avoiding heavyweight containers such as Tomcat or Jetty.
Storage and Persistence
Relational data is stored in MySQL.
In‑memory caching is handled by Memcached, which also serves as a cache front‑end for MySQL data.
Facebook originally deployed its own Cassandra cluster but has been migrating to HBase for its simpler consistency model and native MapReduce integration.
Photo storage is managed by Haystack, a custom append‑only object store optimized for billions of images.
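The Memcached-in-front-of-MySQL arrangement described above is commonly implemented as a cache-aside pattern. The sketch below illustrates that general pattern only; it is not Facebook's code. Plain dictionaries stand in for MySQL and Memcached, and the class name, key format, and invalidate-on-write policy are all assumptions made for illustration.

```python
from typing import Optional


class CacheAsideStore:
    """Cache-aside reads with invalidate-on-write (illustrative only)."""

    def __init__(self, database: dict[str, str]) -> None:
        self.database = database          # stand-in for MySQL
        self.cache: dict[str, str] = {}   # stand-in for Memcached
        self.db_reads = 0                 # counts reads that reached the database

    def get(self, key: str) -> Optional[str]:
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key: str, value: str) -> None:
        # Write to the database, then drop the stale cache entry
        # so the next read repopulates it.
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAsideStore({"user:1": "alice"})
first = store.get("user:1")    # miss, reads the database
second = store.get("user:1")   # hit, served from the cache
store.set("user:1", "alicia")  # invalidates the cached entry
third = store.get("user:1")    # miss again, sees the new value
```

Deleting the cached entry on write, rather than updating it in place, keeps the write path simple at the cost of one extra database read afterwards.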
Offline Processing and Analytics
Batch processing relies on Hadoop and Hive. Logging, click‑stream, and feed data are collected with Scribe, aggregated into HDFS via Scribe‑HDFS, and made available for MapReduce analysis.
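As a rough illustration of the kind of MapReduce analysis that can run over collected click logs, the sketch below counts clicks per URL in plain Python. It does not use the Hadoop or Hive APIs, and the log line format (`timestamp event path`) is a hypothetical one chosen for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log lines, as they might look after aggregation into HDFS.
LOG_LINES = [
    "2010-07-01T12:00:00 click /home",
    "2010-07-01T12:00:01 click /photos",
    "2010-07-01T12:00:02 view /home",
    "2010-07-01T12:00:03 click /home",
]


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (path, 1) for every click event; skip everything else."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] == "click":
            yield parts[2], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


click_counts = reduce_phase(shuffle(map_phase(LOG_LINES)))
```

In a real cluster the map and reduce phases run in parallel across many machines; the three functions here only show the data flow.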
Performance Optimizations
BigPipe is a custom pipeline that streams page fragments asynchronously to accelerate page rendering.
Varnish Cache is used as an HTTP reverse proxy for low‑latency content delivery.
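The fragment-streaming idea behind BigPipe can be sketched as a generator that sends a page skeleton immediately and then flushes each fragment ("pagelet") as soon as it has been rendered. This is a minimal sketch under stated assumptions, not BigPipe's implementation: the pagelet names, the simulated delays, and the client-side `fillPagelet` function are all hypothetical.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterator


def render_pagelet(name: str, delay_s: float) -> str:
    """Simulate backend work, then return the pagelet's HTML."""
    time.sleep(delay_s)
    return f"<p>{name} ready</p>"


def stream_page(pagelets: dict[str, float]) -> Iterator[str]:
    """Yield response chunks: skeleton first, pagelets as they complete."""
    # 1. Send the skeleton with empty placeholders right away.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"

    # 2. Render pagelets concurrently and flush each one when it finishes.
    with ThreadPoolExecutor(max_workers=max(1, len(pagelets))) as pool:
        futures = {
            pool.submit(render_pagelet, name, delay): name
            for name, delay in pagelets.items()
        }
        for future in as_completed(futures):
            payload = json.dumps({"id": futures[future], "html": future.result()})
            # fillPagelet is a hypothetical client-side function that
            # injects the HTML into the matching placeholder.
            yield f"<script>fillPagelet({payload})</script>"

    # 3. Close the document once every pagelet has been sent.
    yield "</body></html>"


chunks = list(stream_page({"newsfeed": 0.05, "chat": 0.01}))
```

Because chunks are emitted in completion order, a slow pagelet does not hold back the ones that finish sooner.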
Messaging Infrastructure
Facebook Messages runs on a dynamic “Cell” architecture. Each Cell handles a subset of users, and new Cells are added as traffic grows. Messages are persisted in HBase, and a custom inverted index built on HBase powers search. The chat service runs on an Erlang‑based Epoll server accessed via Thrift.
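One way to picture the Cell model is a directory that maps each user to a Cell and opens a new Cell when the newest one is full. The source does not describe how users are actually assigned, so the fill-then-add policy, the `CellDirectory` class, and the per-Cell capacity below are assumptions made purely for illustration.

```python
class CellDirectory:
    """Maps users to Cells, adding a Cell when the newest one is full."""

    def __init__(self, users_per_cell: int) -> None:
        if users_per_cell < 1:
            raise ValueError("users_per_cell must be at least 1")
        self.users_per_cell = users_per_cell
        self.cells: list[set[str]] = [set()]      # each Cell holds its user IDs
        self.user_to_cell: dict[str, int] = {}    # user ID -> Cell index

    def cell_for(self, user_id: str) -> int:
        """Return the user's Cell, assigning one on first sight."""
        if user_id in self.user_to_cell:
            return self.user_to_cell[user_id]
        # Grow the deployment when the newest Cell is at capacity.
        if len(self.cells[-1]) >= self.users_per_cell:
            self.cells.append(set())
        index = len(self.cells) - 1
        self.cells[index].add(user_id)
        self.user_to_cell[user_id] = index
        return index


directory = CellDirectory(users_per_cell=2)
assignments = [directory.cell_for(f"user{i}") for i in range(5)]
repeat_lookup = directory.cell_for("user0")
```

Keeping an explicit user-to-Cell mapping, instead of hashing the user ID, means existing users stay where they are when Cells are added.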
Scale Metrics (July 2010)
More than 60,000 servers across multiple data centers, including the custom‑designed Prineville, Oregon facility (Open Compute Project).
Approximately 300 TB of data cached in Memcached.
Hadoop and Hive clusters consist of ~3,000 servers (8 CPU cores, 32 GB RAM, 12 TB disk each), totaling ~24,000 CPU cores, 96 TB RAM, and 36 PB storage.
Traffic and data volume: ~1 trillion clicks per day, 500 billion photos, 3 trillion cached objects, and 130 TB of logs per day.
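The Hadoop and Hive cluster totals listed above follow directly from the per-server figures. A quick cross-check, assuming decimal units (1 TB = 1,000 GB and 1 PB = 1,000 TB):

```python
SERVERS = 3_000           # approximate cluster size from the list above
CORES_PER_SERVER = 8
RAM_GB_PER_SERVER = 32
DISK_TB_PER_SERVER = 12

total_cores = SERVERS * CORES_PER_SERVER             # CPU cores
total_ram_tb = SERVERS * RAM_GB_PER_SERVER / 1_000   # GB -> TB
total_disk_pb = SERVERS * DISK_TB_PER_SERVER / 1_000 # TB -> PB

print(f"{total_cores:,} cores, {total_ram_tb:g} TB RAM, {total_disk_pb:g} PB disk")
```

The results match the stated totals of ~24,000 cores, 96 TB of RAM, and 36 PB of storage.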
References
HipHop for PHP: http://developers.facebook.com/blog/post/358
Thrift: http://thrift.apache.org/
Memcached: http://memcached.org/
Cassandra: http://cassandra.apache.org/
HBase: http://hbase.apache.org/
Scribe (GitHub): https://github.com/facebook/scribe
Scribe‑HDFS: http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
BigPipe: http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
Varnish Cache: http://www.varnish-cache.org/
Haystack storage note: http://www.facebook.com/note.php?note_id=76191543919
Messages scaling note: http://www.facebook.com/note.php?note_id=10150148835363920
Open Compute Project: http://opencompute.org/
Scaling Facebook presentation: http://www.devoxx.com
