Inside Facebook’s Massive Architecture: How the Social Giant Scales to Billions

This article describes Facebook’s LAMP‑based architecture: HipHop compiles PHP to C++; services written in PHP, C++, and Java communicate via Thrift and run on custom application servers; and MySQL, Memcached, Cassandra, HBase, Hadoop, Hive, Scribe, BigPipe, Varnish, Haystack, and other components together support a deployment of more than 60,000 servers, 300 TB of cached data, 1 trillion daily clicks, and petabytes of storage.

Art of Distributed System Architecture Design

Overview

Facebook operates one of the largest dynamic sites built on the LAMP stack, augmenting it with custom components to achieve extreme scale and performance.

Frontend

The web front‑end is written in PHP. Facebook’s HipHop compiler converts PHP source code to C++ and compiles it with g++, providing high‑throughput templating and business‑logic execution.

Service Layer

Business logic is packaged as services that communicate via Thrift. Services can be implemented in PHP, C++, or Java and run on Facebook‑designed lightweight application servers, avoiding heavyweight containers such as Tomcat or Jetty.

Storage and Persistence

Relational data is stored in MySQL.

In‑memory caching is handled by Memcached, which also serves as a cache front‑end for MySQL data.
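One common way to put a cache in front of a database is the look-aside pattern: read from the cache, fall back to the database on a miss, and invalidate the cached entry on write. The sketch below illustrates that pattern only; the article does not say exactly how Facebook wires Memcached to MySQL. The class name `LookAsideCache` is hypothetical, and plain dictionaries stand in for the Memcached client and the MySQL table.

```python
class LookAsideCache:
    """Check the cache first, fall back to the database on a miss."""

    def __init__(self, cache, database):
        self.cache = cache        # assumption: dict standing in for Memcached
        self.database = database  # assumption: dict standing in for MySQL
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                  # cache hit
        self.db_reads += 1
        value = self.database.get(key)    # cache miss: read the database
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def update(self, key, value):
        self.database[key] = value        # write to the source of truth first
        self.cache.pop(key, None)         # then drop the stale cache entry


users = LookAsideCache(cache={}, database={"user:1": "Alice"})
first = users.get("user:1")    # miss, served by the database
second = users.get("user:1")   # hit, served by the cache
users.update("user:1", "Alicia")
third = users.get("user:1")    # miss again after invalidation
```

Invalidating on write rather than updating the cache in place keeps the database as the single source of truth, at the cost of one extra miss after each update.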

Facebook originally deployed its own Cassandra cluster but has been migrating to HBase, which offers a simpler consistency model and native MapReduce integration.

Photo storage is managed by Haystack, a purpose‑built, append‑only storage system optimized for billions of images.
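To make "append-only" concrete, here is a toy store that only ever writes at the end of one large volume and keeps an in-memory index from photo ID to byte range. It is a sketch of the general idea, not Haystack's actual on-disk format; the class `AppendOnlyStore`, the index layout, and the use of `io.BytesIO` in place of a real file are all assumptions made for illustration.

```python
import io


class AppendOnlyStore:
    """Toy append-only blob store with an in-memory (offset, size) index."""

    def __init__(self):
        self.volume = io.BytesIO()  # assumption: stands in for one big file
        self.index = {}             # photo_id -> (offset, size)

    def put(self, photo_id, data):
        offset = self.volume.seek(0, io.SEEK_END)   # always write at the end
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))  # newest copy wins

    def get(self, photo_id):
        offset, size = self.index[photo_id]
        self.volume.seek(offset)
        return self.volume.read(size)


photos = AppendOnlyStore()
photos.put(1, b"cat")
photos.put(2, b"sunset")
photos.put(1, b"cat-v2")  # an overwrite appends a new copy; old bytes remain
```

Because existing bytes are never rewritten, a read needs one index lookup and one contiguous read, and an updated photo simply leaves its old copy behind as unreferenced space.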

Offline Processing and Analytics

Batch processing relies on Hadoop and Hive. Logging, click‑stream, and feed data are collected with Scribe, aggregated into HDFS via Scribe‑HDFS, and made available for MapReduce analysis.
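The flow described here (collect log lines by category, then analyze them with MapReduce) can be sketched in a few lines. This is a single-process illustration of the idea, not Scribe's or Hadoop's API; the function names, the `(category, line)` message shape, and the `"user page"` click format are hypothetical.

```python
from collections import defaultdict


def collect(messages):
    """Group (category, line) log messages by category, as an aggregator would."""
    by_category = defaultdict(list)
    for category, line in messages:
        by_category[category].append(line)
    return by_category


def map_clicks(line):
    """Map step: emit (page, 1) for a click line of the form 'user page'."""
    _user, page = line.split()
    yield page, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


messages = [
    ("clicks", "u1 /home"),
    ("feed", "u1 posted"),
    ("clicks", "u2 /home"),
    ("clicks", "u1 /photos"),
]
logs = collect(messages)
pairs = [pair for line in logs["clicks"] for pair in map_clicks(line)]
click_counts = reduce_counts(pairs)
```

In a real cluster the map and reduce steps run in parallel across many machines over files in HDFS; the structure of the computation is the same.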

Performance Optimizations

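BigPipe is a custom page‑delivery system that breaks each page into fragments (“pagelets”) and streams them to the browser as they become ready, so rendering can begin before the whole page has been generated.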
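The streaming idea can be sketched with a generator: send the page skeleton first, then one chunk per fragment as each is built. This is an illustration under stated assumptions, not BigPipe's implementation; `render_page`, the pagelet dictionary, and the client-side `fill` function are hypothetical, and a production version would also need to escape the payload safely for embedding in a script tag.

```python
import json


def render_page(pagelets):
    """Yield HTML chunks: the skeleton first, then one chunk per fragment."""
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"  # flushed before any fragment is built
    for name, build in pagelets.items():
        payload = json.dumps({"id": name, "html": build()})
        # 'fill' is a hypothetical browser-side function that inserts the
        # fragment's HTML into the placeholder with the matching id.
        yield f"<script>fill({payload})</script>"
    yield "</body></html>"


pagelets = {
    "nav": lambda: "<a href='/'>Home</a>",
    "feed": lambda: "<p>3 new stories</p>",
}
chunks = list(render_page(pagelets))
```

A web server would write each yielded chunk to the response as soon as it is produced, so the browser can lay out the skeleton while later fragments are still being generated.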

Varnish Cache is used as an HTTP reverse proxy for low‑latency content delivery.

Messaging Infrastructure

Facebook Messages runs on a dynamic “Cell” architecture. Each Cell handles a subset of users, and new Cells are added as traffic grows. Messages are persisted in HBase, and a custom inverted index built on HBase powers search. The chat service runs on an epoll‑based server written in Erlang and is accessed via Thrift.
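A minimal sketch of the Cell idea follows: a directory maps each user to one Cell, users stay where they were placed, and capacity grows by adding Cells. The article does not describe how users are assigned, so the least-populated-Cell policy below is an assumption, as are the `CellDirectory` class and the Cell names.

```python
class CellDirectory:
    """Maps each user to a cell; a user stays in the cell once assigned."""

    def __init__(self, cell_names):
        self.members = {name: set() for name in cell_names}
        self.user_cell = {}

    def add_cell(self, name):
        self.members[name] = set()  # new capacity as traffic grows

    def cell_for(self, user_id):
        if user_id not in self.user_cell:
            # Assumed policy: place new users in the least-populated cell,
            # breaking ties by cell name.
            target = min(self.members, key=lambda c: (len(self.members[c]), c))
            self.members[target].add(user_id)
            self.user_cell[user_id] = target
        return self.user_cell[user_id]


directory = CellDirectory(["cell-1"])
alice_cell = directory.cell_for("alice")
bob_cell = directory.cell_for("bob")
directory.add_cell("cell-2")
carol_cell = directory.cell_for("carol")        # lands in the empty new cell
alice_cell_again = directory.cell_for("alice")  # existing users do not move
```

Keeping an explicit directory, rather than hashing user IDs, means adding a Cell never forces existing users to move.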

Scale Metrics (July 2010)

More than 60,000 servers across multiple data centers, including the custom‑designed Prineville, Oregon facility (Open Compute Project).

Approximately 300 TB of data cached in Memcached.

Hadoop and Hive clusters consist of ~3,000 servers (8 CPU cores, 32 GB RAM, 12 TB disk each), totaling ~24,000 CPU cores, 96 TB RAM, and 36 PB storage.
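The cluster totals follow directly from the per-server figures. A quick check, using decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB); the variable names are only for illustration:

```python
servers = 3_000
cores_per_server = 8
ram_gb_per_server = 32
disk_tb_per_server = 12

total_cores = servers * cores_per_server               # CPU cores
total_ram_tb = servers * ram_gb_per_server / 1_000     # GB -> TB
total_disk_pb = servers * disk_tb_per_server / 1_000   # TB -> PB
```

These come to 24,000 cores, 96 TB of RAM, and 36 PB of disk, matching the stated totals.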

Traffic and data volume: ~1 trillion clicks per day, 500 billion photos, 3 trillion cached objects, and 130 TB of logs per day.

References

HipHop for PHP: http://developers.facebook.com/blog/post/358

Thrift: http://thrift.apache.org/

Memcached: http://memcached.org/

Cassandra: http://cassandra.apache.org/

HBase: http://hbase.apache.org/

Scribe (GitHub): https://github.com/facebook/scribe

Scribe‑HDFS: http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html

BigPipe: http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919

Varnish Cache: http://www.varnish-cache.org/

Haystack storage note: http://www.facebook.com/note.php?note_id=76191543919

Messages scaling note: http://www.facebook.com/note.php?note_id=10150148835363920

Open Compute Project: http://opencompute.org/

Scaling Facebook presentation: http://www.devoxx.com

Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
