How Facebook Scaled Its Data Storage with NoSQL: Cassandra, HBase, and Beyond
This article traces Facebook's evolution from a small social site to a global platform, explains how its massive data‑storage challenges led to the adoption of NoSQL solutions like Cassandra and HBase, and breaks down the core patterns, consistency models, and scaling techniques that power such large‑scale systems.
Facebook's Data Storage Evolution
From its early days as a regional social network to becoming the world’s largest platform, Facebook repeatedly overhauled its storage architecture to handle ever‑growing data volumes and scalability demands.
Key milestones include:
2008: Chat feature scaled to 70 million users using Erlang.
2010: Real‑time messaging stored more than 135 billion messages per month with HBase.
2011: Real‑time analytics processed 20 billion events per day using HBase.
Why Cassandra and HBase Were Created
Both systems address a common set of scalability problems:
Making the application layer stateless by separating it from the data layer.
Extending the data layer to support massive workloads.
Balancing load across machines and handling node failures.
Providing data replication, consistent synchronization, and automatic scaling.
NoSQL emerged as the overarching solution, with Cassandra and HBase as concrete implementations of the NoSQL concept.
Basic NoSQL Pattern Concepts (Core Content)
Common NoSQL products include Google Bigtable, Amazon Dynamo, and Cassandra (which combines Dynamo’s distributed design with Bigtable’s data model). Their shared characteristics are:
Key‑value storage.
Running on large numbers of inexpensive commodity servers.
Data partitioned and replicated across nodes.
Relaxed consistency requirements.
The architecture can be broken down into several layers:
A. API Model (CRUD Operations)
Standard database operations: create, read, update, delete.
B. Underlying Architecture
Hundreds to thousands of physical nodes (PN) each host several virtual nodes (VN). VN distribution enables flexible data placement.
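The physical/virtual split can be illustrated with a minimal Python sketch. The function names (`assign_virtual_nodes`, `move_virtual_node`), the round-robin placement, and the node labels are assumptions made for illustration; they are not taken from Cassandra or HBase.

```python
from collections import Counter

def assign_virtual_nodes(physical_nodes, vns_per_node):
    """Map each virtual node id to a physical host (round-robin, an assumed policy)."""
    total = len(physical_nodes) * vns_per_node
    return {vn: physical_nodes[vn % len(physical_nodes)] for vn in range(total)}

def move_virtual_node(placement, vn, new_host):
    """Relocate a single virtual node; every other VN stays where it was."""
    updated = dict(placement)
    updated[vn] = new_host
    return updated

# Hypothetical cluster: 3 physical nodes hosting 4 virtual nodes each.
placement = assign_virtual_nodes(["pn-a", "pn-b", "pn-c"], vns_per_node=4)
load = Counter(placement.values())          # VNs per physical node

# A new machine "pn-d" takes over VN 0 without touching the other VNs.
rebalanced = move_virtual_node(placement, 0, "pn-d")
```

Because data is attached to virtual nodes rather than machines, shifting load means reassigning a few VNs instead of repartitioning the whole key space.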
C. Partitioning
Two main methods:
Simple hash: partition = hash(key) mod total_VN. This is inefficient when the VN count changes, because almost every key then maps to a different partition.
Consistent hashing: keys are placed on a ring; each VN occupies a segment, minimizing data movement on node joins/leaves.
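The difference between the two methods can be sketched in Python as follows. This is an illustrative sketch only: the MD5-based hash `h`, the `HashRing` class, and the `vn-*` and `user:*` names are assumptions, and each VN gets a single point on the ring for brevity.

```python
import bisect
import hashlib

def h(value: str) -> int:
    """Stable hash (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def mod_partition(key: str, total_vn: int) -> int:
    """Simple hash partitioning."""
    return h(key) % total_vn

class HashRing:
    """Consistent hashing: a key belongs to the first node clockwise from it."""
    def __init__(self, nodes):
        self._points = sorted((h(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._points, (h(node), node))

    def owner(self, key):
        hashes = [point for point, _ in self._points]
        i = bisect.bisect_right(hashes, h(key)) % len(self._points)
        return self._points[i][1]

keys = [f"user:{i}" for i in range(1000)]

# Growing from 8 to 9 partitions with simple hashing.
moved_mod = sum(mod_partition(k, 8) != mod_partition(k, 9) for k in keys)

# Adding a ninth node to the ring.
ring = HashRing([f"vn-{i}" for i in range(8)])
before = {k: ring.owner(k) for k in keys}
ring.add("vn-8")
after = {k: ring.owner(k) for k in keys}
moved_ring = sum(before[k] != after[k] for k in keys)
```

On the ring, the only keys that change owner are those that now fall in the new node's segment, whereas with the modulo scheme most keys land in a different partition.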
D. Replication
Improves reliability.
Distributes workload across replicas.
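One common way to place replicas, sketched below, is to store each key on the next N distinct nodes clockwise on the hash ring. The helper names (`replicas_for`, `pick_read_replica`), the MD5 hash, and the node labels are assumptions for illustration rather than any product's actual API.

```python
import bisect
import hashlib

def h(value: str) -> int:
    """Stable hash (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def replicas_for(key, nodes, n=3):
    """Return the n distinct nodes found walking clockwise from the key."""
    ring = sorted(nodes, key=h)
    start = bisect.bisect_right([h(node) for node in ring], h(key))
    return [ring[(start + i) % len(ring)] for i in range(min(n, len(ring)))]

def pick_read_replica(replicas, down=frozenset()):
    """Serve a read from the first replica that is still up."""
    for node in replicas:
        if node not in down:
            return node
    raise RuntimeError("no live replica for this key")

nodes = [f"node-{i}" for i in range(5)]      # hypothetical cluster
owners = replicas_for("user:42", nodes)      # three copies of this key

# If the first replica is down, the read falls through to the second.
fallback = pick_read_replica(owners, down={owners[0]})
```

The same replica list serves both purposes named above: extra copies survive a node failure, and reads can be spread over any live copy.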
E. Node Membership Changes
When a node joins, it receives data from neighboring nodes and propagates its presence; when a node crashes, neighbors update membership and asynchronously copy missing data.
F. Client Consistency Models
Strict Consistency (one‑copy serializability).
Read‑Your‑Write.
Session Consistency.
Monotonic Read.
Eventual Consistency.
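To make the session-level guarantees concrete, here is a minimal sketch of a client session that enforces read-your-writes and monotonic reads by remembering the highest version it has seen. The `Replica` and `Session` classes and the single-value, integer-version model are simplifying assumptions, not how any particular store implements these guarantees.

```python
class Replica:
    """Holds one value plus a version counter (a deliberately tiny model)."""
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are older than what we already hold.
        if version > self.version:
            self.version, self.value = version, value

class Session:
    """Rejects reads from replicas that are behind what this client has seen."""
    def __init__(self):
        self.min_version = 0

    def write(self, replica, value):
        replica.apply(replica.version + 1, value)
        self.min_version = max(self.min_version, replica.version)

    def read(self, replica):
        if replica.version < self.min_version:
            raise LookupError("replica is stale for this session")
        self.min_version = replica.version   # never go backwards afterwards
        return replica.value

fresh, lagging = Replica(), Replica()
session = Session()
session.write(fresh, "v1")
seen = session.read(fresh)                   # the client sees its own write

try:
    session.read(lagging)                    # replication has not caught up yet
    stale_rejected = False
except LookupError:
    stale_rejected = True

lagging.apply(1, "v1")                       # asynchronous replication arrives
caught_up = session.read(lagging)
```

Under plain eventual consistency the read from `lagging` would simply return the old value; the session check is what upgrades the guarantee for this one client.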
G. Master‑Slave Model (Single Master)
All writes go through a master, which propagates them to the slaves; slaves serve reads. If the master fails, a slave is promoted to take its place.
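A minimal sketch of the single-master pattern follows. The `MasterSlaveCluster` class, its synchronous replication, and the round-robin read policy are assumptions chosen to keep the example short; real systems typically replicate asynchronously and elect the new master rather than taking the first slave.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class MasterSlaveCluster:
    def __init__(self, names):
        # The first name becomes the master, the rest become slaves.
        self.master, *self.slaves = [Node(n) for n in names]
        self._next = 0

    def write(self, key, value):
        """Writes go to the master, which copies them to every slave."""
        self.master.data[key] = value
        for slave in self.slaves:            # synchronous copy, for simplicity
            slave.data[key] = value

    def read(self, key):
        """Reads rotate over the slaves (or hit the master if none exist)."""
        pool = self.slaves or [self.master]
        node = pool[self._next % len(pool)]
        self._next += 1
        return node.data.get(key)

    def fail_master(self):
        """Promote the first slave when the master is lost."""
        if not self.slaves:
            raise RuntimeError("no slave available for promotion")
        self.master = self.slaves.pop(0)

cluster = MasterSlaveCluster(["a", "b", "c"])
cluster.write("k", 1)
first_read = cluster.read("k")
cluster.fail_master()                        # "b" takes over from "a"
cluster.write("k", 2)
second_read = cluster.read("k")
```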
H. Multi‑Master Model (No Master)
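Any replica can accept updates, which are coordinated via two‑phase commit (2PC) or a quorum‑based variant such as Paxos. Because a quorum only needs a majority of replicas to respond rather than all of them, it reduces coordination bottlenecks.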
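The quorum idea can be sketched on its own, without the full machinery of 2PC or Paxos: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap whenever R + W > N. The functions below, and the choice of "first W" and "last R" replicas, are assumptions made to keep the overlap easy to see.

```python
def quorum_write(replicas, key, value, version, w):
    """Store (version, value) on w replicas; here simply the first w."""
    for replica in replicas[:w]:
        replica[key] = (version, value)

def quorum_read(replicas, key, r):
    """Ask r replicas (here the last r) and return the newest value seen."""
    answers = [rep[key] for rep in replicas[-r:] if key in rep]
    if not answers:
        return None
    return max(answers, key=lambda entry: entry[0])[1]

replicas = [{}, {}, {}]                          # N = 3
quorum_write(replicas, "k", "old", version=1, w=3)
quorum_write(replicas, "k", "new", version=2, w=2)

overlapping = quorum_read(replicas, "k", r=2)    # R + W = 4 > N
non_overlapping = quorum_read(replicas, "k", r=1)  # R + W = 3, not > N
```

With R = 2 the read set always includes a replica that took the latest write, so the newest version wins; with R = 1 the read can land on the one replica that missed it and return the stale value.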
I. Gossip Protocol
For weaker consistency requirements, gossip spreads state information between replicas in a peer‑to‑peer fashion, similar to rumor spreading.
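A small simulation shows how gossip converges. In this sketch each node contacts one random peer per round and the pair keeps the higher-versioned entry for every key. The push-pull exchange, the `(version, value)` encoding, the seeded random generator, and the `node-3` entry are all assumptions for illustration.

```python
import random

def merge(a, b):
    """Both peers end up with the higher-versioned entry for every key."""
    for key in set(a) | set(b):
        newest = max(a.get(key, (0, None)), b.get(key, (0, None)),
                     key=lambda entry: entry[0])
        a[key] = b[key] = newest

def gossip_round(states, rng):
    """Every node exchanges state with one randomly chosen peer."""
    for i in range(len(states)):
        peer = rng.choice([j for j in range(len(states)) if j != i])
        merge(states[i], states[peer])

def rounds_until_converged(states, rng, max_rounds=100):
    """Run rounds until all nodes agree; return the round count or None."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(state == states[0] for state in states):
            return round_no
    return None

states = [{} for _ in range(8)]              # 8 nodes, initially empty
states[0]["node-3"] = (1, "down")            # one node learns a rumor
rounds = rounds_until_converged(states, random.Random(7))
```

No node coordinates the exchange, which is why gossip suits membership and failure information where temporary disagreement is acceptable.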
J. Storage Implementation
Pluggable back‑ends: MySQL, filesystem, large hash tables.
Copy‑on‑modify: each update creates a new version, propagating changes up the index hierarchy.
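Copy-on-modify can be illustrated with a persistent binary search tree in which an update copies only the nodes on the path from the root to the changed key. The tree shape, the `Node`, `put`, and `get` names, and the sample keys are assumptions for this sketch; actual storage engines use their own index structures.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def put(node, key, value):
    """Return a new root; only nodes on the path to `key` are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return node._replace(left=put(node.left, key, value))
    if key > node.key:
        return node._replace(right=put(node.right, key, value))
    return node._replace(value=value)

def get(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

v1 = put(put(put(None, "m", "1"), "c", "2"), "x", "3")
v2 = put(v1, "c", "20")                      # new version, v1 is untouched

old_value = get(v1, "c")
new_value = get(v2, "c")
shares_subtree = v2.right is v1.right        # the unchanged branch is reused
```

Readers holding the old root keep a consistent snapshot while the new root exposes the update, which is the property that makes versioned reads cheap.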
NoSQL Summary
To build a scalable NoSQL system you must:
Partition data across machines using consistent hashing.
Handle node failures by having neighboring nodes take over and re‑replicate the failed node's data.
Choose a replication model (master‑slave or master‑less).
Synchronize replicas via 2PC, a quorum‑based protocol such as Paxos, or gossip.
Rebalance data when the cluster size changes.
Understanding these patterns makes reading Cassandra or HBase documentation much clearer and shows why NoSQL is critical for large‑scale services like Facebook.
dbaplus Community
