How Facebook’s Cold Storage Cuts Power Use by 75% While Scaling Performance
Facebook’s Cold Storage system redesigns hardware and software to store rarely accessed data with up to 75% lower power consumption. It combines modular 2U chassis holding 30 disks each, Reed‑Solomon erasure coding for inexpensive redundancy, and a self‑healing “anti‑entropy” process, and it is designed so that performance improves as the system scales.
Background and Motivation
Facebook processes billions of photos daily, driving explosive growth in storage demand. Traditional cold‑storage solutions—tape libraries or optical media—are too slow, while commercial cloud options are costly. To handle petabytes of rarely accessed historical data, Facebook engineered a purpose‑built Cold Storage system.
Design Principles
Energy Efficiency: Storage nodes are powered on only when needed, eliminating redundant generators and batteries. Low‑cost commodity disks are used, and disk activity is throttled based on mean time between failures to reduce power draw.
Intelligent Management: The software must tolerate brief power interruptions without data loss, ensuring persistence and eliminating single points of failure. Metadata is self‑describing, removing the need for separate metadata services.
Future‑Proofing: The architecture anticipates scaling; performance should improve, not degrade, as capacity grows.
Hardware Architecture
The foundation is Open Vault Storage, a modular I/O topology built for the Open Rack standard. Each 2U chassis holds 30 disks (15 per tray) in a high‑density JBOD configuration, allowing interoperability with any server.
Two such systems are deployed in the Prineville and Forest City data centers, storing hundreds of petabytes while consuming only one‑quarter of the power of traditional solutions. Scaling up does not degrade performance; larger systems actually run faster.
Power‑Saving Techniques
Only one disk per tray is powered at a time, and the firmware was modified to prevent accidental full‑rack power‑on events. Because so few disks spin at once, the fan count per node dropped from six to four, the number of power shelves from three to one, and the number of Open Rack bus bars from three to one, bringing overall power draw to roughly 1/6 of conventional data‑center storage.
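The one-powered-disk-per-tray rule can be pictured with a small model. This is only an illustrative sketch: the class name `TrayPowerController` and its methods are hypothetical and do not describe Facebook's actual firmware, which the source does not detail.

```python
class TrayPowerController:
    """Toy model of a tray that allows at most one powered disk at a time.

    Hypothetical illustration only; not Facebook's firmware interface.
    """

    def __init__(self, disks_per_tray=15):
        self.disks_per_tray = disks_per_tray
        self.active = None  # index of the powered disk, or None if all are off

    def power_on(self, disk):
        """Power on `disk`, implicitly powering off the previously active one.

        Returns the index of the disk that was powered off (or None).
        """
        if not 0 <= disk < self.disks_per_tray:
            raise ValueError(f"disk index {disk} out of range")
        previous = self.active
        # Replacing the single 'active' slot guarantees that two disks
        # in the same tray are never powered simultaneously.
        self.active = disk
        return previous

    def powered_disks(self):
        return [] if self.active is None else [self.active]


tray = TrayPowerController()
tray.power_on(3)
replaced = tray.power_on(7)   # disk 3 is powered off, disk 7 takes over
print(tray.powered_disks(), replaced)
```

Keeping a single "active" slot per tray, rather than a set of powered disks, makes the constraint impossible to violate by construction, which is the same spirit as guarding against accidental full‑rack power‑on.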
Cost‑Effective Data Protection
Instead of multiple full replicas, Facebook employs Reed‑Solomon erasure coding. Data is split into n data blocks, from which m parity blocks are computed; any n of the resulting n + m blocks can reconstruct the original data. Facebook’s current configuration uses a 10:4 ratio (10 data disks + 4 parity disks), providing tolerance for up to four simultaneous disk failures while storing only 1.4 GB in total per 1 GB of data, that is, 0.4 GB of extra space.
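The space arithmetic behind the 10:4 layout is easy to check. The sketch below computes raw storage per byte of user data; the 3‑replica figure is included only as an assumed point of comparison and is not taken from the source.

```python
def storage_overhead(data_blocks, parity_blocks):
    """Raw bytes stored per byte of user data for an n-data + m-parity code."""
    return (data_blocks + parity_blocks) / data_blocks


rs_10_4 = storage_overhead(10, 4)   # 14 blocks stored for every 10 blocks of data
replication_3x = 3.0                # assumed comparison: three full copies

print(f"10+4 erasure coding: {rs_10_4:.1f}x raw storage, survives 4 lost blocks")
print(f"3 full replicas:     {replication_3x:.1f}x raw storage, survives 2 lost copies")
```

Under these assumptions, the erasure-coded layout tolerates more simultaneous failures than three replicas while storing less than half as many raw bytes.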
Facebook also runs a re‑encoding service that can adjust the data‑parity ratio as hardware reliability evolves.
Anti‑Entropy and Self‑Healing
To combat “bit rot,” a background “anti‑entropy” process scans all disks every 30 days, verifies checksums, and triggers a repair workflow that rebuilds corrupted data on fresh disks. This reduces reconstruction time from hours to minutes.
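A checksum scrub of this kind might look roughly like the following. This is a minimal sketch, not Facebook's implementation: the function names, the in-memory `blocks` dictionary, and the choice of SHA‑256 are all assumptions, since the source does not name a checksum algorithm or storage format.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the source does not specify the algorithm.
    return hashlib.sha256(data).hexdigest()


def scrub(blocks, expected):
    """Return IDs of blocks whose current checksum differs from the recorded one."""
    return [bid for bid, data in blocks.items() if checksum(data) != expected[bid]]


# Hypothetical blocks and the checksums recorded when they were written.
blocks = {"b0": b"photo-bytes-0", "b1": b"photo-bytes-1", "b2": b"photo-bytes-2"}
expected = {bid: checksum(data) for bid, data in blocks.items()}

blocks["b1"] = b"photo-bytes-X"      # simulate silent corruption ("bit rot")
damaged = scrub(blocks, expected)
print(damaged)                        # block IDs to hand to the repair workflow
```

In a real deployment the damaged blocks would be rebuilt from the surviving data and parity blocks of the same erasure-coded stripe; that repair step is not modeled here.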
File‑System Bypass
Recognizing the limitations of traditional file systems for massive, infrequent I/O, Facebook accesses each drive as a “raw disk,” bypassing the file system layer to gain full control over data flow and improve durability.
Scale‑Positive Performance
When capacity is added, the system rebalances data across new hardware without downtime, effectively acting like an intelligent disk‑defragmentation tool. Larger deployments thus see improved throughput rather than the usual degradation.
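The idea of spreading existing data onto newly added hardware can be sketched with a simple greedy loop. This is an assumed, simplified model (equal-sized blocks, hypothetical rack names) and not the algorithm Facebook uses, which the source does not describe.

```python
def rebalance(nodes):
    """Move blocks from the fullest node to the emptiest until counts differ by at most one.

    `nodes` maps a node name to its list of block IDs and is modified in place.
    Returns the list of (block, source, destination) moves performed.
    """
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return moves
        block = nodes[fullest].pop()
        nodes[emptiest].append(block)
        moves.append((block, fullest, emptiest))


# Two existing racks plus one newly added, empty rack (hypothetical names).
nodes = {
    "rack-a": ["b0", "b1", "b2", "b3"],
    "rack-b": ["b4", "b5", "b6", "b7"],
    "rack-new": [],
}
moves = rebalance(nodes)
print({name: len(held) for name, held in nodes.items()})
print(moves)
```

A production rebalancer would also have to keep the blocks of one erasure-coded stripe on separate failure domains and throttle the moves; this sketch ignores both constraints.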
Future Directions
Although the two Cold Storage installations hold only about 1% of Facebook’s total data, the approach will be expanded. Ongoing research includes integrating flash, Blu‑ray, and cross‑data‑center distributed storage techniques to further enhance persistence and efficiency.
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
