How Facebook Cuts Power Use with Cold Storage: Inside Their Low‑Energy Data Center Design

This article examines Facebook's cold storage system, detailing how the company redesigned hardware and software to slash power consumption, improve reliability with Reed‑Solomon coding, mitigate bit‑rot, and balance loads while supporting massive photo archives in energy‑constrained data centers.

Art of Distributed System Architecture Design

Facebook’s rapid growth, from 34 million users in 2007 to more than 2 billion monthly active users, has driven the company to build dozens of data centers that together consume hundreds of megawatts of electricity. To curb operating costs and energy use, Facebook began exploring cold storage technology in 2011.

Cold storage targets infrequently accessed data, such as seasonal photo uploads that are rarely read after a few weeks, while frequently accessed data such as profiles stays on hot storage. Through the Open Compute Project (OCP), Facebook publicly released many design details, and engineers Krish Bandaru and Kestutis Patiejunas described the challenges and solutions behind the cold‑storage system.

Hardware Innovations

The redesigned storage system operates on only one‑sixth of the power of the original system while still supporting up to 1,000 PB (1 EB) of data. Key changes include:

Each panel powers only one drive at a time, which required a redesigned power distribution circuit that prevents multiple drives from spinning up simultaneously.

Reduction of fan count from six to four per node and of power shelves from three to one, with the number of power supplies per shelf cut from seven to five.

Consolidation of Open Rack bus bars from three to one, resulting in a 2 PB cabinet that consumes only 25 % of the original power.

Elimination of UPS and generator hardware, further lowering energy overhead.

These modifications also introduced logistical challenges. A fully loaded rack holding 480 × 4 TB drives (1,920 TB, which is the roughly 2 PB per cabinet cited above) weighs about 1,100 kg, exceeding the capacity of existing transport vehicles.
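The one-drive-at-a-time rule from the list above can be sketched as a small state machine. This is an illustration only: the `TrayPowerGate` class, its method names, and the default of 15 drives per panel are assumptions made for the example, not details from Facebook's firmware or circuit design.

```python
class TrayPowerGate:
    """Hypothetical gate that keeps at most one drive on a panel powered."""

    def __init__(self, drives_per_tray: int = 15):  # 15 is an assumed value
        self.drives_per_tray = drives_per_tray
        self.active = None   # index of the powered drive, or None
        self.events = []     # ordered log of ("on" | "off", drive) actions

    def power_off(self) -> None:
        """Spin down the currently powered drive, if any."""
        if self.active is not None:
            self.events.append(("off", self.active))
            self.active = None

    def power_on(self, drive: int) -> None:
        """Power one drive, spinning down any other drive first."""
        if not 0 <= drive < self.drives_per_tray:
            raise ValueError(f"no such drive: {drive}")
        if self.active == drive:
            return  # already powered, nothing to do
        self.power_off()  # never two drives drawing power at once
        self.events.append(("on", drive))
        self.active = drive


gate = TrayPowerGate()
gate.power_on(3)
gate.power_on(7)  # drive 3 is spun down before drive 7 starts
```

Serializing spin-up this way caps the peak current each panel can draw, which is what makes a smaller power-supply budget plausible.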

Software Design for Power‑Unreliable Environments

Because the hardware omits batteries and UPS, the storage software must guarantee data integrity despite sudden power loss. Design principles include:

Ensuring absolute data persistence, as cold storage may hold the only backup copy of photos.

Accommodating non‑enterprise hardware constraints and handling unexpected power interruptions.

Scalability to support future expansions without degrading performance.
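One common way to keep files intact across sudden power loss is to write to a temporary file, flush it to disk, and atomically rename it into place. The sketch below shows that general technique on a POSIX file system; it is not taken from the source and is not a description of Facebook's storage software. The function name `durable_write` is hypothetical.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # force file contents to stable storage
        os.replace(tmp_path, path)    # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)       # do not leave partial temp files behind
        raise
    # Assumption: a POSIX system where a directory can be opened and fsynced,
    # so that the rename itself is also persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "photo.bin")
        durable_write(target, b"example payload")
        with open(target, "rb") as f:
            print(f.read())
```

A reader never observes a half-written file with this pattern, because the rename only happens after the new contents are safely on disk.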

Reliability Through Erasure Coding

Facebook employs Reed‑Solomon erasure coding to protect against disk failures. For example, a 1 GB file is split into ten 100 MB data chunks, from which four additional 100 MB parity chunks are computed, giving fourteen pieces in total (a 1.4× storage overhead). Any ten of these fourteen pieces can reconstruct the original data, allowing the system to tolerate the loss of up to four pieces at once. Redundancy levels are tuned based on drive failure rates, and per‑file checksums enable rapid integrity verification.
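The "any ten of fourteen" property can be demonstrated with a toy Reed-Solomon-style code. The sketch below is an assumption-laden illustration, not Facebook's encoder: it works over the prime field GF(257) with Lagrange interpolation instead of the GF(2^8) arithmetic production codecs use, so parity symbols are kept as integers rather than bytes, and the names `encode` and `decode` are hypothetical. It needs Python 3.8+ for the modular inverse via `pow`.

```python
P = 257        # prime just above 255, so every byte value is a field element
K, N = 10, 14  # 10 data pieces, 14 pieces in total


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < K through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data):
    """Turn K equal-length byte chunks into N pieces (lists of ints)."""
    if len(data) != K:
        raise ValueError("expected exactly K data chunks")
    pieces = [list(chunk) for chunk in data]     # systematic: data kept as-is
    for x in range(K, N):                        # four parity pieces
        pieces.append([
            _interpolate([(i, chunk[pos]) for i, chunk in enumerate(data)], x)
            for pos in range(len(data[0]))
        ])
    return pieces


def decode(available):
    """Rebuild the K data chunks from a dict {piece_index: piece}."""
    if len(available) < K:
        raise ValueError("need at least K pieces to reconstruct")
    idx = sorted(available)[:K]
    length = len(available[idx[0]])
    data = []
    for target in range(K):
        if target in available:                  # data piece survived
            data.append(bytes(available[target]))
        else:                                    # recompute the lost piece
            data.append(bytes(
                _interpolate([(i, available[i][pos]) for i in idx], target)
                for pos in range(length)
            ))
    return data


chunks = [bytes((17 * i + j) % 256 for j in range(8)) for i in range(K)]
pieces = encode(chunks)
survivors = {i: p for i, p in enumerate(pieces) if i not in (0, 3, 10, 13)}
recovered = decode(survivors)  # four pieces lost, data still recoverable
```

Each byte position is treated independently: the ten data bytes define a polynomial, the parity bytes are four more evaluations of it, and any ten evaluations pin the polynomial down again.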

Mitigating Bit‑Rot

To address bit‑rot, the gradual degradation of data sitting on idle storage, the cold‑storage cluster performs a full scan of all objects roughly once a month. Detected corruptions trigger automatic reconstruction from redundant fragments, after which the repaired data is written to a fresh location.
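A scrub pass of this kind can be sketched as a checksum comparison followed by a rebuild. The example below is a simplified illustration under stated assumptions: objects live in an in-memory dict, SHA-256 stands in for whatever checksum the real system uses, and `scrub` and its `rebuild` callback are hypothetical names. Overwriting the dict entry stands in for writing the repaired data to a fresh location.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Content digest used to detect silent corruption."""
    return hashlib.sha256(data).hexdigest()


def scrub(store, expected, rebuild):
    """Verify every object and repair the ones whose checksum no longer matches.

    store:    dict object_id -> bytes (stand-in for on-disk objects)
    expected: dict object_id -> hex digest recorded at write time
    rebuild:  callable(object_id) -> bytes, e.g. erasure-code reconstruction
    Returns the list of object ids that were repaired.
    """
    repaired = []
    for object_id, digest in expected.items():
        if checksum(store.get(object_id, b"")) == digest:
            continue                              # object is intact
        fresh = rebuild(object_id)
        if checksum(fresh) != digest:
            raise RuntimeError(f"cannot repair {object_id}")
        store[object_id] = fresh                  # stand-in for a fresh location
        repaired.append(object_id)
    return repaired


originals = {"a": b"hello", "b": b"world"}
store = dict(originals)
expected = {oid: checksum(blob) for oid, blob in originals.items()}
store["b"] = b"w0rld"                             # simulate a flipped bit
fixed = scrub(store, expected, lambda oid: originals[oid])
```

Verifying the rebuilt object against the recorded checksum before accepting it guards against a repair that is itself corrupt.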

Load Balancing and Future Directions

The storage software continuously balances load across all servers. When new nodes are added, data is swiftly migrated to maintain even utilization. Future upgrades aim to incorporate newer media such as flash and Blu‑ray, and to evolve the file system to handle frequent mapping and remapping operations.
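The idea of evening out utilization after new nodes join can be sketched with a simple greedy loop. The source does not describe Facebook's actual balancing algorithm, so this is only one plausible approach: it assumes equal-sized volumes, and the `rebalance` function and node names are hypothetical.

```python
def rebalance(volumes_per_node):
    """Plan moves until no two nodes differ by more than one volume.

    volumes_per_node: non-empty dict node_name -> number of equal-sized volumes
    Returns (final_load, moves), where moves is a list of (source, destination).
    """
    load = dict(volumes_per_node)                 # leave the input untouched
    moves = []
    while True:
        src = max(load, key=load.get)             # most loaded node
        dst = min(load, key=load.get)             # least loaded node
        if load[src] - load[dst] <= 1:
            return load, moves                    # as even as it can get
        load[src] -= 1
        load[dst] += 1
        moves.append((src, dst))


# Two full nodes and one newly added, empty node.
final_load, plan = rebalance({"node1": 10, "node2": 10, "node3": 0})
```

Every move takes one volume from the fullest node to the emptiest one, so the loop always terminates and each volume moves at most once toward the new capacity.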
