
How Facebook Scales MySQL Backups: Strategies, Storage, and Validation

This article details Facebook's MySQL backup architecture, covering preparation, logical backup format, storage locations, source selection, full and incremental backup pipelines, verification mechanisms, and future directions such as RBR‑based logical incremental backups.

dbaplus Community

Background

Facebook’s MySQL infrastructure relies heavily on Python for automation and Thrift for RPC definitions. The backup‑agent is built on the BaseController framework (open‑sourced as sparts) and runs against a master/slave deployment whose replicas are spread across five data centers.

Backup Format

Facebook uses logical backups with mysqldump compressed by gzip because compression dramatically reduces storage costs and the resulting files are directly readable for downstream analytics. Physical backups (e.g., xtrabackup) are employed only for specific migration scenarios.

Backup Storage Locations

Backups are stored in two tiers.

Warm Backup: each data center runs an independent HDFS cluster that keeps the most recent ten days of backup files (one logical shard per HDFS).

Cold Backup: older backups are moved to Isilon devices for long‑term archival and audit compliance.

Backup Source Selection

Because each replica set spans multiple data centers, several MySQL instances can serve as backup sources. The backup‑agent hashes the shard name to a bucket, then chooses a source based on network traffic, broken‑slave detection, and consistency considerations.
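The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not Facebook's implementation: the `Replica` fields, the bucket count, the lag threshold, and the "prefer the local data center to limit cross‑site traffic" rule are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

NUM_BUCKETS = 16  # hypothetical bucket count


@dataclass
class Replica:
    host: str
    datacenter: str
    is_broken: bool    # replication is broken on this instance
    lag_seconds: int   # how far the instance trails the master


def shard_bucket(shard_name: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a shard name to a stable bucket using a hash of the name."""
    digest = hashlib.md5(shard_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def choose_source(shard_name: str, replicas: List[Replica],
                  local_datacenter: str,
                  max_lag_seconds: int = 60) -> Optional[Replica]:
    """Pick a backup source, skipping broken or badly lagging replicas."""
    healthy = [r for r in replicas
               if not r.is_broken and r.lag_seconds <= max_lag_seconds]
    if not healthy:
        return None
    # Assumption: prefer replicas in the local data center to reduce
    # cross-site network traffic; fall back to any healthy replica.
    local = [r for r in healthy if r.datacenter == local_datacenter]
    candidates = sorted(local or healthy, key=lambda r: r.host)
    # The bucket spreads shards evenly across the remaining candidates.
    return candidates[shard_bucket(shard_name) % len(candidates)]
```

Because the bucket depends only on the shard name, the same shard keeps mapping to the same candidate as long as the healthy set does not change.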

Backup Strategies

Different data characteristics trigger different schedules. Hot‑spot tables with frequent changes receive a full backup every three days plus incremental backups in between. For volatile or unpredictable data, a full backup is taken daily. A Python‑based backup agent on each MySQL server orchestrates these tasks according to configuration from a central config service.
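As a rough illustration of how an agent might turn such a policy into a daily decision, consider the sketch below. The policy names and the `next_backup_type` helper are hypothetical; only the three‑day and one‑day intervals come from the text.

```python
from datetime import date
from typing import Optional

# Days between full backups for each (hypothetically named) policy.
FULL_INTERVAL_DAYS = {
    "full_plus_incremental": 3,  # full every three days, diffs in between
    "daily_full": 1,             # full backup every day
}


def next_backup_type(policy: str, today: date,
                     last_full: Optional[date]) -> str:
    """Return 'full', 'incremental', or 'none' for today's run."""
    if last_full is None:
        return "full"  # nothing to diff against yet
    days_since_full = (today - last_full).days
    if days_since_full >= FULL_INTERVAL_DAYS[policy]:
        return "full"
    if days_since_full == 0:
        return "none"  # a full backup was already taken today
    return "incremental"
```

In practice the policy per shard would come from the central config service mentioned above rather than a hard‑coded table.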

Full Backup Pipeline

Full backups are performed with mysqldump piped through qpress and streamed directly into HDFS. Shards are backed up in parallel (2‑3 concurrent dumps on flash‑based servers). Logical Read‑Ahead optimizations and specific mysqldump options reduce impact on the serving workload. See Yoshinori Matsunobu’s blog for performance details.
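A streaming pipeline of this shape can be sketched with Python's `subprocess` module. Several details here are assumptions: the article does not list the exact mysqldump options, so `--single-transaction` and `--quick` are shown only as common choices; `gzip -c` stands in for the compressor; and the HDFS path is up to the caller. `hdfs dfs -put -` reads its input from stdin, so no intermediate file is written.

```python
import subprocess
from typing import List


def build_dump_commands(database: str, hdfs_path: str) -> List[List[str]]:
    """Build the three stages: dump, compress, upload (flags are assumptions)."""
    dump = ["mysqldump", "--single-transaction", "--quick", database]
    compress = ["gzip", "-c"]                        # stand-in compressor
    upload = ["hdfs", "dfs", "-put", "-", hdfs_path]  # "-" means read stdin
    return [dump, compress, upload]


def run_pipeline(commands: List[List[str]]) -> List[int]:
    """Chain commands stdout -> stdin and return each stage's exit code."""
    procs = []
    prev_stdout = None
    for index, cmd in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=prev_stdout,
            stdout=None if is_last else subprocess.PIPE,
        )
        if prev_stdout is not None:
            # Close our copy so an early exit downstream reaches upstream.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs]
```

A caller would treat the backup as failed unless every exit code is zero, for example `all(code == 0 for code in run_pipeline(build_dump_commands("shard42", path)))`, where `shard42` and `path` are placeholders.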

Incremental Backup Pipeline

Instead of traditional binlog‑based increments, Facebook builds incremental backups from differences between successive full logical dumps. Version 1 wrote a full dump to HDFS daily and used a Hadoop job to compute diffs, which proved too resource‑intensive. Version 2 streams full data, keeps the latest full dump in memory, and writes only the computed diff files, eliminating redundant full‑dump writes while still incurring extra comparison overhead.
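The core idea of diffing two logical dumps can be shown on a toy scale. The sketch below assumes each dump has already been parsed into a mapping from primary key to row, which is a simplification for illustration; the article does not describe the actual diff format.

```python
from typing import Any, Dict


def diff_dumps(previous: Dict[Any, Any], current: Dict[Any, Any]) -> dict:
    """Compute the rows inserted, updated, and deleted between two dumps."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    updated = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    deleted = sorted(k for k in previous if k not in current)
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


def apply_diff(previous: Dict[Any, Any], diff: dict) -> Dict[Any, Any]:
    """Rebuild the newer dump from the older one plus the diff."""
    removed = set(diff["deleted"])
    restored = {k: v for k, v in previous.items() if k not in removed}
    restored.update(diff["inserted"])
    restored.update(diff["updated"])
    return restored
```

Only the diff needs to be written to storage; a restore starts from the last full dump and applies each diff in order. The comparison itself still has to read every row, which is the overhead noted above.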

Backup Verification

All backups undergo continuous validation. The system repeatedly selects the highest‑priority backup (based on age and previous failures), restores it on an idle machine, and runs five steps in order: download, decompression, load, verification, and binlog replay. A failure at any step marks the backup invalid. Cold backups are validated by checksum comparison after they have passed verification in warm storage.
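The validation loop can be sketched as a priority pick followed by a fail‑fast sequence of steps. The scoring formula below is purely illustrative, since the article says only that priority depends on age and previous failures; the step functions are supplied by the caller and are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

STEPS = ["download", "decompress", "load", "verify", "replay_binlog"]


@dataclass
class Backup:
    name: str
    age_hours: float
    previous_failures: int = 0
    valid: Optional[bool] = None  # None means not yet validated


def priority(backup: Backup) -> float:
    """Illustrative score: older backups and past failures rank higher."""
    return backup.age_hours + 24 * backup.previous_failures


def pick_next(backups: List[Backup]) -> Optional[Backup]:
    """Choose the unvalidated backup with the highest priority."""
    pending = [b for b in backups if b.valid is None]
    return max(pending, key=priority) if pending else None


def validate(backup: Backup,
             step_fns: Dict[str, Callable[[Backup], bool]]) -> Optional[str]:
    """Run the steps in order; return the failed step name, or None."""
    for step in STEPS:
        if not step_fns[step](backup):
            backup.valid = False
            backup.previous_failures += 1
            return step  # later steps are skipped after a failure
    backup.valid = True
    return None
```

Returning the name of the failed step makes it easy to tell a storage problem (download) from a data problem (load or verify).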

Current Issues and Future Outlook

Future work focuses on RBR‑based logical incremental backups. Applying the 80/20 rule (20% of active users generate 80% of changes), Facebook estimates that a 100 GB database would produce roughly 4 GB of effective incremental data per day. After compression, the daily backup could shrink to about 500 MB, which would yield multi‑million‑dollar savings.
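The ratios implied by these figures are easy to check. The sketch below uses only the three numbers quoted above and assumes decimal units (1 GB = 1000 MB); the helper name is hypothetical.

```python
def implied_ratios(database_gb: float, incremental_gb: float,
                   compressed_mb: float) -> tuple:
    """Return (fraction of data changed per day, compression ratio)."""
    changed_fraction = incremental_gb / database_gb
    # Assumption: decimal units, 1 GB = 1000 MB.
    compression_ratio = (incremental_gb * 1000) / compressed_mb
    return changed_fraction, compression_ratio


changed, ratio = implied_ratios(100, 4, 500)
print(f"{changed:.0%} of the data changes per day; "
      f"compression is about {ratio:.0f}:1")
```

In other words, the estimate assumes that about 4% of the data changes each day and that the incremental data compresses by a factor of about eight.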

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
