Databases · 6 min read

Mastering Database Backup: Strategies, Retention, and Rapid Recovery

This article explains a comprehensive database backup workflow—including daily full backups with xtrabackup, real‑time binlog incremental backups, multi‑level retention policies, automated failure detection, disaster‑recovery across multiple data centers, and fast table‑level restoration—to help prevent prolonged outages.

360 Zhihui Cloud Developer

Backup Mechanism

We perform a full backup every 24 hours combined with real‑time binlog incremental backups.

Full Backup

The full backup uses the mature xtrabackup tool. Before each backup, a strategy‑update program examines the current cluster topology and slave status to select the appropriate instance and dynamically chooses between local backup and remote streaming backup.

Local backup: After the full backup completes, the data is encrypted, compressed, and transferred to storage.

Remote streaming backup: Uses xbstream to stream the backup directly to remote storage.
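The choice between the two modes above can be sketched as a small command builder. This is a minimal illustration, not the actual strategy-update program: the target paths and the `backup-host` name are hypothetical, while `--backup`, `--target-dir`, and `--stream=xbstream` are real xtrabackup options.

```python
def build_backup_command(mode: str, datadir: str = "/var/lib/mysql") -> str:
    """Return a shell command for a full backup in the given mode.

    "local"  -> back up to a local directory, to be encrypted/compressed
                and shipped to storage afterwards.
    "stream" -> pipe the backup through xbstream straight to remote storage.
    """
    if mode == "local":
        return (f"xtrabackup --backup --datadir={datadir} "
                "--target-dir=/backup/full")
    if mode == "stream":
        return (f"xtrabackup --backup --datadir={datadir} --stream=xbstream "
                "| ssh backup-host 'cat > /backup/full.xbstream'")
    raise ValueError(f"unknown backup mode: {mode}")
```

In practice the strategy-update program would pick the mode per instance, based on topology and slave status, before invoking a command like these.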

Incremental Backup

Real‑time binlog backups are performed with mysqlbinlog --read-from-remote-server, streaming binlog data to storage as it is generated.
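A continuous binlog pull of this kind typically combines --read-from-remote-server with --raw and --stop-never, which keeps mysqlbinlog connected and writing new events as they arrive. A sketch of assembling that command (host, user, and starting file are illustrative):

```python
def binlog_stream_command(host: str, user: str, start_file: str) -> str:
    """Build a mysqlbinlog invocation that streams binlogs continuously.

    --read-from-remote-server : pull binlogs over the replication protocol
    --raw                     : write raw binlog files, not decoded SQL
    --stop-never              : stay connected and keep streaming new events
    """
    return (
        "mysqlbinlog --read-from-remote-server --raw --stop-never "
        f"--host={host} --user={user} {start_file}"
    )
```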

Backup Retention Policy

Full backups follow a 4‑2‑2‑1 retention scheme, keeping the most recent 4 days, 2 weeks, 2 months, and 1 year (a total of 9 copies) to maximize recovery capability.

Incremental binlog backups retain the last 60 days of binlog data, enabling point‑in‑time recovery to any moment within that window.
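The 4-2-2-1 scheme can be made concrete by listing which backup dates survive pruning. The sketch below is one plausible interpretation (monthly copies approximated as 30-day steps); the article does not specify the exact boundary rules.

```python
from datetime import date, timedelta

def retained_backups(today: date) -> list[date]:
    """Dates whose full backups are kept under 4-2-2-1:
    4 most recent daily, 2 weekly, 2 monthly, 1 yearly copy (9 total)."""
    keep = [today - timedelta(days=d) for d in range(4)]        # 4 daily
    keep += [today - timedelta(weeks=w) for w in (1, 2)]        # 2 weekly
    keep += [today - timedelta(days=30 * m) for m in (1, 2)]    # 2 monthly (approx.)
    keep.append(today - timedelta(days=365))                    # 1 yearly
    return keep
```

Running this for any date yields nine distinct retention points, matching the nine copies the policy describes.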

Backup Failure Detection

Each backup step is monitored; any failure records an error code in the database for DBA troubleshooting. Key checks include:

Confirming the backup instance role is a slave to avoid impacting production.

Verifying replication sync status.

Detecting page‑corruption errors.

Random MD5 checks of data files before and after transfer.
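The MD5 transfer check in the last item above can be sketched as follows; the function names are hypothetical, and in the real pipeline a mismatch would record an error code for the DBA rather than just return False.

```python
import hashlib

def md5_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large data files don't exhaust memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_transfer(src_path: str, dst_path: str) -> bool:
    """Compare a randomly sampled file's checksum before and after transfer."""
    return md5_of(src_path) == md5_of(dst_path)
```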

Daily statistics are compiled, and any database instance that experiences two consecutive backup failures is highlighted on the HULK platform for immediate handling by on‑call staff.

Disaster‑Recovery Strategy

To guard against regional incidents such as fiber cuts or power outages, backup targets are distributed across multiple IDC locations nationwide, ensuring data remains recoverable even if a single site fails.

Rapid Recovery

Backups are stored per table in compressed packages. During restoration, only the required table files and essential metadata are transferred and decompressed, dramatically reducing recovery time.
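Table-level restoration amounts to filtering the backup archive down to one table's files plus the metadata needed to bring the backup up. A minimal sketch, assuming InnoDB file-per-table layout; the metadata file names shown are typical of xtrabackup output, but the selection logic here is illustrative:

```python
def files_for_table(archive_listing: list[str], db: str, table: str) -> list[str]:
    """Pick only the entries needed to restore one table from a per-table
    backup archive: the table's own files plus essential metadata."""
    wanted_prefix = f"{db}/{table}."
    essential_meta = {"backup-my.cnf", "xtrabackup_info", "xtrabackup_checkpoints"}
    return [p for p in archive_listing
            if p.startswith(wanted_prefix) or p in essential_meta]
```

Transferring and decompressing only this subset, instead of the whole instance backup, is what makes the recovery fast.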

The restoration process is fully automated via the HULK platform’s command‑execution system. Users can submit a self‑service restore request for a specific point in time; the system restores the data and creates a temporary instance for the business to replace the failed production instance.

Conclusion

A robust backup strategy, thorough detection mechanisms, and reliable monitoring are essential, but they must be complemented by administrators’ awareness and regular restore drills. Future posts will dive deeper into the technical details of the implementation.

Tags: binlog, disaster recovery, data recovery, database backup, xtrabackup, backup retention
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
