Qunar Database High Availability and Backup Recovery Architecture
The article describes Qunar's 99.99% high‑availability goal for MySQL databases, outlines its HA designs (MMM, PXC, QMHA), compares hot and cold backup methods, discusses common backup challenges, and details Qunar's comprehensive backup and recovery solution including Percona XtraBackup, dedicated backup nodes, centralized management, and consistency verification.
Internet‑scale enterprises require multiple levels of database high availability (HA); Qunar targets a 99.99% service level (no more than 53 minutes of downtime per year) and achieves this through MySQL HA architectures.
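The 53-minute figure follows directly from the availability target. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the article):

```python
def allowed_downtime_minutes(availability: float, days_per_year: int = 365) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    minutes_per_year = days_per_year * 24 * 60  # 525,600 minutes in a 365-day year
    return (1.0 - availability) * minutes_per_year


budget = allowed_downtime_minutes(0.9999)
print(f"{budget:.2f} minutes per year")
```

Four nines leave a budget of roughly 52.6 minutes per year, which the article rounds up to 53.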
Qunar runs three MySQL HA solutions: MMM, PXC (Percona XtraDB Cluster), and QMHA, the last of which was developed in-house. Together they provide namespace services and balance strong consistency, performance, and cross‑datacenter requirements.
Backup strategies are divided into hot (online) and cold (offline) backups. Hot backups are further split into physical (e.g., InnoDB ibbackup, Percona XtraBackup) and logical (e.g., mysqldump, mysqlpump) methods; internet companies typically prefer hot backups, and Qunar follows this practice.
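To make the physical/logical distinction concrete, the sketch below builds the command line for one tool of each kind. It only assembles argument lists and does not run anything. The user name and target directory are placeholders, and the flag selection is an illustrative minimum rather than Qunar's actual invocation.

```python
from typing import List


def logical_backup_cmd(user: str) -> List[str]:
    """Logical hot backup: mysqldump writes SQL statements to stdout."""
    return [
        "mysqldump",
        f"--user={user}",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--all-databases",
    ]


def physical_backup_cmd(user: str, target_dir: str) -> List[str]:
    """Physical hot backup: Percona XtraBackup copies the data files."""
    return [
        "xtrabackup",
        "--backup",
        f"--user={user}",
        f"--target-dir={target_dir}",
    ]


# Placeholder values, for illustration only.
print(" ".join(logical_backup_cmd("backup_user")))
print(" ".join(physical_backup_cmd("backup_user", "/data/backup/full")))
```

The logical form produces portable SQL that must be replayed on restore, while the physical form copies files that are tied to the server version but restore much faster, which matches the trade-offs the article lists.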
The article presents a comparison table of backup/restore methods, highlighting advantages such as speed for physical hot backups and simplicity for logical hot backups, as well as disadvantages like version binding and slower restore speeds.
Common backup problems encountered by many companies include excessive local storage consumption, difficulty selecting backup nodes, lack of centralized management, insufficient backup verification, long backup windows caused by large data volumes or blocking queries, and traffic impact on production services.
Qunar addresses these issues by wrapping Percona XtraBackup with custom features. Each data center has a dedicated "backup" role machine that stores backups locally before they are transferred to a remote MFS backup pool. Backup traffic uses a dedicated network interface, and concurrent backups on the same node are prevented. All tasks are managed through the DUBAI platform, which records each task's start and end times, strategy, retention policy, node, and success status.
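The article does not show how DUBAI stores tasks or enforces the one-backup-per-node rule. As a rough sketch under that caveat, the record below carries the fields the article names, and a small in-process guard rejects a second backup on a node that is already busy. `BackupTask`, `NodeBackupGuard`, and the host name are hypothetical; a real platform would presumably keep this state in a shared database rather than in memory.

```python
import threading
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass
class BackupTask:
    """Hypothetical task record with the fields the article lists."""
    node: str
    strategy: str                      # e.g. "full" or "binlog"
    retention_days: int
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    succeeded: Optional[bool] = None


class NodeBackupGuard:
    """Allows at most one running backup per node (in-memory sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._busy: Set[str] = set()

    def try_start(self, task: BackupTask) -> bool:
        """Mark the node busy and stamp the start time; False if already busy."""
        with self._lock:
            if task.node in self._busy:
                return False
            self._busy.add(task.node)
        task.started_at = datetime.now()
        return True

    def finish(self, task: BackupTask, succeeded: bool) -> None:
        """Record the outcome and release the node."""
        task.finished_at = datetime.now()
        task.succeeded = succeeded
        with self._lock:
            self._busy.discard(task.node)


guard = NodeBackupGuard()
full = BackupTask(node="db-backup-01", strategy="full", retention_days=7)
binlog = BackupTask(node="db-backup-01", strategy="binlog", retention_days=7)
print(guard.try_start(full))    # the node is free, so the task starts
print(guard.try_start(binlog))  # rejected while the first backup is running
guard.finish(full, succeeded=True)
```

Centralizing this check in the scheduler, rather than on each host, fits the article's point that no agents or crontab entries are needed.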
After each backup, Qunar restores the backup set and checks its consistency to confirm that it is usable. The original article illustrates the overall backup workflow with diagrams; the main points of the scheme are:
1. Full backups using Percona XtraBackup.
2. Binlog backups as incremental backups, with synchronized backup nodes.
3. Stream backups (stream=tar) without retaining local backup sets.
4. Remote storage of backup sets on backup‑role machines and MFS.
5. Automatic selection of standby or statistics nodes as backup nodes, unaffected by master‑slave switchovers.
6. Dedicated backup NICs that do not consume business bandwidth.
7. A consistency‑check restore immediately after each backup completes.
8. MFS multi‑replica mode to survive physical damage.
9. Centralized backend management, scheduling, concurrency control, and status monitoring without agents or crontab.
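Point 5 in the list above can be sketched as a role-based selection. The article does not give the actual rule, so the priority order (standby first, then statistics), the health flag, and all names here are assumptions. Because the choice depends only on a node's role, a master-slave switchover does not change which node takes the backup.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed preference order; the article names both roles but not their ranking.
ROLE_PRIORITY = ("standby", "statistics")


@dataclass(frozen=True)
class Node:
    host: str
    role: str            # e.g. "master", "slave", "standby", "statistics"
    healthy: bool = True


def select_backup_node(nodes: Iterable[Node]) -> Optional[Node]:
    """Pick a healthy standby or statistics node; never the master or a plain slave."""
    candidates = [n for n in nodes if n.healthy and n.role in ROLE_PRIORITY]
    if not candidates:
        return None
    # Lower priority index wins; the host name breaks ties deterministically.
    return min(candidates, key=lambda n: (ROLE_PRIORITY.index(n.role), n.host))


cluster = [
    Node("db-01", "master"),
    Node("db-02", "slave"),
    Node("db-03", "standby"),
    Node("db-04", "statistics"),
]
chosen = select_backup_node(cluster)
print(chosen.host if chosen else "no eligible backup node")
```

Returning `None` instead of falling back to the master keeps backup load off the node that serves production writes, at the cost of skipping a backup when no eligible node is healthy.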
The backup recovery platform provides visual interfaces for viewing task execution status, adding new backup tasks, inspecting cluster‑specific backup details, and performing automated recovery by specifying target clusters and machines.
In conclusion, Qunar's backup solution mitigates many typical backup and recovery challenges, though further work remains on binlog point‑in‑time recovery.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
