Zero‑Data‑Loss MySQL Failover with MHA: Keep Your Data Service Running Continuously
This guide explains how to achieve uninterrupted MySQL service by using semi‑sync replication, configuring MHA for automatic failover, deploying Orchestrator for topology management, and applying VIP‑based scripts and GTID settings to guarantee zero data loss during master switches.
1. Semi‑Sync Replication vs Asynchronous
Asynchronous replication (default) writes to the binlog and returns to the client before the slave receives the events, which can lose committed transactions if the master crashes.
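Master executes transaction → writes to binlog → returns to client immediately → ships events to slave asynchronously
Semi‑sync replication waits for at least one slave to acknowledge receipt of the events before the master returns to the client, which narrows the window in which a committed transaction exists only on the master. If no acknowledgement arrives within the configured timeout, the master falls back to asynchronous mode.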
Master executes transaction → writes to binlog → waits for at least one slave to acknowledge receipt → returns to client
Configuration steps:
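-- On the master: install and enable the plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- ms to wait before falling back to async
-- On each slave: install and enable the plugin
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
-- Restart the I/O thread so the slave registers as a semi-sync slave
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
Monitoring status: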
SHOW STATUS LIKE 'Rpl_semi_sync%';
-- Rpl_semi_sync_master_status: ON/OFF
-- Rpl_semi_sync_master_yes_tx: successful transactions
-- Rpl_semi_sync_master_no_tx: timeout fall‑backs
2. MHA High‑Availability Solution
Architecture:
Application layer
↓
VIP (virtual IP)
↓
Master ←→ MHA Manager (monitor & failover)
↓
Slave1
↓
Slave2
Core components:
Manager node – monitors all MySQL instances, executes failover, generates reports.
Node agents – run on each MySQL server, execute commands from the manager.
Deployment steps:
Prepare SSH key‑less login and install Perl dependencies on every node.
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub user@mysql-node1
yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
Download and install MHA manager and node (v0.58).
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
# install node on all servers
tar zxf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
# install manager on the designated manager host (run from the directory that holds the tarball)
tar zxf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install
Configure /etc/mha/app1.cnf with manager workdir, log path, users, SSH user, replication user, ping interval, and server definitions.
# /etc/mha/app1.cnf
[server default]
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
user=mha_user
password=mha_password
ssh_user=root
repl_user=repl_user
repl_password=repl_password
ping_interval=3
[server1]
hostname=192.168.1.101
port=3306
candidate_master=1
[server2]
hostname=192.168.1.102
port=3306
candidate_master=1
[server3]
hostname=192.168.1.103
port=3306
no_master=1
Validate configuration and start monitoring.
masterha_check_ssh --conf=/etc/mha/app1.cnf
masterha_check_repl --conf=/etc/mha/app1.cnf
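# masterha_manager stays in the foreground; start it in its own session (or under nohup) and check status from another shell
masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover
masterha_check_status --conf=/etc/mha/app1.cnf
3. Orchestrator Management Tool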
Key features: web UI for topology, automatic failover with intelligent master selection, topology repair, REST API for integration.
Deployment:
# Download binary
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-3.2.6-linux-amd64.tar.gz
tar zxf orchestrator-3.2.6-linux-amd64.tar.gz
cd orchestrator
# Create metadata database (run these statements in the mysql client)
CREATE DATABASE orchestrator;
CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'password';
GRANT ALL ON orchestrator.* TO 'orchestrator'@'%';
Configuration (orchestrator.conf.json) example:
{
"Debug": false,
"EnableSyslog": true,
"MySQLTopologyUser": "orchestrator",
"MySQLTopologyPassword": "password",
"MySQLOrchestratorHost": "127.0.0.1",
"MySQLOrchestratorPort": 3306,
"MySQLOrchestratorDatabase": "orchestrator",
"MySQLOrchestratorUser": "orchestrator",
"MySQLOrchestratorPassword": "password",
"RaftEnabled": false,
"HTTPAdvertise": "http://192.168.1.100:3000",
"ListenAddress": ":3000",
"DiscoveryPollSeconds": 5,
"FailureDetectionPeriodBlockMinutes": 10
}
Start service and open the UI:
./orchestrator http
# access http://192.168.1.100:3000
4. Failover Strategy and Practice
4.1 Heartbeat Detection
-- Create heartbeat table
CREATE TABLE heartbeat (
id INT PRIMARY KEY,
ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Periodic update (written as an upsert so the first run creates the row)
INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON DUPLICATE KEY UPDATE ts = NOW();
4.2 Automatic Failover Flow
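Failure detection, the first stage of this flow, can build on the heartbeat table from 4.1. The sketch below shows one possible way to classify a heartbeat reading. The function name `heartbeat_state`, the default thresholds, and the three result values are assumptions made for illustration; they are not part of MHA or Orchestrator.

```shell
#!/bin/bash
# Hypothetical helper: classify a heartbeat timestamp as ok, lagging, or dead.
# Arguments: last heartbeat (epoch seconds), current time (epoch seconds),
# optional warn and dead thresholds in seconds (defaults are arbitrary).
heartbeat_state() {
  local last_ts=$1
  local now_ts=$2
  local warn_after=${3:-5}
  local dead_after=${4:-15}
  local age=$(( now_ts - last_ts ))

  if [ "$age" -ge "$dead_after" ]; then
    echo "dead"
  elif [ "$age" -ge "$warn_after" ]; then
    echo "lagging"
  else
    echo "ok"
  fi
}

# Example wiring (assumes a reachable server; HOST and DB are placeholders):
# last=$(mysql -h "$HOST" -D "$DB" -N -e "SELECT UNIX_TIMESTAMP(ts) FROM heartbeat WHERE id = 1")
# heartbeat_state "$last" "$(date +%s)"
```

Keeping a "lagging" state separate from "dead" is a design choice: it gives the monitor room to re-check from another host before it starts a failover, which is roughly what the "confirm failure type" stage is for.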
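Failure detection
↓
Confirm failure type
↓
Select new master
↓
Data consistency check
↓
Promote new master
↓
Rebuild replication topology
↓
Redirect application connections
↓
Failover report
4.3 VIP‑Based Failover Script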
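#!/bin/bash
# vip_failover.sh
# update_dns_record and send_alert are site-specific helpers; define or source them before running.
VIP="192.168.1.200"
NEW_MASTER="192.168.1.102"
OLD_MASTER="192.168.1.101"
SLAVES=("192.168.1.103")  # remaining slaves that must be repointed
INTERFACE="eth0"
# 1. Remove VIP from old master (the host may be unreachable after a crash, so a failure here is expected)
ssh root@"$OLD_MASTER" "ip addr del $VIP/24 dev $INTERFACE"
# 2. Add VIP to new master
ssh root@"$NEW_MASTER" "ip addr add $VIP/24 dev $INTERFACE"
# 3. Update application connections (via config center or DNS)
update_dns_record "$VIP" "$NEW_MASTER"
# 4. Rebuild replication topology
# RESET SLAVE ALL requires the replication threads to be stopped first
mysql -h "$NEW_MASTER" -e "STOP SLAVE; RESET SLAVE ALL;"
for SLAVE in "${SLAVES[@]}"; do
# MASTER_AUTO_POSITION=1 assumes GTID replication (see 4.4); without GTID, supply the binlog file and position
mysql -h "$SLAVE" -e "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='$NEW_MASTER', MASTER_AUTO_POSITION=1; START SLAVE;"
done
# 5. Send alert
send_alert "Failover completed: $NEW_MASTER is new master"
4.4 Data‑Consistency Guarantees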
Loss‑less semi‑sync:
SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_SYNC;
Parallel replication (MySQL 5.7+):
STOP SLAVE;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;
START SLAVE;
GTID‑based replication:
# my.cnf on every server
[mysqld]
gtid_mode=ON
enforce_gtid_consistency=ON
-- On each slave: point at the new master using GTID auto-positioning
CHANGE MASTER TO MASTER_HOST='new_master', MASTER_AUTO_POSITION=1;
4.5 Testing and Validation
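Before running the tests in this section, it may be worth confirming that semi‑sync has not silently fallen back to asynchronous mode, since a failover test run in that state says little about data loss. The following is a minimal sketch built on the status counters from section 1. The function name `semi_sync_check` and its messages are hypothetical, and it assumes the tab-separated `name<TAB>value` output that `mysql -N -B` produces for `SHOW STATUS`.

```shell
#!/bin/bash
# Hypothetical helper: read "name<TAB>value" status lines on stdin and report
# whether semi-sync is active on the master. Returns non-zero on failure.
semi_sync_check() {
  awk -F'\t' '
    $1 == "Rpl_semi_sync_master_status"  { status = $2 }
    $1 == "Rpl_semi_sync_master_clients" { clients = $2 }
    $1 == "Rpl_semi_sync_master_no_tx"   { no_tx = $2 }
    END {
      if (status != "ON") {
        print "FAIL: semi-sync is not ON (possible fall-back to async)"
        exit 1
      }
      if (clients + 0 < 1) {
        print "FAIL: no semi-sync slaves connected"
        exit 1
      }
      printf "OK: semi-sync on, %d slave(s), %d unacknowledged commit(s) so far\n", clients, no_tx
    }'
}

# Example wiring (assumes a reachable master; MASTER is a placeholder):
# mysql -h "$MASTER" -N -B -e "SHOW STATUS LIKE 'Rpl_semi_sync%'" | semi_sync_check
```

A rising `Rpl_semi_sync_master_no_tx` between two runs suggests the master has been timing out on acknowledgements, which is worth investigating before trusting a failover result.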
Planned switch test using Orchestrator:
orchestrator-client -c graceful-master-takeover -alias production-cluster
Unplanned failure simulation:
# Simulate master crash
systemctl stop mysql
# Observe automatic failover
tail -f /var/log/mha/app1/manager.log
Data consistency check with pt‑table‑checksum:
pt-table-checksum \
--host=old_master \
--databases=myapp \
--no-check-binlog-format
Performance benchmark before/after failover:
sysbench oltp_read_write \
--mysql-host=$VIP \
--mysql-db=test run
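The failure simulation and benchmark above show that a switch happened, but not how long clients were locked out. One way to estimate that is to probe the VIP once per second during the test and compute the longest gap afterwards. This is only a sketch: the function name `longest_outage` and the `epoch ok|fail` log format are assumptions for illustration, and the result is no finer than the probe interval.

```shell
#!/bin/bash
# Hypothetical helper: read "epoch_seconds ok|fail" probe results on stdin and
# print the longest outage in seconds, measured from the first failed probe to
# the next successful one. An outage still open at end of input is not counted.
longest_outage() {
  awk '
    $2 == "fail" && start == "" { start = $1 }
    $2 == "ok"   && start != "" {
      d = $1 - start
      if (d > max) max = d
      start = ""
    }
    END { print max + 0 }'
}

# Example probe loop (assumes VIP is set and the mysql client can authenticate):
# while true; do
#   if mysql -h "$VIP" --connect-timeout=1 -e "SELECT 1" >/dev/null 2>&1; then s=ok; else s=fail; fi
#   echo "$(date +%s) $s"
#   sleep 1
# done | tee probe.log
# longest_outage < probe.log
```

Running the probe from an application host rather than the manager host gives a figure closer to what clients actually experience, since it includes the time for the VIP move to take effect on the network.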
Senior Xiao Ying
