
Zero‑Data‑Loss MySQL Failover with MHA: Keep Your Data Service Running Continuously

This guide explains how to achieve uninterrupted MySQL service by using semi‑sync replication, configuring MHA for automatic failover, deploying Orchestrator for topology management, and applying VIP‑based scripts and GTID settings to guarantee zero data loss during master switches.

Senior Xiao Ying

1. Semi‑Sync Replication vs Asynchronous

Asynchronous replication (default) writes to the binlog and returns to the client before the slave receives the events, which can lose committed transactions if the master crashes.

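Master executes transaction → writes to binlog → returns to client immediately → sends events to slave asynchronously

Semi‑sync replication waits for at least one slave to acknowledge receipt of the events before returning to the client, which reduces the risk of losing committed transactions. If no acknowledgement arrives within rpl_semi_sync_master_timeout, the master falls back to asynchronous replication.

Master executes transaction → writes to binlog → waits for at least one slave to acknowledge receipt → returns to client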

Configuration steps:

-- On the master: install and enable the plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- ms
-- On each slave: install and enable the plugin, then restart the I/O thread so the setting takes effect
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

Monitoring status:

SHOW STATUS LIKE 'Rpl_semi_sync%';
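-- Rpl_semi_sync_master_status: ON/OFF (OFF means the master has fallen back to asynchronous replication)
-- Rpl_semi_sync_master_yes_tx: commits acknowledged by a slave
-- Rpl_semi_sync_master_no_tx: commits not acknowledged by a slave (timeout fall‑backs)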
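
The counters above can be turned into a simple health signal. The following is a minimal Python sketch, assuming the SHOW STATUS rows have already been fetched into a dict by whatever client you use. The function names and the 1% threshold are illustrative assumptions, not part of MySQL or MHA.

```python
def semi_sync_fallback_ratio(status: dict) -> float:
    """Fraction of commits that were not acknowledged by any slave."""
    yes_tx = int(status.get("Rpl_semi_sync_master_yes_tx", 0))
    no_tx = int(status.get("Rpl_semi_sync_master_no_tx", 0))
    total = yes_tx + no_tx
    return no_tx / total if total else 0.0


def semi_sync_healthy(status: dict, max_ratio: float = 0.01) -> bool:
    """Healthy only if semi-sync is ON and fall-backs stay within max_ratio."""
    if status.get("Rpl_semi_sync_master_status") != "ON":
        return False
    return semi_sync_fallback_ratio(status) <= max_ratio


if __name__ == "__main__":
    # Hypothetical values, as they would come back from SHOW STATUS (strings)
    sample = {
        "Rpl_semi_sync_master_status": "ON",
        "Rpl_semi_sync_master_yes_tx": "9990",
        "Rpl_semi_sync_master_no_tx": "10",
    }
    print(semi_sync_fallback_ratio(sample), semi_sync_healthy(sample))
```

A rising fall‑back ratio means commits are being returned to clients without a slave holding them, which is exactly the window in which a master crash can lose data.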

2. MHA High‑Availability Solution

Architecture:

Application layer
↓
VIP (virtual IP)
↓
Master ←→ MHA Manager (monitor & failover)
↓
Slave1
↓
Slave2

Core components:

Manager node – monitors all MySQL instances, executes failover, generates reports.

Node agents – run on each MySQL server, execute commands from the manager.

Deployment steps:

Set up passwordless (key‑based) SSH login between all nodes and install the Perl dependencies on every node.

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub user@mysql-node1
yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager

Download and install MHA manager and node (v0.58).

wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
# install node on all servers
tar zxf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
# install manager on the designated manager host
tar zxf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install

Configure /etc/mha/app1.cnf with manager workdir, log path, users, SSH user, replication user, ping interval, and server definitions.

# /etc/mha/app1.cnf
[server default]
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
user=mha_user
password=mha_password
ssh_user=root
repl_user=repl_user
repl_password=repl_password
ping_interval=3

[server1]
hostname=192.168.1.101
port=3306
candidate_master=1

[server2]
hostname=192.168.1.102
port=3306
candidate_master=1

[server3]
hostname=192.168.1.103
port=3306
no_master=1
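
Before handing the file to MHA, it can help to confirm which hosts are actually eligible for promotion. The sketch below uses Python's standard configparser to read a file in the format shown above. The helper name candidate_masters is hypothetical, and the rule it applies (candidate_master=1 and not no_master=1) is a simplification of how MHA weighs these flags.

```python
import configparser


def candidate_masters(cnf_text: str) -> list:
    """Return hostnames flagged candidate_master=1 and not no_master=1."""
    # interpolation=None so a '%' in a password is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    hosts = []
    for section in parser.sections():
        if section == "server default":
            continue  # global settings, not a server definition
        opts = parser[section]
        if opts.get("no_master", "0") == "1":
            continue
        if opts.get("candidate_master", "0") == "1":
            hosts.append(opts["hostname"])
    return hosts


if __name__ == "__main__":
    sample = """
[server default]
user=mha_user
ping_interval=3

[server1]
hostname=192.168.1.101
candidate_master=1

[server3]
hostname=192.168.1.103
no_master=1
"""
    print(candidate_masters(sample))
```

An empty result is worth treating as a configuration error: it means no server has been explicitly preferred for promotion.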

Validate configuration and start monitoring.

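# Verify passwordless SSH between all nodes
masterha_check_ssh --conf=/etc/mha/app1.cnf
# Verify replication health across the configured servers
masterha_check_repl --conf=/etc/mha/app1.cnf
# Start the manager (runs in the foreground and exits after a failover completes)
masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover
# From another terminal: confirm the manager is running
masterha_check_status --conf=/etc/mha/app1.cnf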

3. Orchestrator Management Tool

Key features: web UI for topology, automatic failover with intelligent master selection, topology repair, REST API for integration.

Deployment:

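# Download binary
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator-3.2.6-linux-amd64.tar.gz
tar zxf orchestrator-3.2.6-linux-amd64.tar.gz
cd orchestrator
# Create metadata database (run the statements below in the mysql client on the backend server)
CREATE DATABASE orchestrator;
-- GRANT ... IDENTIFIED BY was removed in MySQL 8.0, so create the user first
CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'password';
GRANT ALL ON orchestrator.* TO 'orchestrator'@'%';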

Configuration (orchestrator.conf.json) example:

{
  "Debug": false,
  "EnableSyslog": true,
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "password",
  "RaftEnabled": false,
  "HTTPAdvertise": "http://192.168.1.100:3000",
  "ListenAddress": ":3000",
  "DiscoveryPollSeconds": 5,
  "FailureDetectionPeriodBlockMinutes": 10
}
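
A typo in this file tends to surface only when the service starts, so a quick pre‑flight check can save a restart cycle. The following Python sketch is an assumption‑laden example: the list of required keys is simply the backend‑connection keys used in the example above, not an official schema, and check_orchestrator_config is a hypothetical helper name.

```python
import json
from urllib.parse import urlparse

# Keys taken from the example config above; not an exhaustive or official list
REQUIRED_KEYS = (
    "MySQLTopologyUser",
    "MySQLTopologyPassword",
    "MySQLOrchestratorHost",
    "MySQLOrchestratorPort",
    "MySQLOrchestratorDatabase",
    "ListenAddress",
)


def check_orchestrator_config(text: str) -> list:
    """Return a list of human-readable problems; empty means no issue found."""
    cfg = json.loads(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    # The advertised URL should point at the port the service listens on
    listen_port = cfg.get("ListenAddress", "").rsplit(":", 1)[-1]
    advertised = urlparse(cfg.get("HTTPAdvertise", ""))
    if advertised.port is not None and str(advertised.port) != listen_port:
        problems.append(
            f"HTTPAdvertise port {advertised.port} differs from ListenAddress port {listen_port}"
        )
    return problems


if __name__ == "__main__":
    with open("orchestrator.conf.json") as handle:
        for problem in check_orchestrator_config(handle.read()):
            print(problem)
```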

Start service and open the UI:

./orchestrator http
# access http://192.168.1.100:3000

4. Failover Strategy and Practice

4.1 Heartbeat Detection

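-- Create heartbeat table
CREATE TABLE heartbeat (
  id INT PRIMARY KEY,
  ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Seed the single row; without it the periodic UPDATE matches nothing
INSERT INTO heartbeat (id) VALUES (1);
-- Periodic update (run on the master at a fixed interval)
UPDATE heartbeat SET ts = NOW() WHERE id = 1;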
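
A heartbeat row is only useful with a rule for interpreting it. The sketch below shows one possible rule in Python: measure how stale the timestamp is, and declare the master dead only after several consecutive failed probes, so that a single network blip does not trigger a failover. The function names and the threshold of 3 are assumptions for illustration.

```python
from datetime import datetime


def heartbeat_lag_seconds(last_ts: datetime, now: datetime) -> float:
    """Seconds elapsed since the heartbeat row was last updated."""
    return (now - last_ts).total_seconds()


def master_is_dead(probe_results: list, threshold: int = 3) -> bool:
    """probe_results holds booleans, oldest first; True means the probe succeeded.

    The master is considered dead only if the last `threshold` probes all failed.
    """
    recent = probe_results[-threshold:]
    return len(recent) == threshold and not any(recent)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, 0)
    now = datetime(2024, 1, 1, 12, 0, 7)
    print(heartbeat_lag_seconds(last, now))
    print(master_is_dead([True, False, False, False]))
```

Reading the same row on a slave and comparing ts with the slave's clock gives a rough replication‑delay estimate, provided the clocks are synchronized.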

4.2 Automatic Failover Flow

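Failure detection
↓
Confirm failure type
↓
Select new master
↓
Data consistency check
↓
Promote new master
↓
Rebuild replication topology
↓
Redirect application connections
↓
Failover report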
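
The "select new master" step in the flow above is where data loss is decided: promoting a slave that is behind its peers discards transactions the others already hold. The Python sketch below illustrates one simple policy, choosing the most advanced eligible slave and using a candidate flag only as a tie‑breaker. It is not MHA's or Orchestrator's actual algorithm; the Replica type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str
    executed_tx: int                # how many transactions this slave has applied
    candidate_master: bool = False  # operator preference, as in the MHA config
    no_master: bool = False         # never promote this host


def select_new_master(replicas: list) -> Replica:
    """Pick the most up-to-date eligible slave; prefer candidates on a tie."""
    eligible = [r for r in replicas if not r.no_master]
    if not eligible:
        raise ValueError("no eligible slave to promote")
    return max(eligible, key=lambda r: (r.executed_tx, r.candidate_master))


if __name__ == "__main__":
    slaves = [
        Replica("192.168.1.102", executed_tx=1000, candidate_master=True),
        Replica("192.168.1.103", executed_tx=1000, no_master=True),
    ]
    print(select_new_master(slaves).host)
```

Ranking by applied transactions first means the operator's preference never overrides data safety; a preferred but lagging host would need to catch up from its peers before being promoted.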

4.3 VIP‑Based Failover Script

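#!/bin/bash
# vip_failover.sh
VIP="192.168.1.200"
NEW_MASTER="192.168.1.102"
OLD_MASTER="192.168.1.101"
INTERFACE="eth0"
SLAVES=("192.168.1.103")  # remaining slaves to repoint at the new master
# Placeholder hooks: replace with your DNS/config-center and alerting integrations
update_dns_record() { echo "TODO: point $1 at $2"; }
send_alert() { echo "ALERT: $1"; }
# 1. Remove VIP from old master (best effort: the host may already be unreachable)
ssh -o ConnectTimeout=5 root@"$OLD_MASTER" "ip addr del $VIP/24 dev $INTERFACE" \
  || echo "WARN: could not remove VIP from $OLD_MASTER" >&2
# 2. Add VIP to new master
ssh root@"$NEW_MASTER" "ip addr add $VIP/24 dev $INTERFACE" || exit 1
# 3. Update application connections (via config center or DNS)
update_dns_record "$VIP" "$NEW_MASTER"
# 4. Rebuild replication topology (assumes GTID replication, see 4.4)
mysql -h "$NEW_MASTER" -e "STOP SLAVE; RESET SLAVE ALL; SET GLOBAL read_only = 0;"
for SLAVE in "${SLAVES[@]}"; do
  mysql -h "$SLAVE" -e "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='$NEW_MASTER', MASTER_AUTO_POSITION=1; START SLAVE;"
done
# 5. Send alert
send_alert "Failover completed: $NEW_MASTER is new master"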

4.4 Data‑Consistency Guarantees

Loss‑less semi‑sync:

SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_SYNC;

Parallel replication (MySQL 5.7+):

STOP SLAVE;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;
START SLAVE;

GTID‑based replication:

[mysqld]
gtid_mode=ON
enforce_gtid_consistency=ON

CHANGE MASTER TO MASTER_HOST='new_master', MASTER_AUTO_POSITION=1;
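
With GTIDs, "zero data loss" becomes checkable: compare the executed GTID set of the old master (or the most advanced slave) with that of the promotion target. The Python sketch below parses the textual GTID set format and lists what the target is missing. It expands intervals into sets, which is fine for illustration but wasteful for very large ranges; on a live server the built‑in GTID_SUBTRACT() function does the same comparison. The function names are hypothetical.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        txs = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '7' is an interval of length one
            txs.update(range(int(start), int(end or start) + 1))
    return result


def missing_transactions(source_set: str, target_set: str) -> dict:
    """Transactions present in source_set but absent from target_set."""
    source = parse_gtid_set(source_set)
    target = parse_gtid_set(target_set)
    missing = {}
    for uuid, txs in source.items():
        diff = txs - target.get(uuid, set())
        if diff:
            missing[uuid] = sorted(diff)
    return missing


if __name__ == "__main__":
    uuid = "3E11FA47-71CA-11E1-9E33-C80AA9429562"
    print(missing_transactions(f"{uuid}:1-100", f"{uuid}:1-95"))
```

An empty result means the target already holds every transaction the source executed, so promoting it loses nothing.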

4.5 Testing and Validation

Planned switch test using Orchestrator:

orchestrator-client -c graceful-master-takeover -alias production-cluster

Unplanned failure simulation:

# Simulate master crash
systemctl stop mysql
# Observe automatic failover
tail -f /var/log/mha/app1/manager.log

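Data consistency check with pt‑table‑checksum (run it against the current master; it checksums each table there and compares the result on every slave):

pt-table-checksum \
  --host=new_master \
  --databases=myapp \
  --no-check-binlog-format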

Performance benchmark before/after failover:

sysbench oltp_read_write \
  --mysql-host=$VIP \
  --mysql-db=test run