How to Build a Reliable Dual‑Master MySQL HA with Keepalived and Shell Scripts
This article presents a practical dual‑master MySQL high‑availability solution using keepalived and custom shell scripts that reliably detect master health, perform graceful VIP failover, and safely switch back after repairs, while ensuring data consistency and minimal downtime.
Scenario
MySQL master high‑availability is often implemented with MHA or MMM, but those solutions add extra components. A minimal‑dependency design uses a classic dual‑master plus Keepalived architecture, where a virtual IP (VIP) points to the active master (DB1). Simple server‑process checks are insufficient; the system must verify MySQL query execution to guarantee availability and data consistency.
Improved Dual‑Master Architecture
Three shell‑script enhancements are added:
Determine master availability by executing a lightweight SQL query.
Perform VIP failover in a controlled manner.
After the failed master recovers, move the VIP back safely.
The implementation has handled more than 90 million queries per day in production without any data‑inconsistency incidents.
Prerequisites
Set log_slave_updates = 1 in my.cnf on both servers.
Configure the standby server with read_only = 1, which blocks writes from accounts that lack the SUPER privilege (replication threads are not affected).
Create a table test.test on both servers and insert a few rows for the health‑check query.
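A minimal sketch of the health‑check table setup. The article names only test.test, so the column layout (id, note), the sample rows, and the helper name healthcheck_schema_sql are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: prints SQL that creates and seeds the health-check table.
# The columns and rows are assumptions; the article only specifies test.test.
healthcheck_schema_sql() {
    cat <<'SQL'
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE IF NOT EXISTS test.test (
    id   INT         NOT NULL PRIMARY KEY,
    note VARCHAR(32) NOT NULL
);
INSERT IGNORE INTO test.test (id, note) VALUES (1, 'a'), (2, 'b'), (3, 'c');
SQL
}

# Usage (with an account allowed to create tables):
#   healthcheck_schema_sql | mysql -u root -p
```

The statements use IF NOT EXISTS and INSERT IGNORE so that running them on a server where the table has already arrived through replication does no harm.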
Health‑Check Logic
When the VIP is on DB1, Keepalived invokes /etc/keepalived/check_mysql.sh at regular intervals. The script connects with a low‑privilege MySQL user and runs a simple query such as SELECT COUNT(*) FROM test.test;. A successful result means MySQL is healthy; any error triggers the failover logic.
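The query‑based check described above could look roughly like the sketch below. It is not the author's script: the variable names MYSQL_CMD and CHECK_SQL, the credentials file /etc/keepalived/hc.cnf, and the 5‑second connect timeout are all assumptions made for illustration.

```shell
#!/bin/sh
# Assumed client invocation; the credentials file path is hypothetical and
# should hold the low-privilege health-check account.
MYSQL_CMD=${MYSQL_CMD:-"mysql --defaults-extra-file=/etc/keepalived/hc.cnf --connect-timeout=5 -N -B"}
CHECK_SQL=${CHECK_SQL:-"SELECT COUNT(*) FROM test.test;"}

# Returns 0 when the query runs and yields a plain integer, 1 otherwise.
mysql_query_ok() {
    # $MYSQL_CMD is deliberately unquoted so its options split into words.
    result=$($MYSQL_CMD -e "$CHECK_SQL" 2>/dev/null) || return 1
    case "$result" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric output: unhealthy
        *)           return 0 ;;   # a row count came back: healthy
    esac
}

# Usage:
#   mysql_query_ok && echo "MySQL healthy" || echo "MySQL check failed"
```

Checking the shape of the output as well as the exit status guards against a client that exits cleanly but returns nothing useful.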
Failover Procedure
If the health‑check query fails, the script proceeds as follows:
Check the MySQL service status. If the service is down, disable Keepalived on DB1, causing the VIP to migrate to DB2. DB2’s Keepalived notify_master_mysql.sh script promotes its MySQL instance from read‑only to read‑write and sends a notification.
If the service appears up, wait 30 seconds and retry the query. If the second attempt also fails, treat the situation as a true failure and execute the same steps as in the service‑down case.
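The two branches above can be sketched as a single decision function. The hook names (query_ok, service_up, stop_keepalived), the use of mysqladmin ping as the service check, and systemctl as the way to stop Keepalived are assumptions; the article does not show how its script performs these steps.

```shell
#!/bin/sh
# Seconds to wait before the single retry (30 in the article's example).
RETRY_WAIT=${RETRY_WAIT:-30}

# Hypothetical hooks; replace the bodies to match your environment.
query_ok()        { mysql -N -B -e "SELECT COUNT(*) FROM test.test;" >/dev/null 2>&1; }
service_up()      { mysqladmin ping >/dev/null 2>&1; }
stop_keepalived() { systemctl stop keepalived; }   # assumes a systemd host

# Returns 0 if MySQL is healthy, 1 after handing the VIP to the peer.
check_and_failover() {
    query_ok && return 0          # healthy: nothing to do

    if ! service_up; then         # service down: fail over immediately
        stop_keepalived
        return 1
    fi

    sleep "$RETRY_WAIT"           # service up but query failed: retry once
    query_ok && return 0

    stop_keepalived               # second failure: treat as a real outage
    return 1
}
```

Stopping Keepalived on the failing node is what releases the VIP; the peer's notify script then handles the promotion to read‑write.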
Switch‑Back Procedure
After the administrator repairs DB1, change_to_backup.sh restores the VIP to DB1:
Set read_only = 1 on DB2 to stop writes.
Kill active client threads on DB2 and restart Keepalived on DB2 so the VIP drifts back to DB1.
Verify that DB1 has caught up with DB2’s binary logs, then remove the read_only flag on DB1.
This operation should be performed during low‑traffic periods because it briefly pauses write traffic.
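One way to perform the catch‑up verification is to compare DB2's binary log coordinates with what DB1 has executed. The sketch below assumes the classic SHOW SLAVE STATUS field names (Relay_Master_Log_File and Exec_Master_Log_Pos; newer MySQL releases rename these under SHOW REPLICA STATUS), and the function name caught_up is hypothetical.

```shell
#!/bin/sh
# caught_up MASTER_FILE MASTER_POS SLAVE_STATUS_TEXT
#   MASTER_FILE / MASTER_POS : File and Position from SHOW MASTER STATUS on DB2
#   SLAVE_STATUS_TEXT        : output of SHOW SLAVE STATUS\G taken on DB1
# Returns 0 only when DB1 has executed events up to DB2's current position.
caught_up() {
    master_file=$1
    master_pos=$2
    slave_status=$3

    exec_file=$(printf '%s\n' "$slave_status" |
        awk -F': ' '$1 ~ /Relay_Master_Log_File$/ {print $2}')
    exec_pos=$(printf '%s\n' "$slave_status" |
        awk -F': ' '$1 ~ /Exec_Master_Log_Pos$/ {print $2}')

    [ "$exec_file" = "$master_file" ] && [ "$exec_pos" = "$master_pos" ]
}

# Usage (host names are placeholders):
#   status=$(mysql -h DB1 -e 'SHOW SLAVE STATUS\G')
#   caught_up mysql-bin.000007 154 "$status" && echo "DB1 is in sync"
```

The comparison is only meaningful once writes on DB2 have stopped, which is why the procedure sets read_only on DB2 first.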
Data Consistency and Switch Time
The Keepalived health‑check interval is configurable; the example uses 30 seconds.
If a server‑level or MySQL‑process failure occurs, the VIP switches in less than 2 seconds (Keepalived transition time).
For other failures, total switch time is roughly 34 seconds: 30 seconds (script wait) + 2 seconds (kill‑SQL pause) + 2 seconds (Keepalived transition). All three values are adjustable.
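The worst‑case figure is simple addition, which makes it easy to recompute when any of the timings is tuned. The variable names below are illustrative, with defaults taken from the article's example values:

```shell
#!/bin/sh
SCRIPT_WAIT=${SCRIPT_WAIT:-30}          # retry wait in the check script
KILL_PAUSE=${KILL_PAUSE:-2}             # pause while client threads are killed
VRRP_TRANSITION=${VRRP_TRANSITION:-2}   # Keepalived VIP transition

worst_case_switch_seconds() {
    echo $((SCRIPT_WAIT + KILL_PAUSE + VRRP_TRANSITION))
}

worst_case_switch_seconds   # prints 34 with the defaults above
```

For example, cutting the retry wait to 10 seconds (SCRIPT_WAIT=10) brings the estimate down to 14 seconds, at the cost of reacting to shorter transient stalls.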
Key Scripts
/etc/keepalived/check_mysql.sh – performs the periodic query and decides whether to trigger failover.
/etc/keepalived/notify_master_mysql.sh – runs on the node that acquires the VIP; it switches MySQL from read‑only to read‑write and sends alerts.
/etc/keepalived/change_to_backup.sh – used by an administrator to move the VIP back to the repaired master after confirming replication catch‑up.
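As an illustration of the promotion step, the core of a notify script might look like the sketch below. The article does not show its script, so the admin credentials file, the MYSQL_ADMIN_CMD variable, and the use of logger as the alert channel are assumptions; a real deployment would likely send mail or page an on‑call engineer instead.

```shell
#!/bin/sh
# Assumed admin client invocation; the credentials file path is hypothetical
# and must belong to an account allowed to change global variables.
MYSQL_ADMIN_CMD=${MYSQL_ADMIN_CMD:-"mysql --defaults-extra-file=/etc/keepalived/admin.cnf"}

# Placeholder alert channel: writes to syslog. Swap in mail or a webhook.
notify() { logger -t keepalived-mysql "$1"; }

# Clears read_only on this node and reports the outcome.
promote_to_master() {
    if $MYSQL_ADMIN_CMD -e "SET GLOBAL read_only = 0;"; then
        notify "VIP acquired: MySQL switched to read-write on $(uname -n)"
    else
        notify "VIP acquired but clearing read_only FAILED on $(uname -n)"
        return 1
    fi
}

# Usage, for example from the script Keepalived runs on entering MASTER state:
#   promote_to_master
```

Reporting the failure case separately matters here: a node that holds the VIP but is still read‑only rejects application writes, so it needs immediate attention.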
dbaplus Community
