Achieving Seamless MySQL HA with Pacemaker and MHA: Lessons from DTCC 2016
This article details a MySQL high‑availability solution built on Pacemaker, Corosync and MHA, explains why earlier keepalived‑based designs suffered split‑brain issues, and walks through the architecture, quorum handling, resource agents, failover workflow, testing methodology, and practical lessons learned.
Background and Motivation
During the DTCC 2016 conference, Chen Huajun presented the challenges faced with a traditional MySQL master‑slave setup that relied on MHA for failover and keepalived for VIP management. Network glitches caused split‑brain scenarios, leading to data inconsistency and frequent VIP flapping.
Problems with the Original Architecture
The earlier design used MySQL asynchronous replication, MHA for failover, and keepalived to keep the MHA manager highly available. When the network failed, the cluster could enter a split‑brain state in which both master and slave accepted writes, leaving the two copies of the data inconsistent. Keepalived offered no protection against such split‑brain conditions.
Requirements for a New HA Solution
Failover completes within seconds.
No data loss or duplication after a switch.
Ability to handle network partitions without split‑brain.
Support for read‑write separation and scalable read load.
Explored Approaches
Various methods were evaluated, including third‑party arbitrators, quorum‑based clusters, and fencing mechanisms. Traditional hardware fencing was deemed impractical, while proxy‑based isolation added latency and complexity. Deploying an agent on each node to self‑isolate proved feasible but introduced timing gaps and potential agent failures.
Final Architecture: Pacemaker + Corosync + MHA
The chosen solution combines three nodes running Pacemaker and Corosync for cluster management, with custom resource agents (RAs) for MySQL, LVS, and VIPs. Three nodes participate in semi‑synchronous replication, ensuring that at most one node holds the master role, eliminating double‑writes.
Two dedicated slave nodes provide read‑only services behind a virtual IP (VIP) and an LVS load balancer. Additional read‑only slaves can be added without becoming master candidates, and they use asynchronous replication to avoid forming a second semi‑sync group.
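The reason three nodes rule out double‑writes can be shown with a majority check. The following is a minimal Python sketch of that idea, not the cluster's actual code: the function names are hypothetical, and real quorum decisions are made by Corosync and Pacemaker rather than by application code.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Return True when a partition sees a strict majority of the cluster."""
    if not 0 <= visible_nodes <= total_nodes:
        raise ValueError("visible_nodes must be between 0 and total_nodes")
    # Strict majority: two disjoint partitions can never both satisfy this.
    return 2 * visible_nodes > total_nodes


def may_hold_master(visible_nodes: int, total_nodes: int = 3) -> bool:
    """Hypothetical rule: only a partition with quorum may run the master."""
    return has_quorum(visible_nodes, total_nodes)


if __name__ == "__main__":
    # A three-node cluster split 2 / 1: only the two-node side keeps the master.
    print(may_hold_master(2))  # True
    print(may_hold_master(1))  # False
```

With only two nodes, a 1 / 1 split leaves neither side with a majority, which is why a two‑node design needs an external arbitrator.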
Pacemaker/Corosync Fundamentals
Pacemaker provides node, resource, and configuration management, and it can promote or demote resources between master and slave roles. The Cluster Information Base (CIB) stores the cluster configuration and state, the CRMd daemon reacts to CIB changes, and the Policy Engine (PE) computes the target state the cluster should move to. Resource agents for the VIP, MySQL, and LVS are defined and tied together with constraints.
Failover Process
Three events can trigger a failover: a MySQL process failure detected by its RA, a node failure detected by Corosync, or a manual online switch. In each case the CIB is updated, the Policy Engine selects a healthy slave, Pacemaker promotes it to master, and MHA is invoked for log compensation, that is, to apply events the new master has not yet received. The identity of the current master is stored cluster‑wide in the CIB so that the old master cannot come back and resume the master role.
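The selection step can be sketched in Python as follows. This is an illustrative model only: the `Replica` type, its fields, and both functions are hypothetical, and preferring the slave with the most advanced replication position is an assumption here, since the talk summary says only that a healthy slave is chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    name: str
    healthy: bool
    is_candidate: bool     # async read-only slaves are never promoted
    applied_position: int  # hypothetical monotonic replication position


def choose_new_master(replicas: List[Replica], failed_master: str) -> Optional[Replica]:
    """Pick a healthy master candidate, preferring the most up-to-date one."""
    eligible = [
        r for r in replicas
        if r.healthy and r.is_candidate and r.name != failed_master
    ]
    if not eligible:
        return None  # no safe candidate: leave the cluster without a master
    return max(eligible, key=lambda r: r.applied_position)


def may_rejoin_as_master(node_name: str, recorded_master: Optional[str]) -> bool:
    """A returning node may act as master only if the cluster-wide record
    (kept in the CIB in the described design) still names it as master."""
    return recorded_master == node_name
```

Choosing the most advanced slave keeps the amount of log compensation small. The rejoin check models why the master identity is recorded globally: a recovered old master sees another node named in the record and has to come back as a slave.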
Testing and Reliability Improvements
A dedicated failover test suite (RT set) simulated network glitches, node crashes, and split‑brain scenarios. Long‑running tests uncovered low‑frequency bugs such as ARP cache staleness, semi‑sync timeout bugs in MySQL 5.5, and occasional MHA hang‑ups. Upgrading Pacemaker from 1.1.7 to 1.1.14 and switching from mysqld_safe to mysqld resolved many stability issues.
Remaining Limitations
Three‑node HA incurs resource waste for workloads without read‑load balancing.
MySQL replication lag can slow MHA log compensation.
Failover may still fail if a non‑master node crashes.
MySQL itself cannot guarantee strict master‑slave consistency.
Alternative Minimalist Approach
For two‑node clusters, adding a distributed lock service as a third‑party arbitrator can achieve near‑zero data loss. The master must renew the lock periodically; if it loses contact with the lock service, a safe‑failover path is triggered.
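The renewal rule can be modeled as a lease with a time‑to‑live. The sketch below is a hedged Python illustration rather than the implementation from the talk: the `Lease` class, the `ttl_seconds` parameter, and the rule that the master stops accepting writes once the lease expires are all assumptions, and the call to a real lock service is replaced by a boolean flag.

```python
import time
from typing import Callable, Optional


class Lease:
    """Local view of a lock that must be renewed before it expires."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires_at: Optional[float] = None  # None: never acquired

    def renew(self, lock_service_ok: bool) -> bool:
        """Extend the lease only if the lock service acknowledged the renewal."""
        if lock_service_ok:
            self.expires_at = self.clock() + self.ttl
        return lock_service_ok

    def is_valid(self) -> bool:
        return self.expires_at is not None and self.clock() < self.expires_at


def master_may_accept_writes(lease: Lease) -> bool:
    """Assumed safety rule: no valid lease, no writes."""
    return lease.is_valid()
```

Under these assumptions the other node has to wait at least one full TTL after the last successful renewal before taking over, so the old master has already stopped writing by the time the new one starts.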
Conclusion
The Pacemaker‑based HA architecture delivers failover within seconds, protection against data loss, scalable read load balancing, automatic fault isolation, and VIP‑driven routing, while remaining manageable through scripted CLI tools. However, it adds complexity, requires careful tuning of semi‑sync parameters, and may not suit workloads with little read traffic.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
