How to Build Reliable MySQL HA: Replication, Monitoring, and Failover Strategies
This article explores practical MySQL high‑availability solutions, covering asynchronous and semi‑synchronous replication, monitoring with keepalived or Zookeeper, failover decision criteria, GTID and pseudo‑GTID techniques, and lessons learned from real‑world deployments.
Replication
To avoid data loss, replication is a good solution. MySQL's built‑in asynchronous replication can cause data loss if the master crashes before slaves receive the transaction.
Semi‑synchronous replication improves safety: the master waits for at least one slave to acknowledge receipt and write to its relay log before committing. This reduces loss risk, and performance impact is minimal; MySQL 5.7+ supports loss‑less semi‑sync.
For absolute safety, Galera cluster writes to three nodes simultaneously, though it is more invasive. DRBD is another option, but not used here.
In our projects we will prefer semi‑synchronous replication, and consider Galera for critical data.
Monitor
A monitoring system must detect master failure promptly; otherwise data loss may occur.
We use keepalived or heartbeat for basic failover, but they cannot detect service unavailability instantly. Our current setup uses keepalived, while we plan to adopt Zookeeper for cluster monitoring and failover.
Each MySQL instance runs an agent on the same host that pings the server and registers itself in Zookeeper as an ephemeral node. When the agent or MySQL stops, Zookeeper receives a children‑changed event, allowing a controller to trigger failover or wait for the agent to restart.
Zookeeper simplifies cluster state monitoring and event notification, though deployment is more complex and updating application connection strings after failover requires extra handling. For simple master‑slave setups, keepalived may be sufficient.
Failover
After a failure, we must choose the slave with the most recent master data as the new master. The SHOW SLAVE STATUS command provides Master_Log_File and Read_Master_Log_Pos to compare.
GTID
MySQL 5.6+ introduces GTID (uuid:gid) to uniquely identify transactions, simplifying failover.
Example: slave b has last GTID 3E11FA47-71CA-11E1-9E33-C80AA9429562:23, slave c has 3E11FA47-71CA-11E1-9E33-C80AA9429562:15. When promoting b, c can start replicating from the event after GTID …:15.
Google’s early Global Transaction ID patch inspired similar mechanisms in other systems.
Pseudo GTID
For MySQL versions before 5.6, we need an alternative. Orchestrator uses a pseudo‑GTID: a scheduled event creates a view containing a UUID every 10 seconds, which is logged to the binlog.
create database if not exists meta; drop event if exists meta.create_pseudo_gtid_view_event; delimiter ;; create event if not exists meta.create_pseudo_gtid_view_event on schedule every 10 second starts current_timestamp on completion preserve enable do begin set @pseudo_gtid := uuid(); set @_create_statement := concat('create or replace view meta.pseudo_gtid_view as select ''', @pseudo_gtid, ''' as pseudo_gtid_unique_val from dual'); PREPARE st FROM @_create_statement; EXECUTE st; DEALLOCATE PREPARE st; end;; delimiter ; set global event_scheduler := 1;
This event writes a UUID to a view, which appears in the binlog, allowing us to locate a 10‑second window for comparison between masters and slaves.
Using pseudo‑GTID requires slaves to enable log‑slave‑updates, which is also required for GTID.
Our future failover tool will adopt this pseudo‑GTID approach.
Postscript
MySQL HA is a deep field; the author lists recent research and plans to implement more tools in go‑mysql.
Update
After further study, the author fully embraces GTID and has upgraded all MySQL instances to 5.6+. The go‑mysql replication library now supports semi‑sync, allowing real‑time binlog forwarding to a collector. MHA remains a reliable failover tool, and the author will not pursue a custom Zookeeper solution.
The proposed HA design has not been deployed yet due to the author's upcoming departure.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
