Databases 13 min read

How to Build Reliable MySQL HA: Replication, Monitoring, and Failover Strategies

This article explores practical MySQL high‑availability solutions, covering asynchronous and semi‑synchronous replication, monitoring with keepalived or Zookeeper, failover decision criteria, GTID and pseudo‑GTID techniques, and lessons learned from real‑world deployments.

21CTO
21CTO
21CTO
How to Build Reliable MySQL HA: Replication, Monitoring, and Failover Strategies

Replication

To avoid data loss, replication is a good solution. MySQL's built‑in asynchronous replication can cause data loss if the master crashes before slaves receive the transaction.

Semi‑synchronous replication improves safety: the master waits for at least one slave to acknowledge receipt and write to its relay log before committing. This reduces loss risk, and performance impact is minimal; MySQL 5.7+ supports loss‑less semi‑sync.

For absolute safety, Galera cluster writes to three nodes simultaneously, though it is more invasive. DRBD is another option, but not used here.

In our projects we will prefer semi‑synchronous replication, and consider Galera for critical data.

Monitor

A monitoring system must detect master failure promptly; otherwise data loss may occur.

We use keepalived or heartbeat for basic failover, but they cannot detect service unavailability instantly. Our current setup uses keepalived, while we plan to adopt Zookeeper for cluster monitoring and failover.

Each MySQL instance runs an agent on the same host that pings the server and registers itself in Zookeeper as an ephemeral node. When the agent or MySQL stops, Zookeeper receives a children‑changed event, allowing a controller to trigger failover or wait for the agent to restart.

Zookeeper simplifies cluster state monitoring and event notification, though deployment is more complex and updating application connection strings after failover requires extra handling. For simple master‑slave setups, keepalived may be sufficient.

Failover

After a failure, we must choose the slave with the most recent master data as the new master. The SHOW SLAVE STATUS command provides Master_Log_File and Read_Master_Log_Pos to compare.

GTID

MySQL 5.6+ introduces GTID (uuid:gid) to uniquely identify transactions, simplifying failover.

Example: slave b has last GTID 3E11FA47-71CA-11E1-9E33-C80AA9429562:23, slave c has 3E11FA47-71CA-11E1-9E33-C80AA9429562:15. When promoting b, c can start replicating from the event after GTID …:15.

Google’s early Global Transaction ID patch inspired similar mechanisms in other systems.

Pseudo GTID

For MySQL versions before 5.6, we need an alternative. Orchestrator uses a pseudo‑GTID: a scheduled event creates a view containing a UUID every 10 seconds, which is logged to the binlog.

create database if not exists meta; drop event if exists meta.create_pseudo_gtid_view_event; delimiter ;; create event if not exists meta.create_pseudo_gtid_view_event on schedule every 10 second starts current_timestamp on completion preserve enable do begin set @pseudo_gtid := uuid(); set @_create_statement := concat('create or replace view meta.pseudo_gtid_view as select ''', @pseudo_gtid, ''' as pseudo_gtid_unique_val from dual'); PREPARE st FROM @_create_statement; EXECUTE st; DEALLOCATE PREPARE st; end;; delimiter ; set global event_scheduler := 1;

This event writes a UUID to a view, which appears in the binlog, allowing us to locate a 10‑second window for comparison between masters and slaves.

Using pseudo‑GTID requires slaves to enable log‑slave‑updates, which is also required for GTID.

Our future failover tool will adopt this pseudo‑GTID approach.

Postscript

MySQL HA is a deep field; the author lists recent research and plans to implement more tools in go‑mysql.

Update

After further study, the author fully embraces GTID and has upgraded all MySQL instances to 5.6+. The go‑mysql replication library now supports semi‑sync, allowing real‑time binlog forwarding to a collector. MHA remains a reliable failover tool, and the author will not pursue a custom Zookeeper solution.

The proposed HA design has not been deployed yet due to the author's upcoming departure.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlReplicationPseudo GTIDfailoverHAGTID
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.