
Mastering MySQL High Availability: Architectures, Tools, and Keepalive Practices

This article surveys MySQL high-availability solutions: classic HA architectures, failover and synchronization tools such as MHA, MMM, HA-JDBC, Galera and Group Replication, and NetEase's practical implementations based on a distributed database and keepalive-based failover.

dbaplus Community

Common MySQL High‑Availability Architectures

MySQL HA focuses on two problems: client failover (automatic VIP or API‑based switching) and data synchronization among multiple nodes. Typical solutions include VIP‑based HA with third‑party monitoring, API‑driven HA where applications select masters, and various binlog‑based replication methods.

HA Synchronization Software

1. VIP‑based HA (HA sync software)

A virtual IP (VIP) is bound to the master; when the master fails, the VIP moves to a slave. Data is usually synchronized through MySQL binlog replication, optionally backed by shared storage.

Simple structure, easy management

No multi‑write support, standby is read‑only

Does not guarantee full data consistency

Low intrusion, transparent to users

2. MHA (Master High Availability)

MHA provides HA for one-master, multiple-slave topologies, with automatic failover that preserves data consistency. A failover proceeds through the following steps:

Save binlog of the failed master

Manager finds the most up‑to‑date slave

Apply missing relay logs to other slaves

Apply saved binlog on the chosen slave

Promote the slave to new master

Re‑configure remaining slaves to replicate from the new master
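
The six steps above can be sketched as a small in-memory simulation. This is not MHA's code or API: the Replica class, the plan_failover function, and the use of integer event IDs are all hypothetical, and the sketch assumes each replica's log is a prefix of the failed master's binlog.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # Events received from the failed master so far (hypothetical event IDs).
    events: list = field(default_factory=list)


def plan_failover(saved_master_binlog: list, replicas: list) -> Replica:
    """Bring every replica to the same state and return the new master."""
    # Step 1 happened before this call: saved_master_binlog is the saved binlog.
    # Step 2: the replica that received the most events is the most up to date.
    latest = max(replicas, key=lambda r: len(r.events))

    # Step 3: give each lagging replica the relay-log events it is missing.
    for r in replicas:
        r.events.extend(latest.events[len(r.events):])

    # Step 4: apply the part of the saved binlog that no replica received.
    latest.events.extend(saved_master_binlog[len(latest.events):])

    # Steps 5 and 6: promote the chosen replica; the others replicate from it
    # and catch up on the events applied in step 4.
    for r in replicas:
        if r is not latest:
            r.events.extend(latest.events[len(r.events):])
    return latest
```

With replicas holding events [1, 2, 3], [1, 2, 3, 4] and [1, 2] and a saved binlog of [1, 2, 3, 4, 5], the second replica is promoted and all three end up with the full event list.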

3. MMM (Master‑Master Replication Manager)

MMM provides failover on top of master-master replication; the failover target is fixed in advance rather than chosen as the most up-to-date slave.

4. API‑based HA

Clients (e.g., JDBC) maintain master‑slave state and can switch via API calls. This approach enables read/write splitting, sharding, and other advanced features but adds operational complexity.

Typical API‑Based Solutions

HA‑JDBC allows configuring multiple MySQL endpoints; it handles failover, read/write separation, node status notification, and load balancing.
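
To illustrate what "the client maintains master-slave state" can look like, here is a minimal sketch of client-side routing with read/write splitting and failover. It is written in Python for brevity and is not HA-JDBC's API; the EndpointRouter class, its method names, and the promote-the-first-replica policy are assumptions made for the example.

```python
import itertools


class EndpointRouter:
    """Client-side routing: writes go to the master, reads rotate over replicas."""

    def __init__(self, master: str, replicas: list):
        self.master = master
        self.replicas = list(replicas)
        self._counter = itertools.count()

    def endpoint_for(self, sql: str) -> str:
        # Very rough read detection; a real driver would parse the statement.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return self.replicas[next(self._counter) % len(self.replicas)]
        return self.master

    def mark_down(self, endpoint: str) -> None:
        """React to a node-status notification by dropping or replacing a node."""
        if endpoint in self.replicas:
            self.replicas.remove(endpoint)
        elif endpoint == self.master:
            if not self.replicas:
                raise RuntimeError("no replica left to promote")
            # Assumed policy: promote the first remaining replica.
            self.master = self.replicas.pop(0)
```

Because every client holds its own copy of this state, all clients must learn about a failure consistently, which is one source of the operational complexity mentioned above.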

Data‑Sync Focused HA Solutions

5. Galera Cluster

Galera provides synchronous multi-master replication through the wsrep API. At commit time a transaction's write set is broadcast to all nodes and certified in the same order everywhere, so the transaction either commits on every node or is rolled back, which ensures strong consistency.

Limitations include higher network overhead, full data redundancy, and lack of support for certain SQL statements (e.g., LOCK, XA).
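
Galera's documentation describes its commit protocol as certification-based replication: every node applies the same deterministic conflict test to write sets delivered in the same order. The sketch below is a heavily simplified model of that idea, not Galera's implementation; the WriteSet and Certifier names, the row-key representation, and the sequence-number bookkeeping are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    origin: str       # node where the transaction executed
    keys: frozenset   # row keys the transaction modified
    last_seen: int    # highest sequence number applied when it started


class Certifier:
    """First-committer-wins test, run identically on every node."""

    def __init__(self):
        self.seqno = 0
        self.last_writer = {}  # row key -> seqno of the last certified write

    def certify(self, ws: WriteSet) -> bool:
        self.seqno += 1  # position of this write set in the total order
        # Conflict: a row was changed by a write set the transaction never saw.
        conflict = any(self.last_writer.get(k, 0) > ws.last_seen for k in ws.keys)
        if conflict:
            return False  # the transaction is rolled back, not applied
        for k in ws.keys:
            self.last_writer[k] = self.seqno
        return True
```

Since the test depends only on the ordered stream of write sets, two Certifier instances fed the same stream reach the same verdict for every transaction without exchanging any further messages.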

6. MySQL Group Replication

Introduced in MySQL 5.7 (generally available since 5.7.17), Group Replication can accept writes on multiple nodes and provides strong consistency through a Paxos-based consensus protocol. Each node applies transactions in the same order, which guarantees identical state across the group. It requires InnoDB tables, a primary key on every table, GTID enabled, and ROW-based binlog format.
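
The prerequisites listed above lend themselves to a pre-flight check. The sketch below assumes the server variables and table metadata have already been fetched into plain Python structures; gtid_mode and binlog_format are real MySQL system variables, while the function name and the shape of the tables list are hypothetical.

```python
def group_replication_issues(variables: dict, tables: list) -> list:
    """Return the unmet prerequisites, or an empty list if all are satisfied."""
    issues = []
    if str(variables.get("gtid_mode", "OFF")).upper() != "ON":
        issues.append("gtid_mode must be ON")
    if str(variables.get("binlog_format", "")).upper() != "ROW":
        issues.append("binlog_format must be ROW")
    for t in tables:
        # Each table is a dict with hypothetical keys: name, engine, has_primary_key.
        if t["engine"].lower() != "innodb":
            issues.append(f"{t['name']}: engine must be InnoDB")
        if not t["has_primary_key"]:
            issues.append(f"{t['name']}: primary key required")
    return issues
```

Running such a check before adding a node to the group surfaces configuration problems early, instead of at the moment the node tries to join.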

NetEase’s Practical Implementations

Distributed Database HA (DDB)

NetEase’s DDB uses MySQL nodes (DBN) for storage and a stateless management server that maintains routing information in a sysdb (also MySQL). High availability of the management layer is achieved by persisting state in sysdb and deploying multiple instances. DBN nodes can use classic VIP‑based HA or rely on the management server to update routing tables when a node fails, using a custom tool called DDBSwitch.

DDBSwitch monitors DBN health, updates the DBI driver’s node list, and gradually reopens connection pools to avoid overload during failover. This architecture has been stable for years, handling services such as video cloud and messaging back‑ends with sub‑30‑second recovery times.
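
The article does not say how DDBSwitch paces the reopening of connection pools. One plausible reading, shown here purely as an assumption, is a stepwise ramp of the pool size limit so that the newly promoted node is not hit by every client at once. The function name and the linear schedule are hypothetical.

```python
def ramp_up_limits(target: int, steps: int) -> list:
    """Pool size limits to apply, one per interval, after a failover."""
    if target < 1 or steps < 1:
        raise ValueError("target and steps must be positive")
    # Grow linearly, never below one connection, and finish at the target size.
    return [max(1, target * i // steps) for i in range(1, steps + 1)]


# Example: reach a pool of 100 connections in four stages.
print(ramp_up_limits(100, 4))  # [25, 50, 75, 100]
```

A driver would apply each limit in turn, waiting a short interval between stages.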

Keepalive‑Based MySQL HA

For single-node MySQL deployments, NetEase combines keepalive (the Keepalived VRRP failover daemon) with custom scripts to achieve failover. The process includes:

keepalive on the master periodically runs a health‑check script.

If the master is unhealthy after three retries, the script stops keepalive, causing the slave to seize the VIP and become master.

The new master runs a promotion‑check script to ensure relay logs are fully applied before enabling writes.
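
The two scripts in this process can be sketched as follows. NetEase's actual scripts are not shown in the article, so the function names, the default of three attempts, and the delay are assumptions. The status keys are the column names returned by MySQL's SHOW SLAVE STATUS.

```python
import time
from typing import Callable, Mapping


def master_is_healthy(probe: Callable[[], bool], retries: int = 3,
                      delay: float = 1.0) -> bool:
    """Report failure only after `retries` consecutive failed probes."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # ride out short network jitter before retrying
    return False


def relay_log_applied(status: Mapping) -> bool:
    """True when the SQL thread has executed everything the I/O thread fetched."""
    return (status["Master_Log_File"] == status["Relay_Master_Log_File"]
            and int(status["Read_Master_Log_Pos"]) == int(status["Exec_Master_Log_Pos"]))
```

In this sketch the health-check script would stop keepalive when master_is_healthy returns False, and the promotion-check script on the slave would wait until relay_log_applied returns True before enabling writes, for example by turning off read_only.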

Key features of this solution:

Consistency verification via relay‑log checks and semi‑synchronous replication.

Network‑jitter protection to avoid flapping.

No automatic re‑promotion of the original master, preventing replication lag issues.

Customizable failure criteria aligned with business needs.

Simple management with optional manual intervention.

Two further keepalive notes: after the VIP moves, an ARP refresh command (e.g., arping -I eth1 -c 5 -s VIP GATEWAY) updates the gateway's ARP cache, and the nopreempt option or custom scripts control whether a recovered node takes the VIP back.

Conclusion

The article outlines several MySQL HA patterns, from classic VIP-based setups and mature tools like MHA and MMM to synchronous clusters such as Galera and Group Replication. It then shows how NetEase applies these concepts in both distributed and single-node environments, using keepalive and custom automation to achieve reliable, fast failover.
