Mastering MySQL High Availability: Architectures, Tools, and Keepalive Practices
This article surveys MySQL high-availability solutions. It covers classic HA architectures, failover and replication tools such as MHA, MMM, HA-JDBC, Galera, and Group Replication, and NetEase's practical implementations based on a distributed database and keepalive-based failover.
Common MySQL High‑Availability Architectures
MySQL HA focuses on two problems: client failover (automatic VIP or API‑based switching) and data synchronization among multiple nodes. Typical solutions include VIP‑based HA with third‑party monitoring, API‑driven HA where applications select masters, and various binlog‑based replication methods.
HA Synchronization Software
1. VIP‑based HA (HA sync software)
Uses a virtual IP (VIP) bound to the master; on failure the VIP drifts to a slave. Data sync is usually MySQL binlog replication, optionally backed by shared storage.
Advantages:
Simple structure, easy management
Low intrusion, transparent to users
Limitations:
No multi-write support; the standby is read-only
Full data consistency is not guaranteed
2. MHA (Master High Availability)
MHA provides automatic failover for one-master, multiple-slave topologies while preserving data consistency as far as possible. When the master fails, the failover proceeds as follows:
Save binlog of the failed master
Manager finds the most up‑to‑date slave
Apply missing relay logs to other slaves
Apply saved binlog on the chosen slave
Promote the slave to new master
Re‑configure remaining slaves to replicate from the new master
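The steps above can be sketched as a small planning function. This is a simplified illustration, not MHA's actual code: the Slave class, the plan_failover function, and the idea of modelling binlog events as an ordered list of which each slave holds a prefix are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Hypothetical model of a slave: `relay_events` is the prefix of master events it received."""
    name: str
    relay_events: list = field(default_factory=list)


def plan_failover(saved_master_binlog, slaves):
    """Plan a failover from the saved master binlog and the surviving slaves.

    Assumes every slave holds a prefix of the master's ordered event list.
    """
    if not slaves:
        raise ValueError("at least one slave is required")
    # Find the most up-to-date slave (the one that received the most events).
    latest = max(slaves, key=lambda s: len(s.relay_events))
    # Events each other slave is missing relative to the latest slave.
    relay_diffs = {
        s.name: latest.relay_events[len(s.relay_events):]
        for s in slaves if s is not latest
    }
    # Events that never left the failed master, taken from its saved binlog.
    binlog_tail = list(saved_master_binlog[len(latest.relay_events):])
    return {
        "new_master": latest.name,
        "relay_diffs": relay_diffs,
        "binlog_tail": binlog_tail,
        "repoint": sorted(relay_diffs),  # slaves to re-configure afterwards
    }
```

With a master that wrote five events and two slaves that received three and four of them, the plan promotes the second slave, ships one relay-log event to the first, and replays the last event from the saved binlog.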
3. MMM (Master‑Master Replication Manager)
Provides failover based on master-master replication; the failover target is the preconfigured standby master rather than the most up-to-date slave.
4. API‑based HA
Clients (e.g., JDBC) maintain master‑slave state and can switch via API calls. This approach enables read/write splitting, sharding, and other advanced features but adds operational complexity.
Typical API‑Based Solutions
HA‑JDBC allows configuring multiple MySQL endpoints; it handles failover, read/write separation, node status notification, and load balancing.
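To make the client-side approach concrete, here is a minimal sketch of a router that keeps master-slave state in the application, splits reads from writes, and promotes a slave when the master is marked down. It is written in Python for illustration and does not reflect HA-JDBC's actual API; the Router class, its method names, and the node names are hypothetical.

```python
import itertools


class Router:
    """Hypothetical client-side router: writes go to the master, reads rotate over healthy slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.down = set()
        self._counter = itertools.count()

    def mark_down(self, node):
        """Record a node failure; if it was the master, promote the first healthy slave."""
        self.down.add(node)
        if node == self.master:
            healthy = [s for s in self.slaves if s not in self.down]
            if not healthy:
                raise RuntimeError("no healthy node left to promote")
            self.master = healthy[0]
            self.slaves.remove(self.master)

    def route(self, sql):
        """Pick a node for a statement (simplification: only SELECT/SHOW count as reads)."""
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        readers = [s for s in self.slaves if s not in self.down]
        if is_read and readers:
            return readers[next(self._counter) % len(readers)]
        # Writes, and reads when no slave is available, go to the master.
        return self.master
```

A real driver would also need to handle statements such as SELECT ... FOR UPDATE and in-flight transactions, which is part of the operational complexity mentioned above.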
Data‑Sync Focused HA Solutions
5. Galera Cluster
Provides synchronous multi-master replication through the wsrep API. Each transaction's write set is replicated to all nodes at commit time and certified on every node; if certification fails because of a conflict, the transaction is rolled back, so all nodes stay consistent.
Limitations include higher network overhead, a full copy of the data on every node, and no support for certain SQL features (e.g., explicit table locks such as LOCK TABLES, and XA transactions).
6. MySQL Group Replication
Introduced in MySQL 5.7, it supports writes on multiple nodes and provides strong consistency through a Paxos-based group communication protocol. Every node applies transactions in the same order, so all members converge to the same state. It requires InnoDB tables, a primary key on every table, GTIDs enabled, and the ROW binlog format.
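The prerequisites listed above lend themselves to a simple pre-flight check. The sketch below is an assumption-laden illustration: gtid_mode and binlog_format are real MySQL system variables, but the function name and the shape of the tables input (a list of dicts with name, engine, and has_primary_key keys, which could be filled from information_schema) are hypothetical.

```python
def check_group_replication_prereqs(variables, tables):
    """Return a list of human-readable problems; an empty list means the checks passed.

    variables: dict of MySQL system variables, e.g. from SHOW GLOBAL VARIABLES.
    tables: hypothetical list of dicts with keys name, engine, has_primary_key.
    """
    problems = []
    if str(variables.get("gtid_mode", "")).upper() != "ON":
        problems.append("gtid_mode must be ON")
    if str(variables.get("binlog_format", "")).upper() != "ROW":
        problems.append("binlog_format must be ROW")
    for table in tables:
        if table["engine"].lower() != "innodb":
            problems.append(f"table {table['name']} does not use InnoDB")
        if not table["has_primary_key"]:
            problems.append(f"table {table['name']} has no primary key")
    return problems
```

Running such a check before adding a node to the group surfaces misconfigurations early instead of at join time.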
NetEase’s Practical Implementations
Distributed Database HA (DDB)
NetEase’s DDB uses MySQL nodes (DBN) for storage and a stateless management server that maintains routing information in a sysdb (also MySQL). High availability of the management layer is achieved by persisting state in sysdb and deploying multiple instances. DBN nodes can use classic VIP‑based HA or rely on the management server to update routing tables when a node fails, using a custom tool called DDBSwitch.
DDBSwitch monitors DBN health, updates the DBI driver’s node list, and gradually reopens connection pools to avoid overload during failover. This architecture has been stable for years, handling services such as video cloud and messaging back‑ends with sub‑30‑second recovery times.
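The gradual reopening of connection pools can be pictured as a ramp-up schedule. The source does not describe how DDBSwitch paces the ramp, so the geometric growth used here, along with the function name and its parameters, is purely an assumption for illustration.

```python
def ramp_up_schedule(max_connections, initial=1, factor=2):
    """Yield the pool size allowed at each step until the full size is reached.

    Assumed policy: start small and multiply by `factor` each step, so the
    newly promoted node is not hit by every client connection at once.
    """
    if max_connections < 1 or initial < 1 or factor < 2:
        raise ValueError("max_connections and initial must be >= 1, factor >= 2")
    size = min(initial, max_connections)
    while size < max_connections:
        yield size
        size = min(size * factor, max_connections)
    yield max_connections
```

For a pool of 20 connections starting at 2, the allowed sizes would be 2, 4, 8, 16, and then 20; a caller would wait some interval between steps.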
Keepalive‑Based MySQL HA
For single-node MySQL deployments, NetEase combines keepalive (the Keepalived daemon, which moves a VIP between hosts using VRRP) with custom scripts to achieve failover. The process includes:
keepalive on the master periodically runs a health‑check script.
If the master is unhealthy after three retries, the script stops keepalive, causing the slave to seize the VIP and become master.
The new master runs a promotion‑check script to ensure relay logs are fully applied before enabling writes.
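The two scripts in this process can be sketched as follows. This is not NetEase's code: the function names are hypothetical, the probe is any callable supplied by the caller, and "three retries" is interpreted here as three attempts in total. The promotion check relies on the standard SHOW SLAVE STATUS fields, treating relay logs as fully applied when the executed position has caught up with the position read from the master.

```python
import time


def master_is_healthy(probe, retries=3, interval=0.0):
    """Return True as soon as one probe succeeds; False only after `retries` failed attempts."""
    for attempt in range(retries):
        try:
            if probe():
                return True
        except Exception:
            pass  # a probe error counts as a failed attempt
        if attempt < retries - 1:
            time.sleep(interval)
    return False


def relay_log_fully_applied(status):
    """Check a SHOW SLAVE STATUS row (as a dict) before enabling writes on the new master."""
    return (
        status["Relay_Master_Log_File"] == status["Master_Log_File"]
        and int(status["Exec_Master_Log_Pos"]) == int(status["Read_Master_Log_Pos"])
    )
```

Requiring several consecutive failures before giving up the VIP is also what protects against a single lost probe triggering a switch.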
Key features of this solution:
Consistency verification via relay‑log checks and semi‑synchronous replication.
Network‑jitter protection to avoid flapping.
No automatic re‑promotion of the original master, preventing replication lag issues.
Customizable failure criteria aligned with business needs.
Simple management with optional manual intervention.
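Additional keepalive notes include ARP refresh commands that make neighboring devices update their ARP caches after the VIP moves (e.g., arping -I eth1 -c 5 -s VIP GATEWAY, where VIP and GATEWAY stand for the actual addresses) and the use of the nopreempt option or custom scripts to control preemption behavior.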
Conclusion
The article outlines several MySQL HA patterns—from classic VIP‑based setups and mature tools like MHA and MMM to modern synchronous clusters such as Galera and Group Replication—then demonstrates how NetEase applies these concepts in both distributed and single‑node environments, leveraging keepalive and custom automation to achieve reliable, low‑latency failover.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
