Which MySQL High‑Availability Architecture Is Right for You? A Comprehensive Guide
The article reviews common MySQL high‑availability solutions—including shared‑storage SAN, DRBD disk replication, keepalived/heartbeat, MHA, ZooKeeper‑based HA, Galera/PXC clusters, and proxy middleware—detailing their architectures, advantages, limitations, and suitability for different business and operational requirements.
High‑availability architecture is a basic requirement for Internet services: both application and database services must be highly available. Although services aim for 24/7 operation, outages still happen, such as pages failing to load or a search engine becoming unreachable.
Availability is often measured by the downtime allowed per year. Three nines (99.9%) allows about 8.76 hours of downtime annually, while five nines (99.999%) permits only about 5.26 minutes. Only a few companies truly reach five nines, and even major Chinese internet giants (Baidu, Alibaba, Tencent) have experienced outages.
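The arithmetic behind these figures is simple: the yearly downtime budget is (1 - availability) multiplied by the number of minutes in a year. A minimal Python sketch, assuming a 365-day year; the function name is illustrative only:

```python
# Minutes in a 365-day year; a leap year adds one day and barely changes the result.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def max_downtime_minutes(availability_pct: float) -> float:
    """Yearly downtime budget, in minutes, for an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = max_downtime_minutes(pct)
    print(f"{label} ({pct}%): {minutes:.1f} min/year, about {minutes / 60:.2f} h/year")
```

For 99.9% this gives 525.6 minutes (about 8.76 hours) per year; for 99.999% it gives about 5.26 minutes.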
A system typically consists of many modules—frontend, cache, database, search, message queue, etc.—each of which must be highly available to ensure overall system availability. For database services, high availability also involves data consistency, so HA solutions must consider consistency issues.
1. Shared‑Storage (SAN) Solution
SAN (Storage Area Network) enables data sharing across servers, decoupling storage from database servers. When a server fails, a standby server can mount the same filesystem and start MySQL, providing rapid recovery.
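The failover sequence above can be modeled in a few lines. This is a toy in-memory sketch, not real storage management: `SharedVolume`, `DbServer`, and `fail_over` are hypothetical names, and "mounting" is just bookkeeping. It shows why the standby sees exactly the committed data (it reads the same volume) and why only one host may have the filesystem mounted at a time.

```python
class SharedVolume:
    """Toy stand-in for a SAN LUN: the data lives here, not on any server."""

    def __init__(self):
        self.rows = []
        self.mounted_by = None  # an ordinary filesystem must be mounted by one host at a time

    def mount(self, host):
        if self.mounted_by is not None:
            raise RuntimeError(f"volume is still mounted by {self.mounted_by}")
        self.mounted_by = host

    def release(self, host):
        if self.mounted_by == host:
            self.mounted_by = None


class DbServer:
    def __init__(self, name, volume):
        self.name, self.volume, self.running = name, volume, False

    def start_mysql(self):
        self.volume.mount(self.name)  # mount the shared filesystem, then start mysqld
        self.running = True

    def commit(self, row):
        if not self.running:
            raise RuntimeError(f"{self.name} is not running")
        self.volume.rows.append(row)  # committed data is written straight to shared storage

    def crash(self):
        self.running = False  # the host dies but still holds the mount


def fail_over(volume, failed, standby):
    # Releasing the failed host's mount stands in for fencing it in a real cluster.
    volume.release(failed.name)
    standby.start_mysql()


volume = SharedVolume()
primary, standby = DbServer("db1", volume), DbServer("db2", volume)
primary.start_mysql()
primary.commit("order#1")
primary.crash()
fail_over(volume, primary, standby)
print(standby.name, "now serves", volume.rows)
```

Note that the model also makes the main weakness visible: every server depends on the single `SharedVolume` object.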
Advantages:
No data is lost when components other than the storage itself (such as a database server) fail.
Simple deployment and transparent failover for applications.
Ensures strong consistency between primary and standby data.
Limitations:
Shared storage is a single point of failure; if it fails, data may be lost.
Relatively expensive.
2. Disk‑Replication (DRBD) Solution
DRBD (Distributed Replicated Block Device) provides block‑level synchronous replication with an effect similar to SAN, but uses replicated storage instead of shared storage: every block written on the primary server is copied over the network to the secondary server, and in synchronous mode the write is not reported complete until the secondary has persisted it.
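This synchronous acknowledgement is where DRBD's write penalty comes from. A simplified latency model, assuming the local and remote disk writes run in parallel; the timing numbers and the names `ReplicationLink` and `write_latency_ms` are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class ReplicationLink:
    local_write_ms: float   # time for the primary's disk to persist a block
    remote_write_ms: float  # time for the secondary's disk to persist it
    rtt_ms: float           # network round trip between the two servers


def write_latency_ms(link: ReplicationLink, synchronous: bool) -> float:
    """Rough time until a block write is reported complete to MySQL."""
    if not synchronous:
        # Like DRBD protocol A: done once the local disk has the block and it
        # has been queued for sending; the secondary may still be behind.
        return link.local_write_ms
    # Like DRBD protocol C: local and remote writes run in parallel, and the
    # write completes only after both disks have confirmed it.
    return max(link.local_write_ms, link.rtt_ms + link.remote_write_ms)


link = ReplicationLink(local_write_ms=0.25, remote_write_ms=0.25, rtt_ms=0.5)
print("async:", write_latency_ms(link, synchronous=False), "ms")
print("sync: ", write_latency_ms(link, synchronous=True), "ms")
```

In the synchronous case the network round trip sits on every write path, which is why DRBD is usually deployed between two nearby nodes.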
Advantages:
Failover is transparent to applications.
Maintains strong consistency between primary and standby.
Limitations:
Write performance is impacted because each write must be synchronized over the network.
Typically limited to two‑node synchronous setups, reducing scalability.
Standby cannot serve read traffic, leading to resource waste.
3. Primary‑Slave Replication (Single‑Write) Solutions
3.1 keepalived / heartbeat
keepalived is HA software that uses VRRP (Virtual Router Redundancy Protocol) to fail over a virtual IP (VIP) and can run health checks on the services behind it. A keepalived instance runs on each server; one acts as Master and the others as Backups. Clients connect to the VIP, which is bound to the current Master. If the Master fails, VRRP elects the surviving instance with the highest priority as the new Master and moves the VIP to it, so the failover is transparent to clients.
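The election rule can be sketched as a pure function: among nodes still sending advertisements, the highest priority wins, and VRRP breaks ties by the higher primary IP address. This is a simplified model, not keepalived's implementation; `VrrpNode`, `elect_master`, and the addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass
class VrrpNode:
    name: str
    ip: IPv4Address
    priority: int       # configured priority; higher wins
    alive: bool = True  # False once its advertisements stop arriving


def elect_master(nodes) -> Optional[VrrpNode]:
    """Return the node that should hold the VIP, or None if every node is down."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    # Highest priority wins; VRRP breaks ties by the higher primary IP address.
    return max(candidates, key=lambda n: (n.priority, n.ip))


nodes = [
    VrrpNode("db1", IPv4Address("10.0.0.11"), priority=150),
    VrrpNode("db2", IPv4Address("10.0.0.12"), priority=100),
]
print("VIP on", elect_master(nodes).name)
nodes[0].alive = False  # db1 fails
print("VIP on", elect_master(nodes).name)
```

Because the function recomputes the winner from scratch, it behaves as if preemption is enabled: a recovered higher-priority node takes the VIP back. keepalived can be configured not to preempt, which avoids a second switch-over when the old Master returns.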
Advantages:
Easy installation and configuration.
Fast, transparent switch‑over when the Master fails.
Limitations:
Master and standby IPs must be in the same subnet.
Health checks are relatively weak; custom scripts are often needed.
MySQL’s native asynchronous replication may cause data loss; semi‑synchronous replication can mitigate this.
keepalived itself is a single point of failure.
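The difference between asynchronous and semi‑synchronous replication can be reduced to one question: which commits had the client already been told succeeded when the Master crashed? A minimal sketch under simplifying assumptions (a single replica, and no fallback: real semi‑sync reverts to asynchronous mode if no replica acknowledges within a timeout, `rpl_semi_sync_master_timeout` in MySQL 5.7). The function name is hypothetical.

```python
def acknowledged_but_lost(committed, received_by_replica, semi_sync):
    """
    committed: transaction ids the master committed, in commit order.
    received_by_replica: how many of them reached the replica's relay log before the crash.
    Returns transactions the client saw succeed that the promoted replica does not have.
    """
    on_replica = set(committed[:received_by_replica])
    if semi_sync:
        # The client's commit returns only after a replica acknowledges receipt,
        # so anything not yet received was never confirmed to the client.
        acknowledged = committed[:received_by_replica]
    else:
        # Asynchronous: the client is told "OK" as soon as the master commits.
        acknowledged = committed
    return [t for t in acknowledged if t not in on_replica]


txns = ["t1", "t2", "t3"]
print("async loses:    ", acknowledged_but_lost(txns, 2, semi_sync=False))
print("semi-sync loses:", acknowledged_but_lost(txns, 2, semi_sync=True))
```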
3.2 MHA (Master High Availability)
MHA, written in Perl, provides automated MySQL failover. It consists of an MHA Manager (management node) and MHA Nodes (data nodes). When the Master crashes, MHA promotes the most up‑to‑date Slave to Master, re‑points other Slaves, and applies any missing binary logs, ensuring minimal data loss.
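The first step, finding the most up‑to‑date Slave, amounts to comparing how far each Slave has read the Master's binary log, as reported by `Master_Log_File` and `Read_Master_Log_Pos` in `SHOW SLAVE STATUS`. The sketch below covers only that selection step; it is not MHA's code, ignores MHA's options for preferring or excluding particular hosts, and uses hypothetical names and positions.

```python
from dataclasses import dataclass


@dataclass
class SlaveStatus:
    host: str
    master_log_file: str  # Master_Log_File, e.g. "mysql-bin.000042"
    read_pos: int         # Read_Master_Log_Pos: bytes of that binlog received


def coordinate(s: SlaveStatus):
    # Binlog names end in a zero-padded sequence number, so (sequence, offset)
    # orders positions correctly across binlog rotations.
    return (int(s.master_log_file.rsplit(".", 1)[1]), s.read_pos)


def plan_failover(slaves):
    """Pick the most up-to-date slave and list the slaves that lag behind it."""
    latest = max(slaves, key=coordinate)
    lagging = [s.host for s in slaves if coordinate(s) < coordinate(latest)]
    return latest.host, lagging


slaves = [
    SlaveStatus("s1", "mysql-bin.000041", 9000),
    SlaveStatus("s2", "mysql-bin.000042", 120),
    SlaveStatus("s3", "mysql-bin.000042", 4500),
]
new_master, needs_catch_up = plan_failover(slaves)
print("promote", new_master, "; apply missing events to", needs_catch_up)
```

The lagging Slaves are the ones that need the missing events applied before they are re‑pointed at the new Master.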
Advantages:
Open‑source, easy to extend for specific business needs.
During failover, it reconciles differences among Slaves, ensuring data consistency before promotion.
Supports VIP or global directory based switch‑over.
Limitations:
Cannot guarantee strong consistency if the failed Master’s binary logs are unavailable.
Supports only one‑master multi‑slave topology (minimum three servers).
Switch‑over may not be fully transparent to applications unless VIP is used.
Not suitable for large‑scale clusters; configuration is complex.
MHA Manager itself is a single point of failure.
3.3 ZooKeeper‑Based HA
ZooKeeper provides distributed coordination built on its own consensus protocol, Zab (ZooKeeper Atomic Broadcast), which is similar in spirit to Paxos. An HA client on each MySQL node reports heartbeats to ZooKeeper. If a node fails, ZooKeeper notifies the HA services, which run health checks and execute the failover, while ZooKeeper ensures that only one HA instance acts at a time.
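Both the heartbeat and the "only one HA instance acts" guarantee are typically built on ephemeral sequential znodes: a znode disappears when its owner's session expires, and the owner of the lowest sequence number is treated as leader. The sketch below is an in-memory model of that recipe, not a ZooKeeper client; `FakeZooKeeper` and the path `/mysql-ha/election/n_` are hypothetical.

```python
import itertools


class FakeZooKeeper:
    """In-memory stand-in for one ZooKeeper path holding ephemeral sequential nodes."""

    def __init__(self):
        self._counter = itertools.count()
        self.children = {}  # znode name -> owning session

    def create_ephemeral_sequential(self, prefix, session):
        # ZooKeeper appends a 10-digit, zero-padded, monotonically increasing suffix.
        name = f"{prefix}{next(self._counter):010d}"
        self.children[name] = session
        return name

    def expire_session(self, session):
        # When a session's heartbeats stop, its ephemeral znodes are deleted.
        self.children = {n: s for n, s in self.children.items() if s != session}

    def leader(self):
        # Election recipe: the owner of the lowest sequence number leads.
        return self.children[min(self.children)] if self.children else None


zk = FakeZooKeeper()
for ha_instance in ["ha-a", "ha-b", "ha-c"]:
    zk.create_ephemeral_sequential("/mysql-ha/election/n_", ha_instance)
print("acting HA instance:", zk.leader())
zk.expire_session("ha-a")  # ha-a crashes or is partitioned away
print("acting HA instance:", zk.leader())
```

In the real recipe each candidate watches only the znode just before its own, so a leader's departure wakes a single successor instead of every instance.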
Advantages:
Makes the failover management layer itself highly available, rather than relying on a single manager process.
Strong consistency can be achieved with MySQL semi‑synchronous replication or external tools.
Excellent scalability for large clusters.
Limitations:
Introducing ZooKeeper adds considerable complexity.
4. Multi‑Write Cluster Solutions
True multi‑write architectures allow several nodes to write the same data simultaneously. In the MySQL world, two main options exist: Percona XtraDB Cluster (PXC) based on Galera and MySQL NDB Cluster.
4.1 Percona XtraDB Cluster (PXC)
PXC uses the Galera library to provide virtually synchronous replication, allowing multiple read‑write nodes, automatic node management, strict data consistency, and high availability.
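Galera keeps nodes consistent through certification: each transaction's write-set (the keys of the rows it changed) is delivered to all nodes in the same global order, and every node runs the same deterministic conflict test. A heavily simplified sketch, assuming row keys are (table, primary key) pairs, which helps explain why primary keys matter; the class names are hypothetical, and real Galera trims its certification history rather than keeping it forever.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    trx: str
    snapshot_seqno: int  # last global seqno this transaction had seen when it ran
    keys: frozenset      # (table, primary key) of every row it modified


@dataclass
class Certifier:
    last_seqno: int = 0
    history: list = field(default_factory=list)  # (seqno, keys) of certified write-sets

    def certify(self, ws: WriteSet) -> bool:
        """Every node runs this on write-sets in the same global order, so all agree."""
        for seqno, keys in self.history:
            # Conflict: a write-set ordered after our snapshot changed one of our rows.
            if seqno > ws.snapshot_seqno and keys & ws.keys:
                return False  # rolled back; the client sees a deadlock error
        self.last_seqno += 1
        self.history.append((self.last_seqno, ws.keys))
        return True


row = frozenset({("orders", 1)})
cert = Certifier()
print(cert.certify(WriteSet("A", 0, row)))  # first committer wins
print(cert.certify(WriteSet("B", 0, row)))  # same row, concurrent: rejected
print(cert.certify(WriteSet("C", 1, row)))  # started after A was applied: accepted
```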
Advantages:
Virtually synchronous replication.
Multiple read‑write nodes: applications can write to any node.
Automatic node management.
Strict data consistency.
High service availability.
Limitations:
Supports only the InnoDB engine.
All tables must have primary keys.
Write amplification due to synchronization across nodes.
Highly dependent on network stability; unsuitable for long‑distance replication.
4.2 Middleware Proxy Solutions
Middleware adds a layer between applications and databases that handles failover, load balancing, and sharding. Examples include MySQL Proxy, MySQL Fabric, Cobar, and TDDL. During failover the proxy can migrate the VIP or update its routing metadata, making the switch invisible to applications, and sharding through the proxy allows writes to scale across several masters.
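At its core, such a proxy is a routing table plus a few rules: hash the sharding key to pick a shard, send writes to that shard's master and reads to a replica, and treat failover as a metadata update. A minimal sketch under those assumptions; `ShardRouter` and the host names are hypothetical and do not reflect the API of any product named above.

```python
import zlib


class ShardRouter:
    """Hypothetical proxy routing table: shard id -> current master and its replicas."""

    def __init__(self, shards):
        self.shards = shards  # {0: {"master": host, "replicas": [host, ...]}, 1: ...}

    def shard_for(self, key: str) -> int:
        # A stable hash (unlike Python's salted hash()) so every proxy agrees.
        return zlib.crc32(key.encode()) % len(self.shards)

    def route(self, key: str, is_write: bool) -> str:
        shard = self.shards[self.shard_for(key)]
        if is_write or not shard["replicas"]:
            return shard["master"]
        return shard["replicas"][0]  # simplest read policy; real proxies balance load

    def promote(self, shard_id: int, new_master: str) -> None:
        # Failover is just a metadata update; applications keep using the proxy.
        shard = self.shards[shard_id]
        shard["replicas"].remove(new_master)
        shard["master"] = new_master


router = ShardRouter({
    0: {"master": "db-a1", "replicas": ["db-a2"]},
    1: {"master": "db-b1", "replicas": ["db-b2"]},
})
sid = router.shard_for("user:42")
print("write to", router.route("user:42", is_write=True))
router.promote(sid, router.route("user:42", is_write=False))
print("after failover, write to", router.route("user:42", is_write=True))
```

The proxy decides only where traffic goes; whether the promoted replica actually has all committed data still depends on the underlying replication, as the limitations below note.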
Advantages:
Failover is transparent to applications.
Strong extensibility; facilitates sharding and cross‑data‑center deployment.
Limitations:
Relatively new component with limited production adoption.
Does not solve strong consistency; relies on MySQL’s own mechanisms (e.g., semi‑sync) and rollback/recovery tools.
In summary, the article presented several typical MySQL high‑availability architectures, including shared‑storage, disk‑replication, primary‑slave replication (keepalived, MHA, ZooKeeper), and multi‑node cluster solutions (PXC, middleware proxy). Each scheme was evaluated for continuous availability, data consistency, and application transparency. The author suggests that MySQL replication‑based solutions are mature and mainstream, while middleware and ZooKeeper can improve scalability and availability at the cost of higher operational complexity. Choosing the right solution depends on specific business scenarios and operational capabilities.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.