
Which MySQL High‑Availability Architecture Is Right for You? A Comprehensive Guide

The article reviews common MySQL high‑availability solutions—including shared‑storage SAN, DRBD disk replication, keepalived/heartbeat, MHA, ZooKeeper‑based HA, Galera/PXC clusters, and proxy middleware—detailing their architectures, advantages, limitations, and suitability for different business and operational requirements.

21CTO

High‑availability architecture is a basic requirement for Internet services; both application and database services need to achieve high availability. Although services claim 24/7 operation, occasional outages still occur, such as pages failing to load or search engines being unreachable.

Availability is often measured by downtime per year. Three nines (99.9%) allows about 8.76 hours of downtime annually, while five nines (99.999%) permits only about 5 minutes. Only a few companies truly reach five nines, and even major Chinese internet giants (Baidu, Alibaba, Tencent) have experienced outages.
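The downtime budgets follow directly from the availability percentage. The short sketch below works them out for a 365‑day year; the function name is ours, chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.1f} min/year, {minutes / 60:.2f} h/year")
```

Each additional nine cuts the budget by a factor of ten, which is why the step from three nines to five nines is so much harder than the numbers suggest.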

A system typically consists of many modules—frontend, cache, database, search, message queue, etc.—each of which must be highly available to ensure overall system availability. For database services, high availability also involves data consistency, so HA solutions must consider consistency issues.
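To see why every module matters, consider a request that needs all modules to be up at once. Assuming failures are independent (a simplification that real systems only approximate), the overall availability is the product of the individual ones. The figures below are hypothetical.

```python
from math import prod


def serial_availability(component_availabilities: list[float]) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes component failures are independent of each other.
    """
    return prod(component_availabilities)


# Hypothetical figures: five modules, each at three nines.
modules = {
    "frontend": 0.999,
    "cache": 0.999,
    "database": 0.999,
    "search": 0.999,
    "message queue": 0.999,
}
overall = serial_availability(list(modules.values()))
print(f"overall availability: {overall:.4%}")
```

Under these assumptions, five three‑nines modules in series yield roughly 99.5% overall, so the system as a whole is less available than any single module.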

1. Shared‑Storage (SAN) Solution

SAN (Storage Area Network) enables data sharing across servers, decoupling storage from database servers. When a server fails, a standby server can mount the same filesystem and start MySQL, providing rapid recovery.

Advantages:

Protects against data loss when any component other than the storage itself fails.

Simple deployment and transparent failover for applications.

Ensures strong consistency between primary and standby data.

Limitations:

Shared storage is a single point of failure; if it fails, data may be lost.

Relatively expensive.

2. Disk‑Replication (DRBD) Solution

DRBD (Distributed Replicated Block Device) provides block‑level synchronous replication. It serves the same purpose as a SAN, but replicates storage between servers instead of sharing it. Each block written on the primary server is copied over the network to a secondary server, and in synchronous mode the write is acknowledged only after both nodes have it.

Advantages:

Failover is transparent to applications.

Maintains strong consistency between primary and standby.

Limitations:

Write performance is impacted because each write must be synchronized over the network.

Typically limited to two‑node synchronous setups, reducing scalability.

Standby cannot serve read traffic, leading to resource waste.
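The write‑performance cost can be illustrated with a toy model. This is not DRBD code: it is an in‑memory sketch that assumes the local write and the remote write proceed in parallel and that a write is acknowledged only once both have finished. The class name and the latency figures are hypothetical.

```python
class ReplicatedBlockDevice:
    """Toy in-memory model of two-node synchronous block replication."""

    def __init__(self, local_write_ms: float, remote_write_ms: float, network_rtt_ms: float):
        self.local_write_ms = local_write_ms
        self.remote_write_ms = remote_write_ms
        self.network_rtt_ms = network_rtt_ms
        self.primary_blocks: dict[int, bytes] = {}
        self.secondary_blocks: dict[int, bytes] = {}

    def write(self, block_no: int, data: bytes) -> float:
        """Store the block on both nodes and return the modeled latency in ms."""
        self.primary_blocks[block_no] = data
        self.secondary_blocks[block_no] = data
        # The acknowledgement waits for the slower of the two paths.
        remote_path_ms = self.network_rtt_ms + self.remote_write_ms
        return max(self.local_write_ms, remote_path_ms)


device = ReplicatedBlockDevice(local_write_ms=0.2, remote_write_ms=0.2, network_rtt_ms=0.5)
latency = device.write(7, b"page data")
print(f"modeled write latency: {latency:.1f} ms (local disk alone: {device.local_write_ms} ms)")
```

In this model the network round trip dominates every write, which is the price paid for the standby always holding an identical copy.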

3. Primary‑Slave Replication (Single‑Write) Solutions

3.1 keepalived / heartbeat

keepalived is HA software that uses VRRP (Virtual Router Redundancy Protocol) to decide which server holds a virtual IP. An instance runs on each server, with one acting as Master and the others as Backups. All servers share a virtual IP (VIP); clients connect to the VIP, which points to the current Master. If the Master fails, VRRP elects a new Master from the Backups and the VIP is reassigned, providing transparent failover.

Advantages:

Easy installation and configuration.

Fast, transparent switch‑over when the Master fails.

Limitations:

Master and standby IPs must be in the same subnet.

Health checks are relatively weak; custom scripts are often needed.

MySQL’s native asynchronous replication may cause data loss; semi‑synchronous replication can mitigate this.

keepalived itself has no HA guarantee: nothing supervises the keepalived process, so it becomes a failure point of its own.
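The election described above can be sketched as a small simulation. This is not keepalived or VRRP code. It only models the rule that the healthy node with the highest priority holds the VIP, and the `healthy` flag stands in for whatever custom MySQL check script is configured. Node names and priorities are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VrrpNode:
    name: str
    priority: int          # higher priority wins the election
    healthy: bool = True   # result of the node's health check


def elect_vip_holder(nodes: list[VrrpNode]) -> Optional[VrrpNode]:
    """Return the healthy node with the highest priority, or None if all are down."""
    candidates = [node for node in nodes if node.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority)


nodes = [VrrpNode("db1", priority=150), VrrpNode("db2", priority=100)]
print(elect_vip_holder(nodes).name)   # db1 holds the VIP

nodes[0].healthy = False              # the health check on db1 fails
print(elect_vip_holder(nodes).name)   # the VIP moves to db2
```

The sketch also shows why the health check matters: the VIP follows whatever `healthy` reports, so a check that only tests whether the host is up will keep the VIP on a server whose MySQL instance has stopped responding.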

3.2 MHA (Master High Availability)

MHA, written in Perl, provides automated MySQL failover. It consists of an MHA Manager (management node) and MHA Nodes (data nodes). When the Master crashes, MHA promotes the most up‑to‑date Slave to Master, re‑points other Slaves, and applies any missing binary logs, ensuring minimal data loss.

Advantages:

Open‑source, easy to extend for specific business needs.

During failover, it reconciles differences among Slaves, ensuring data consistency before promotion.

Supports VIP or global directory based switch‑over.

Limitations:

Cannot guarantee strong consistency if the failed Master’s binary logs are unavailable.

Supports only one‑master multi‑slave topology (minimum three servers).

Switch‑over may not be fully transparent to applications unless VIP is used.

Not suitable for large‑scale clusters; configuration is complex.

MHA Manager itself is a single point of failure.
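The promotion logic can be outlined in a few lines. The sketch below is not MHA's implementation: it reduces each replica's progress to a single integer position (real setups track binlog file and offset, or GTID sets), and `plan_failover` is a hypothetical helper. It shows the two cases the limitations mention, namely whether the failed Master's binary logs are still readable or not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    applied_position: int  # simplified: one integer instead of binlog file + offset


def plan_failover(replicas: list[Replica], master_binlog_end: Optional[int] = None) -> dict:
    """Pick the most up-to-date replica and list the log range each node must apply."""
    candidate = max(replicas, key=lambda r: r.applied_position)
    if master_binlog_end is None:
        # The dead master's binary logs are unreachable: anything it committed
        # beyond the candidate's position cannot be recovered.
        target = candidate.applied_position
    else:
        target = max(master_binlog_end, candidate.applied_position)
    catch_up = {
        r.name: (r.applied_position, target)
        for r in replicas
        if r.applied_position < target
    }
    return {"new_master": candidate.name, "target": target, "catch_up": catch_up}


replicas = [Replica("s1", 1200), Replica("s2", 1500), Replica("s3", 1350)]
print(plan_failover(replicas, master_binlog_end=1580))  # master logs recoverable
print(plan_failover(replicas))                          # master logs lost
```

In the second call the cluster converges on position 1500, so the replicas end up consistent with each other but any events the old Master committed after that point are gone.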

3.3 ZooKeeper‑Based HA

ZooKeeper provides distributed coordination, keeping its own state consistent across an ensemble of servers with the ZAB (ZooKeeper Atomic Broadcast) consensus protocol. HA clients on each MySQL node report heartbeats to ZooKeeper; if a node fails, ZooKeeper notifies the HA services, which then perform health checks and execute failover. Coordinating through ZooKeeper ensures that only one HA instance acts at a time.

Advantages:

Provides system‑wide high availability.

Strong consistency can be achieved with MySQL semi‑synchronous replication or external tools.

Excellent scalability for large clusters.

Limitations:

Introducing ZooKeeper adds considerable complexity.
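How ZooKeeper keeps a single HA instance in charge can be illustrated with the familiar leader‑election pattern: each instance registers an ephemeral, sequentially numbered entry, and the lowest surviving number is the leader. The class below is an in‑memory stand‑in written for this article, not the ZooKeeper client API, and the instance names are hypothetical.

```python
from typing import Optional


class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._members: dict[int, str] = {}  # sequence number -> HA instance name

    def join(self, instance: str) -> int:
        """Register an instance, like creating an ephemeral sequential entry."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = instance
        return seq

    def session_expired(self, seq: int) -> None:
        """Drop an instance whose heartbeats stopped; its entry disappears."""
        self._members.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The instance holding the lowest sequence number acts; the rest wait."""
        if not self._members:
            return None
        return self._members[min(self._members)]


zk = ToyCoordinator()
seq_a = zk.join("ha-a")
zk.join("ha-b")
zk.join("ha-c")
print(zk.leader())           # ha-a performs failovers

zk.session_expired(seq_a)    # ha-a stops sending heartbeats
print(zk.leader())           # ha-b takes over
```

Because every instance derives the leader from the same shared state, two instances never both believe they should run the failover, which is the property the HA layer depends on.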

4. Multi‑Write Cluster Solutions

True multi‑write architectures allow several nodes to write the same data simultaneously. In the MySQL world, two main options exist: Percona XtraDB Cluster (PXC) based on Galera and MySQL NDB Cluster.

4.1 Percona XtraDB Cluster (PXC)

PXC uses the Galera library to provide virtually synchronous replication, allowing multiple read‑write nodes, automatic node management, strict data consistency, and high availability.

Advantages:

Virtually synchronous replication.

Multiple nodes accept reads and writes concurrently, which spreads client connections, though not the write work itself, across the cluster.

Automatic node management.

Strict data consistency.

High service availability.

Limitations:

Supports only InnoDB engine.

All tables must have primary keys.

Write amplification: every write must be applied on every node, so the slowest node bounds cluster write throughput.

Highly dependent on network stability; unsuitable for long‑distance replication.
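One reason network stability matters so much is quorum. To our understanding, a Galera‑based cluster keeps serving only on the side of a network split that still holds a majority of nodes. The sketch below assumes equally weighted nodes and uses a hypothetical helper name; it is an illustration of the majority rule, not Galera code.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if a partition holds a strict majority of equally weighted nodes."""
    return 2 * reachable_nodes > cluster_size


for cluster_size, reachable in [(3, 2), (3, 1), (2, 1), (5, 3)]:
    status = "keeps serving" if has_quorum(reachable, cluster_size) else "stops serving"
    print(f"{reachable} of {cluster_size} nodes reachable: partition {status}")
```

The two‑node case is the instructive one: after a split neither side has a majority, which is why such clusters are usually deployed with an odd number of nodes, and why a flaky long‑distance link can take the whole cluster offline.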

4.2 Middleware Proxy Solutions

Middleware adds a transparent layer between applications and databases, handling failover, load balancing, and sharding. Examples include MySQL Proxy, MySQL Fabric, Cobar, and TDDL. The proxy can manage VIP migration or metadata updates, making failover invisible to applications, while sharding lets writes scale across multiple primaries.

Advantages:

Failover is transparent to applications.

Strong extensibility; facilitates sharding and cross‑data‑center deployment.

Limitations:

Relatively new component with limited production adoption.

Does not solve strong consistency; relies on MySQL’s own mechanisms (e.g., semi‑sync) and rollback/recovery tools.
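The routing role of such a proxy can be sketched as follows. This toy class does not reflect the internals of any of the products named above. It assumes a single primary, treats any statement starting with SELECT as a read, and handles failover by updating its own metadata. Host names are hypothetical.

```python
from itertools import cycle


class ToyProxy:
    """Minimal read/write splitter with a metadata-based failover hook."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._read_cycle = cycle(self.replicas)

    def route(self, sql: str) -> str:
        """Send reads round-robin to replicas and everything else to the primary."""
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.replicas:
            return next(self._read_cycle)
        return self.primary

    def promote(self, new_primary: str) -> None:
        """Failover: the old primary is assumed dead and is dropped from routing."""
        self.replicas = [r for r in self.replicas if r != new_primary]
        self.primary = new_primary
        self._read_cycle = cycle(self.replicas)


proxy = ToyProxy("db1", ["db2", "db3"])
print(proxy.route("SELECT * FROM orders"))        # db2
print(proxy.route("INSERT INTO orders VALUES (1)"))  # db1

proxy.promote("db2")                              # db1 has failed
print(proxy.route("UPDATE orders SET paid = 1"))  # db2
```

The application keeps talking to the proxy throughout, which is what makes the switch invisible to it. Whether the promoted replica had every committed transaction is a separate question that the proxy does not answer, as the last limitation notes.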

In summary, the article presented several typical MySQL high‑availability architectures, including shared‑storage, disk‑replication, primary‑slave replication (keepalived, MHA, ZooKeeper), and multi‑node cluster solutions (PXC, middleware proxy). Each scheme was evaluated for continuous availability, data consistency, and application transparency. The author suggests that MySQL replication‑based solutions are mature and mainstream, while middleware and ZooKeeper can improve scalability and availability at the cost of higher operational complexity. Choosing the right solution depends on specific business scenarios and operational capabilities.
