Databases 12 min read

MySQL Replication, High Availability, and Sharding Strategies

This article explains the evolution from single‑node MySQL to master‑slave replication, various replication modes, high‑availability solutions such as MHA and MGR, and the challenges and techniques of vertical and horizontal sharding for scaling large‑scale internet applications.

Architecture Digest

Sep 9, 2022

MySQL Replication, High Availability, and Sharding Strategies

Background

Rapid business growth leads to explosive data volume, making single‑node databases insufficient for internet‑scale services. Centralized storage faces capacity, performance, availability, and maintainability limits.

Master‑Slave Replication

The master records transaction operations (except queries) to the binlog, and slaves synchronize data via the relay log.

Binlog formats: row (detailed record, large), statement (SQL statements), mixed (combination).

Asynchronous Replication

Introduced in MySQL 3.23.15 (2000); network or machine failures can cause data inconsistency.

Semi‑synchronous Replication

Added in MySQL 5.5 (2010); a slave’s ACK allows the master to commit, ensuring at least one node has the data.

Group Replication

MySQL 5.7.17 (2016) introduced InnoDB Group Replication based on the Paxos protocol, guaranteeing data consistency through majority voting.

Problems of Master‑Slave Replication

Replication lag leads to read‑after‑write inconsistency.

Routing complexity: business layer must direct SQL to appropriate nodes, increasing coupling and requiring load‑balancing.

High‑availability cannot be fully guaranteed without additional solutions.

Database High Availability

What is High Availability?

High availability means less downtime, usually measured by SLA (e.g., 99.9% uptime equals about 8.76 hours of downtime per year).

Why Implement High Availability?

Failover mechanisms, connection‑pool heartbeats, and automatic reconnection reduce RTO (Recovery Time Objective) and RPO (Recovery Point Objective).

Disaster recovery: cold standby vs. hot standby.

Master‑slave automatic failover.

Cluster resilience: service continues despite individual node failures.

Manual Failover

If the master fails, manually promote a slave to master, which may cause data inconsistency and requires human intervention.

MHA (MySQL Master High Availability)

MHA, developed by Facebook engineers, detects master failure, selects the most up‑to‑date slave, applies missing binlogs, and moves the VIP to the new master within ~30 seconds.

Advantages: automatic detection and failover, good scalability.

Disadvantages: possible split‑brain, SSH configuration required, needs at least three servers.

MGR (MySQL Group Replication)

MGR is built‑in MySQL; when the primary node fails, another node is automatically promoted using Paxos, ensuring consistency without manual intervention.

High consistency via Paxos.

High fault tolerance with majority‑based operation.

Scalable: new nodes sync automatically.

Flexible: supports single‑master and multi‑master modes.

Database Sharding (Vertical and Horizontal Partitioning)

Vertical Partitioning

Splits the database by business domain (e.g., orders, products, users) to reduce single‑node load.

Introduces distributed transactions (XA or flexible transactions) and join challenges.

Horizontal Partitioning

Divides a table into multiple identical tables (shards) based on a sharding key, solving capacity and performance issues.

Routing problem: determine which shard to query (range routing or hash routing).

Join problem: cross‑shard joins require application‑level aggregation.

Count problem: need to aggregate counts across shards.

Order‑by problem: sorting must be performed after merging results from all shards.

Sharding Solutions

Application‑level routing (e.g., using sharding‑jdbc) – tight coupling with business code.

Database middleware (e.g., MyCAT, sharding‑proxy) – abstracts routing rules from the application.

Conclusion

Moving from a single‑node database to master‑slave replication, high‑availability architectures, and finally sharding addresses performance, capacity, availability, and operational concerns, but introduces distributed transactions, complex SQL, and routing challenges; therefore, sharding should be adopted only when data volume and performance bottlenecks demand it.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql Replication Distributed Transactions database scaling

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.