MySQL Replication, High Availability, and Sharding Strategies
This article explains the evolution from single‑node MySQL to master‑slave replication, various replication modes, high‑availability solutions such as MHA and MGR, and the challenges and techniques of vertical and horizontal sharding for scaling large‑scale internet applications.
Background
Rapid business growth leads to explosive data volume, making single‑node databases insufficient for internet‑scale services. Centralized storage faces capacity, performance, availability, and maintainability limits.
Master‑Slave Replication
The master records transaction operations (except queries) to the binlog, and slaves synchronize data via the relay log.
Binlog formats: row (detailed record, large), statement (SQL statements), mixed (combination).
Asynchronous Replication
Introduced in MySQL 3.23.15 (2000); network or machine failures can cause data inconsistency.
Semi‑synchronous Replication
Added in MySQL 5.5 (2010); a slave’s ACK allows the master to commit, ensuring at least one node has the data.
Group Replication
MySQL 5.7.17 (2016) introduced InnoDB Group Replication based on the Paxos protocol, guaranteeing data consistency through majority voting.
Problems of Master‑Slave Replication
Replication lag leads to read‑after‑write inconsistency.
Routing complexity: business layer must direct SQL to appropriate nodes, increasing coupling and requiring load‑balancing.
High‑availability cannot be fully guaranteed without additional solutions.
Database High Availability
What is High Availability?
High availability means less downtime, usually measured by SLA (e.g., 99.9% uptime equals about 8.76 hours of downtime per year).
Why Implement High Availability?
Failover mechanisms, connection‑pool heartbeats, and automatic reconnection reduce RTO (Recovery Time Objective) and RPO (Recovery Point Objective).
Disaster recovery: cold standby vs. hot standby.
Master‑slave automatic failover.
Cluster resilience: service continues despite individual node failures.
Manual Failover
If the master fails, manually promote a slave to master, which may cause data inconsistency and requires human intervention.
MHA (MySQL Master High Availability)
MHA, developed by Facebook engineers, detects master failure, selects the most up‑to‑date slave, applies missing binlogs, and moves the VIP to the new master within ~30 seconds.
Advantages: automatic detection and failover, good scalability.
Disadvantages: possible split‑brain, SSH configuration required, needs at least three servers.
MGR (MySQL Group Replication)
MGR is built‑in MySQL; when the primary node fails, another node is automatically promoted using Paxos, ensuring consistency without manual intervention.
High consistency via Paxos.
High fault tolerance with majority‑based operation.
Scalable: new nodes sync automatically.
Flexible: supports single‑master and multi‑master modes.
Database Sharding (Vertical and Horizontal Partitioning)
Vertical Partitioning
Splits the database by business domain (e.g., orders, products, users) to reduce single‑node load.
Introduces distributed transactions (XA or flexible transactions) and join challenges.
Horizontal Partitioning
Divides a table into multiple identical tables (shards) based on a sharding key, solving capacity and performance issues.
Routing problem: determine which shard to query (range routing or hash routing).
Join problem: cross‑shard joins require application‑level aggregation.
Count problem: need to aggregate counts across shards.
Order‑by problem: sorting must be performed after merging results from all shards.
Sharding Solutions
Application‑level routing (e.g., using sharding‑jdbc) – tight coupling with business code.
Database middleware (e.g., MyCAT, sharding‑proxy) – abstracts routing rules from the application.
Conclusion
Moving from a single‑node database to master‑slave replication, high‑availability architectures, and finally sharding addresses performance, capacity, availability, and operational concerns, but introduces distributed transactions, complex SQL, and routing challenges; therefore, sharding should be adopted only when data volume and performance bottlenecks demand it.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.