Mastering MySQL: From Replication to High Availability and Sharding Strategies
This article examines why single-node databases no longer meet modern internet workloads, explores MySQL replication models (master‑slave, asynchronous, semi‑synchronous, group replication), discusses high‑availability solutions such as MHA, MGR and Orchestrator, and outlines vertical and horizontal sharding techniques along with their trade‑offs.
Background
Rapid business growth has caused data volumes to explode, and a single-node database can no longer serve internet-scale workloads. Centralized storage struggles with capacity, performance, availability, and maintainability, especially as table indexes grow deeper and random I/O increases.
MySQL Replication Types
Master‑Slave Replication
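The master records every data-changing operation in its binary log (binlog); reads are not logged.
Each slave copies binlog events from the master into a local relay log (the I/O thread), then replays the relay log (the SQL thread) to stay synchronized with the master.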
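The flow above can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions: the class names `Master` and `Slave` and the methods `io_thread` and `sql_thread` are hypothetical stand-ins, and events are plain key/value pairs rather than real binlog events.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # ordered change events

    def write(self, key, value):
        # Apply the change locally and append it to the binlog.
        self.data[key] = value
        self.binlog.append((key, value))

    def read(self, key):
        # Reads never touch the binlog.
        return self.data.get(key)


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    fetched: int = 0  # binlog events copied so far (I/O thread position)
    applied: int = 0  # relay-log events replayed so far (SQL thread position)

    def io_thread(self, master):
        # Copy binlog events not seen yet into the local relay log.
        new_events = master.binlog[self.fetched:]
        self.relay_log.extend(new_events)
        self.fetched += len(new_events)

    def sql_thread(self):
        # Replay pending relay-log events in their original order.
        for key, value in self.relay_log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.relay_log)


master, slave = Master(), Slave()
master.write("user:1", "alice")
master.write("user:1", "bob")
master.read("user:1")          # not logged

slave.io_thread(master)
lagging = dict(slave.data)     # events fetched but not yet applied
slave.sql_thread()
```

Between `io_thread` and `sql_thread` the slave holds the events but still serves stale data, which is the window in which replication lag becomes visible to readers.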
Binlog Formats
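ROW: Records the actual changes to each affected row; precise, but the log files can grow large.
STATEMENT: Records the executed SQL statements; compact, but non-deterministic statements (for example, ones that call UUID()) may produce different results when replayed on a slave.
MIXED: Logs statements by default and switches to ROW for statements that are unsafe to replay.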
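To see why the format matters, consider a statement whose result differs on every execution. The sketch below is a toy model, not MySQL's implementation: `run_statement` is a hypothetical stand-in for something like `INSERT INTO t (id) VALUES (UUID())`, and the replicas are plain Python lists.

```python
import uuid


def run_statement(table):
    # Stand-in for a non-deterministic statement: each execution
    # generates a fresh identifier.
    row = {"id": str(uuid.uuid4())}
    table.append(row)
    return row


master_table = []
changed_row = run_statement(master_table)

# STATEMENT format: the replica re-executes the statement text,
# so it generates its own (different) identifier.
statement_replica = []
run_statement(statement_replica)

# ROW format: the replica applies the logged row image unchanged.
row_replica = []
row_replica.append(dict(changed_row))
```

Here `row_replica` ends up identical to `master_table`, while `statement_replica` diverges. This is the kind of statement for which MIXED falls back to row logging.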
Asynchronous Replication
Introduced in MySQL 3.23.15 (2000). The master does not wait for slaves, so a network or machine failure can lose transactions that had not yet reached any slave, leaving the data inconsistent after a failover.
Semi‑Synchronous Replication
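Added in MySQL 5.5 (2010). The master does not report success to the client until at least one slave has acknowledged (ACK) receipt of the transaction, so at least one replica holds the data in its relay log, although it may not have applied it yet. If no ACK arrives within the configured timeout, the master falls back to asynchronous replication.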
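The ACK wait can be modelled with a few lines of deterministic code. This is a simplified sketch: the function `commit_semi_sync`, its parameters, and the timeout value are illustrative assumptions. The fallback branch reflects how MySQL's semi-synchronous plugin reverts to asynchronous replication when no ACK arrives before its timeout.

```python
def commit_semi_sync(ack_delays_ms, timeout_ms=10_000, required_acks=1):
    """Decide how one transaction is acknowledged.

    ack_delays_ms holds the (hypothetical) time each replica needs to
    acknowledge receipt; None means the replica is unreachable.
    Returns (mode, time_the_client_waited_ms).
    """
    arrivals = sorted(d for d in ack_delays_ms if d is not None)
    if len(arrivals) >= required_acks:
        wait = arrivals[required_acks - 1]  # time of the last ACK we need
        if wait <= timeout_ms:
            return ("semi-sync", wait)
    # Not enough ACKs in time: stop waiting and continue asynchronously.
    return ("async-fallback", timeout_ms)


fast = commit_semi_sync([5, 40])            # one replica answers in 5 ms
slow = commit_semi_sync([None, 20_000])     # only ACK arrives too late
down = commit_semi_sync([None, None])       # no replica reachable
```

The client-visible latency is bounded by the fastest replica in the normal case and by the timeout in the degraded case, which is why the timeout is a direct trade-off between durability and write availability.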
Group Replication (MySQL Group Replication, MGR)
Generally available since MySQL 5.7.17 (2016). It uses a Paxos-based consensus protocol: a majority of nodes must agree on the order of each transaction before it commits, which provides strong consistency.
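Because majority agreement is required, the group size determines how many node failures can be survived. The helper names below are hypothetical; the arithmetic is the standard majority-quorum rule.

```python
def majority(group_size):
    # Smallest number of nodes that forms a strict majority.
    return group_size // 2 + 1


def tolerated_failures(group_size):
    # Nodes that may fail while a majority is still reachable.
    return group_size - majority(group_size)


def has_quorum(group_size, reachable):
    return reachable >= majority(group_size)


for n in (3, 4, 5, 7):
    print(n, "nodes -> majority", majority(n),
          "-> tolerates", tolerated_failures(n), "failure(s)")
```

Note that a fourth node does not add fault tolerance over three, which is why odd group sizes are usually preferred.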
Issues with Traditional Master‑Slave
Replication lag leads to read‑after‑write inconsistencies.
Routing logic must direct writes to the master and handle failover, increasing application complexity.
High‑availability cannot be guaranteed without additional mechanisms.
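One common way to soften the first two issues is a router that sends writes to the master and briefly pins a session's reads to the master after it writes. The sketch below is a minimal illustration, assuming a hypothetical `ReadWriteRouter` class, a fixed pin window instead of real lag measurement, and a naive check on the first SQL keyword.

```python
import itertools
import time

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


class ReadWriteRouter:
    def __init__(self, master, slaves, pin_seconds=1.0, clock=time.monotonic):
        self.master = master
        self._slaves = itertools.cycle(slaves)   # round-robin over slaves
        self.pin_seconds = pin_seconds           # assumed upper bound on lag
        self.clock = clock
        self._last_write = {}                    # session id -> time of last write

    def route(self, session_id, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        now = self.clock()
        if verb in WRITE_VERBS:
            self._last_write[session_id] = now
            return self.master
        last = self._last_write.get(session_id)
        if last is not None and now - last < self.pin_seconds:
            # Recent writer: read from the master to avoid stale results.
            return self.master
        return next(self._slaves)


router = ReadWriteRouter("master", ["slave1", "slave2"])
first_read = router.route("session-a", "SELECT * FROM orders")
write_target = router.route("session-a", "UPDATE orders SET status = 'paid'")
read_after_write = router.route("session-a", "SELECT * FROM orders")
```

A fixed pin window is a blunt instrument; it trades extra load on the master for simplicity, and it does not handle failover, which still needs one of the mechanisms discussed in the next section.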
High Availability (HA)
HA aims to minimize service downtime, typically measured by SLA percentages (e.g., 99.9% uptime equals ~8.76 hours of allowed downtime per year).
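The downtime budget follows directly from the availability percentage. A small sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_pct):
    # Fraction of the year the service may be down, expressed in hours.
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each additional "nine" cuts the budget by a factor of ten: 99.9% allows about 8.76 hours per year, while 99.99% allows only about 53 minutes.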
Why HA?
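Automated failover shortens the Recovery Time Objective (RTO, how long the service stays down) and tightens the Recovery Point Objective (RPO, how much recent data may be lost), keeping the service running even when a node fails. Typical building blocks: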
Disaster recovery: hot‑standby vs. cold‑standby.
Automatic master promotion when the primary fails.
Clustered slaves maintain service despite individual node failures.
Manual Switch
Operators manually promote a slave to master after a failure, but this risks data inconsistency, requires human intervention, and adds configuration overhead.
MHA (MySQL Master High Availability)
An open-source, Perl-based tool originally developed by Yoshinori Matsunobu at DeNA. It detects master failure, promotes a suitable slave within about 30 seconds, and moves a virtual IP to the new master.
Pros: automatic detection and failover, easy horizontal scaling.
Cons: possible split‑brain in extreme cases, requires SSH configuration, needs at least three servers.
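What "a suitable slave" means can be illustrated with a simple selection rule: among the healthy slaves, prefer the one that has received the most of the old master's binlog. This is a simplified sketch of that idea, not MHA's actual code; the `Replica` type and `pick_new_master` function are hypothetical, and binlog file names are assumed to be zero-padded so that string comparison orders them correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    host: str
    log_file: str   # master binlog file read so far, e.g. "binlog.000012"
    log_pos: int    # offset within that file
    alive: bool = True


def pick_new_master(replicas):
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    # Most advanced position wins: compare file name first, then offset.
    return max(candidates, key=lambda r: (r.log_file, r.log_pos))


replicas = [
    Replica("slave1", "binlog.000012", 400),
    Replica("slave2", "binlog.000012", 950),
    Replica("slave3", "binlog.000013", 120, alive=False),  # furthest ahead, but down
]
new_master = pick_new_master(replicas)
```

In this example `slave2` is chosen. The remaining slaves would then need the events they are missing before being repointed at the new master.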
MGR (MySQL Group Replication)
Built-in MySQL plugin that automatically elects a new primary through its Paxos-based consensus layer. It offers strong consistency, tolerance of minority node failures, automatic joining of new nodes, and both single-primary (single-master) and multi-primary (multi-master) modes.
Orchestrator
A UI-driven MySQL topology manager that visualizes replication graphs and supports both automatic failover and manual switchover via drag-and-drop.
Database Partitioning (Sharding)
Vertical Partitioning
Splits a monolithic database into multiple databases based on business domains (e.g., orders, products, users). Benefits include clearer responsibilities and relief from capacity limits, but the split introduces distributed transaction complexity and cross-database JOIN challenges.
Distributed transactions: XA (strong consistency, low performance) vs. flexible transactions (TCC, message‑based compensation).
JOIN problems: need application‑level aggregation.
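Application-level aggregation usually means querying one database first, collecting the foreign keys, and fetching the matching rows from the other database in a single batch. The sketch below uses in-memory dictionaries as stand-ins for the two databases; `join_orders_with_users`, `fetch_users`, and the sample data are all hypothetical.

```python
def join_orders_with_users(orders, fetch_users):
    """Attach user names to order rows without a cross-database JOIN."""
    user_ids = {order["user_id"] for order in orders}
    users = fetch_users(user_ids)  # one batched lookup instead of one per order
    return [
        {**order, "user_name": users.get(order["user_id"], {}).get("name")}
        for order in orders
    ]


# Stand-in for the users database.
USERS_DB = {10: {"name": "alice"}, 11: {"name": "bob"}}


def fetch_users(user_ids):
    # Roughly: SELECT id, name FROM users WHERE id IN (...)
    return {uid: USERS_DB[uid] for uid in user_ids if uid in USERS_DB}


# Stand-in for rows already read from the orders database.
orders = [
    {"order_id": 1, "user_id": 10},
    {"order_id": 2, "user_id": 11},
    {"order_id": 3, "user_id": 10},
]
joined = join_orders_with_users(orders, fetch_users)
```

Batching the lookup keeps the number of round trips constant, but the two reads are no longer part of one transaction, so the result may mix slightly different points in time.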
Horizontal Partitioning (Sharding)
Divides a single table into many identically structured tables (shards) based on a sharding key, improving capacity and performance.
Routing: determine which shard to query.
Range routing – based on key ranges; may cause data skew.
Hash routing – key modulo shard count; distributes data uniformly, but changing the shard count remaps most keys, so scaling out requires re‑sharding.
JOIN issues: cross‑shard joins need extra processing or a summary table.
COUNT aggregation: must scan all shards or maintain a separate counter.
ORDER BY: requires merging sorted results from all shards, which is costly.
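The routing rules and the aggregation costs listed above can be sketched together. This is a toy model under stated assumptions: keys are integers, shards are in-memory lists, and the function names `range_shard` and `hash_shard` are illustrative.

```python
import bisect
import heapq
from itertools import islice


def range_shard(key, upper_bounds):
    # upper_bounds[i] is the exclusive upper bound of shard i;
    # the last shard is open-ended.
    return bisect.bisect_right(upper_bounds, key)


def hash_shard(key, shard_count):
    # "Key modulo shard count" for integer keys.
    return key % shard_count


# Scaling out from 4 to 5 shards: count the keys that must move.
keys = range(1000)
moved = sum(1 for k in keys if hash_shard(k, 4) != hash_shard(k, 5))

# Populate 4 shards with order ids 1..10 using hash routing.
shards = [[] for _ in range(4)]
for order_id in range(1, 11):
    shards[hash_shard(order_id, 4)].append(order_id)

# COUNT: every shard must be asked, then the partial counts are summed.
total = sum(len(shard) for shard in shards)

# ORDER BY ... LIMIT 3: each shard returns its rows already sorted,
# and the application merges the sorted streams.
top3 = list(islice(heapq.merge(*shards), 3))
```

With plain modulo routing, 800 of the 1000 keys change shards when going from 4 to 5 shards, which is why scale-out is usually planned in advance (for example by doubling the shard count). The merge step also shows why a deep `OFFSET` is expensive: each shard must return every row up to the offset before the merge can discard them.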
Sharding Solutions
Application‑level routing (e.g., manual SQL logic) – high coupling, hard to maintain.
Database middleware (e.g., Sharding‑JDBC, MyCAT, Sharding‑Proxy) – abstracts routing rules, reduces code changes, but adds an extra layer.
Conclusion
Transitioning from a single‑node database to master‑slave replication, high‑availability architectures, and finally sharding addresses performance, capacity, and operational challenges, yet introduces distributed transactions, complex SQL, and routing overhead. System design should prioritize simplicity and evolve only when data volume or latency demands justify the added complexity.
IT Architects Alliance
