Database Scaling, Replication, High Availability, and Sharding Overview
This article explains why single‑node databases cannot keep up with rapid business growth and describes MySQL replication methods, high‑availability solutions such as MHA and MGR, and the challenges and techniques of vertical and horizontal sharding for large‑scale systems.
Background Rapid business growth leads to explosive data growth, making single‑node databases insufficient for capacity, performance, availability, and maintainability.
Master‑Slave Replication The master records transaction operations (except queries) to the binlog, and slaves synchronize data via relay logs. Binlog formats include row, statement, and mixed.
Asynchronous Replication Introduced in MySQL 3.23.15 (2000); network or machine failures can cause data inconsistency.
Semi‑Synchronous Replication Added in MySQL 5.5 (2010); a transaction is committed once at least one slave acknowledges, ensuring at least one replica has the data.
Group Replication Introduced in MySQL 5.7.17 (2016) using the Paxos algorithm to guarantee consistency across the group.
Problems with Traditional Replication Include replication lag causing read‑after‑write inconsistencies, routing complexity, and inability to guarantee high availability.
High Availability (HA) Defined as minimizing service downtime, measured by SLA (e.g., 99.9% uptime equals 8.76 hours of downtime per year). HA is achieved through failover, connection‑pool heartbeats, and reducing RTO/RPO.
Manual Switch Involves manually promoting a slave to master when the master fails, but it risks data inconsistency, requires human intervention, and adds configuration complexity.
MHA (MySQL Master High Availability) A Perl‑based framework that detects failures, selects a candidate master, synchronizes binlogs, and moves a VIP to the new master, typically completing failover within 30 seconds.
MGR (MySQL Group Replication) A built‑in plugin that automatically promotes a slave when the master fails, based on Paxos for strong consistency, offering high fault tolerance, scalability, and flexible single‑master or multi‑master modes.
Orchestrator A UI‑driven MySQL HA and replication topology management tool that supports topology adjustments, automatic failover, and manual switchover.
Sharding (Database Partitioning) Includes vertical partitioning (splitting databases by business domain) and horizontal partitioning (splitting tables into multiple identical shards). Vertical partitioning reduces single‑database size but introduces distributed transactions and join challenges. Horizontal partitioning solves capacity and performance limits but adds routing, join, count, and order‑by complexities.
Sharding Solutions Application‑level routing (e.g., sharding‑jdbc) or database middleware that abstracts routing rules from the application.
Conclusion Moving from single‑node databases to replication, HA, and sharding addresses performance, capacity, and operational issues, but introduces distributed transaction, complex SQL, and routing challenges; therefore, sharding should be adopted when data volume and performance bottlenecks justify it.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.