Understanding MySQL Replication: Principles, Mechanisms, and Practical Applications
This article explains MySQL replication’s background, binlog formats, event types, positioning methods, asynchronous and semi‑synchronous workflows, parallel replication techniques, and real‑world deployment strategies such as HA components, middleware, remote binlog copying, and data‑transfer services, providing a comprehensive guide for building highly available and scalable MySQL infrastructures.
MySQL replication (master‑slave) copies data changes from one MySQL server to one or more replicas, enhancing high availability, scalability, and load balancing.
Background: Production MySQL instances are critical; failures cause service disruption and data loss. Replication provides real‑time backup, read‑write separation, and rapid failover.
Replication relies on the binary log (binlog) to transmit changes. Three binlog formats exist—Statement, Row, and Mixed—with Row being the most widely used for its accuracy and completeness.
Binlog events are categorized (e.g., XID_EVENT, QUERY_EVENT, GTID_EVENT, TABLE_MAP_EVENT, ROTATE_EVENT). The binlog lifecycle involves file rotation and expiration based on size and time.
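To make the event categories above more concrete, here is a minimal Go sketch that decodes the fixed 19-byte header carried by every binlog v4 event and maps a few type codes to names. The type and function names (`EventHeader`, `ParseEventHeader`, `EventName`) are hypothetical and not taken from the original article; the field layout and type codes follow the documented binlog format, and the sample bytes in `main` are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// A subset of binlog event type codes. MySQL's own name for code 33 is
// GTID_LOG_EVENT; the article's shorter name GTID_EVENT is used here.
const (
	QueryEvent    = 2
	RotateEvent   = 4
	XIDEvent      = 16
	TableMapEvent = 19
	GTIDEvent     = 33
)

// eventHeaderLen is the fixed header size of a binlog v4 event.
const eventHeaderLen = 19

// EventHeader holds the fields shared by every binlog event (hypothetical type).
type EventHeader struct {
	Timestamp uint32 // seconds since the epoch
	EventType byte
	ServerID  uint32 // server that originally wrote the event
	EventSize uint32 // header + body, in bytes
	LogPos    uint32 // position of the next event in the file
	Flags     uint16
}

// ParseEventHeader decodes the little-endian header from the start of b.
func ParseEventHeader(b []byte) (EventHeader, error) {
	if len(b) < eventHeaderLen {
		return EventHeader{}, errors.New("short binlog event header")
	}
	return EventHeader{
		Timestamp: binary.LittleEndian.Uint32(b[0:4]),
		EventType: b[4],
		ServerID:  binary.LittleEndian.Uint32(b[5:9]),
		EventSize: binary.LittleEndian.Uint32(b[9:13]),
		LogPos:    binary.LittleEndian.Uint32(b[13:17]),
		Flags:     binary.LittleEndian.Uint16(b[17:19]),
	}, nil
}

// EventName returns a readable name for the type codes listed above.
func EventName(t byte) string {
	switch t {
	case QueryEvent:
		return "QUERY_EVENT"
	case RotateEvent:
		return "ROTATE_EVENT"
	case XIDEvent:
		return "XID_EVENT"
	case TableMapEvent:
		return "TABLE_MAP_EVENT"
	case GTIDEvent:
		return "GTID_EVENT"
	default:
		return fmt.Sprintf("UNKNOWN(%d)", t)
	}
}

func main() {
	// Hand-built sample header; the values are illustrative only.
	raw := make([]byte, eventHeaderLen)
	binary.LittleEndian.PutUint32(raw[0:4], 1700000000)
	raw[4] = XIDEvent
	binary.LittleEndian.PutUint32(raw[5:9], 1)
	binary.LittleEndian.PutUint32(raw[9:13], 31)
	binary.LittleEndian.PutUint32(raw[13:17], 381808617)

	h, err := ParseEventHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s server=%d next_pos=%d\n", EventName(h.EventType), h.ServerID, h.LogPos)
}
```

A real reader would also skip the 4-byte magic number at the start of each binlog file and use the FORMAT_DESCRIPTION_EVENT to learn whether checksums are appended; both are omitted here.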
Two positioning methods are supported: File:Position and GTID. Example usage:
File position example:
File: binlog.000001 Position: 381808617
Change master to a specific file/position:
CHANGE MASTER TO MASTER_LOG_FILE='binlog.000001', MASTER_LOG_POS=381808617;
Enable GTID‑based auto‑positioning:
CHANGE MASTER TO MASTER_AUTO_POSITION=1;
The basic asynchronous replication flow consists of three threads: the master’s binlog‑dump thread, which sends binlog events; the replica’s I/O thread, which receives them and writes them to the relay log; and the replica’s SQL thread, which replays the relay log.
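The three-thread asynchronous flow can be pictured as a simple pipeline. The following Go sketch is a toy model only: goroutines stand in for the binlog-dump, I/O, and SQL threads, and channels stand in for the network connection and the relay log. The `Event` type and `Replicate` function are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a stand-in for a binlog event; real events are binary records.
type Event struct {
	Pos  uint32
	Stmt string
}

// Replicate pushes the master's binlog through the three stages and
// returns the statements in the order the replica applied them.
func Replicate(binlog []Event) []string {
	network := make(chan Event)               // master -> replica connection
	relayLog := make(chan Event, len(binlog)) // stands in for the relay log file
	var applied []string

	var wg sync.WaitGroup
	wg.Add(3)

	// Binlog-dump thread (master): streams events to the replica.
	go func() {
		defer wg.Done()
		defer close(network)
		for _, ev := range binlog {
			network <- ev
		}
	}()

	// I/O thread (replica): receives events and appends them to the relay log.
	go func() {
		defer wg.Done()
		defer close(relayLog)
		for ev := range network {
			relayLog <- ev
		}
	}()

	// SQL thread (replica): replays relay-log events in order.
	go func() {
		defer wg.Done()
		for ev := range relayLog {
			applied = append(applied, ev.Stmt)
		}
	}()

	wg.Wait()
	return applied
}

func main() {
	binlog := []Event{
		{Pos: 4, Stmt: "BEGIN"},
		{Pos: 120, Stmt: "INSERT INTO t VALUES (1)"},
		{Pos: 250, Stmt: "COMMIT"},
	}
	for _, stmt := range Replicate(binlog) {
		fmt.Println(stmt)
	}
}
```

The master never waits for the replica in this model, which is what makes the flow asynchronous: a commit on the master can complete before any replica has received the corresponding events.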
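Semi‑synchronous replication makes the master wait for an ACK from at least one replica, confirming that the transaction’s events have been received and written to its relay log, before the commit is reported to the client. This narrows the window for data loss while keeping the performance impact modest, roughly one extra network round trip per commit.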
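The ACK wait can be sketched in a few lines of Go. This is a simplified model, not MySQL's implementation: `WaitForAck` and `CommitMode` are hypothetical names, and the timeout fallback mirrors the way MySQL's semi-synchronous plugin reverts to asynchronous replication when no replica acknowledges within the configured timeout.

```go
package main

import (
	"fmt"
	"time"
)

// CommitMode records how a commit was completed (hypothetical type).
type CommitMode string

const (
	SemiSync      CommitMode = "semi-sync"
	AsyncFallback CommitMode = "async-fallback"
)

// WaitForAck blocks until one replica acknowledges the transaction or the
// timeout elapses. On timeout the master stops waiting and continues as in
// plain asynchronous replication.
func WaitForAck(acks <-chan struct{}, timeout time.Duration) CommitMode {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-acks:
		return SemiSync
	case <-timer.C:
		return AsyncFallback
	}
}

func main() {
	// A replica that acknowledges after 5 ms.
	acks := make(chan struct{}, 1)
	go func() {
		time.Sleep(5 * time.Millisecond)
		acks <- struct{}{}
	}()
	fmt.Println(WaitForAck(acks, time.Second))

	// No replica answers: the master gives up after 20 ms.
	fmt.Println(WaitForAck(make(chan struct{}), 20*time.Millisecond))
}
```

The timeout is the key trade-off: a short value limits how long a slow replica can stall commits, while a long value keeps the stronger durability guarantee in force for longer.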
Parallel replication (schema‑level, group‑commit, logical‑clock) allows multiple worker threads on the replica to apply transactions concurrently, improving throughput for high‑load workloads.
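As an illustration of the logical-clock idea, the Go sketch below groups transactions into batches that could be applied concurrently, using the `last_committed` and `sequence_number` values that MySQL records with each transaction in the binlog. It is a deliberately simplified barrier-style scheduler under that assumption; MySQL's real coordinator schedules on a rolling basis rather than in fixed batches, and the `Txn` and `Batches` names are hypothetical.

```go
package main

import "fmt"

// Txn carries the two logical-clock values recorded with each transaction.
type Txn struct {
	LastCommitted  int64 // newest transaction this one depends on
	SequenceNumber int64 // this transaction's own logical timestamp
}

// Batches walks transactions in binlog order and groups them into sets
// that may run in parallel. A transaction joins the current batch only if
// everything it depends on finished before the batch began, that is, if
// its LastCommitted is lower than the batch's first SequenceNumber.
func Batches(txns []Txn) [][]Txn {
	var out [][]Txn
	var cur []Txn
	for _, t := range txns {
		if len(cur) > 0 && t.LastCommitted >= cur[0].SequenceNumber {
			// t depends on a member of the current batch: close the batch.
			out = append(out, cur)
			cur = nil
		}
		cur = append(cur, t)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	txns := []Txn{
		{0, 1},
		{1, 2}, {1, 3}, {1, 4}, // committed together on the master
		{3, 5}, {4, 6},
	}
	for i, b := range Batches(txns) {
		fmt.Printf("batch %d: %d transaction(s)\n", i+1, len(b))
	}
}
```

With the sample input, transactions 2 to 4 share `last_committed = 1` and form one batch, so three worker threads could apply them at the same time; transactions 5 and 6 depend only on members of that earlier batch and can then run together.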
vivo’s production architecture uses a primary‑replica‑offline cluster with HA components and middleware to manage topology, failover, and read/write routing. Additional safety mechanisms include remote binlog copying, centralized BinlogServer storage, and optional semi‑synchronous mode.
Data‑transfer services (DTS) leverage the binlog to stream changes to downstream systems such as Elasticsearch or Kafka. One method is “Fake Slave” registration; example Go code for registering a slave and issuing a binlog‑dump command is shown below.
data := make([]byte, 4+1+4+1+len(hostname)+1+len(b.cfg.User)+1+len(b.cfg.Password)+2+4+4)
Binlog‑dump command example:
data := make([]byte, 4+1+4+2+4+len(p.Name))
Performance tests demonstrate that parallel execution and connection‑pool improvements can raise throughput from ~7 MB/s to >13 MB/s.
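The two buffer sizes in the fragments above correspond to the field layouts of the COM_REGISTER_SLAVE and COM_BINLOG_DUMP commands in the MySQL client/server protocol. The Go sketch below fills in both payloads under that assumption. It is not the article's original code: the function names are hypothetical, the leading 4 bytes that the original reserves for the packet header are left out, and each string is assumed to be shorter than 256 bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Command bytes from the MySQL client/server protocol.
const (
	comBinlogDump    byte = 0x12
	comRegisterSlave byte = 0x15
)

// putLenString writes a 1-byte length followed by s and returns the number
// of bytes written. Assumes len(s) < 256.
func putLenString(dst []byte, s string) int {
	dst[0] = byte(len(s))
	copy(dst[1:], s)
	return 1 + len(s)
}

// RegisterSlavePayload builds the COM_REGISTER_SLAVE payload: command,
// server id, hostname, user, password, port, replication rank, master id.
func RegisterSlavePayload(serverID uint32, host, user, password string, port uint16) []byte {
	data := make([]byte, 1+4+1+len(host)+1+len(user)+1+len(password)+2+4+4)
	pos := 0
	data[pos] = comRegisterSlave
	pos++
	binary.LittleEndian.PutUint32(data[pos:], serverID)
	pos += 4
	pos += putLenString(data[pos:], host)
	pos += putLenString(data[pos:], user)
	pos += putLenString(data[pos:], password)
	binary.LittleEndian.PutUint16(data[pos:], port)
	// The trailing 8 bytes (replication rank, master id) stay zero.
	return data
}

// BinlogDumpPayload builds the COM_BINLOG_DUMP payload: command, start
// position, flags, server id, binlog file name.
func BinlogDumpPayload(serverID uint32, file string, pos uint32) []byte {
	data := make([]byte, 1+4+2+4+len(file))
	data[0] = comBinlogDump
	binary.LittleEndian.PutUint32(data[1:], pos)
	binary.LittleEndian.PutUint16(data[5:], 0) // flags: 0 = block waiting for new events
	binary.LittleEndian.PutUint32(data[7:], serverID)
	copy(data[11:], file)
	return data
}

func main() {
	// Illustrative values; a real client sends these over an authenticated connection.
	reg := RegisterSlavePayload(101, "dts-host", "repl", "secret", 3306)
	dump := BinlogDumpPayload(101, "binlog.000001", 4)
	fmt.Printf("register: %d bytes\n", len(reg))
	fmt.Printf("dump: % x\n", dump)
}
```

The server id should be unique within the replication topology, since the master treats the "fake slave" like any other replica.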
In summary, MySQL replication not only boosts database availability and reliability but also exposes the binlog as a versatile interface for real‑time data integration, with future work focusing on BinlogServer‑based extensions for security and downstream connectivity.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
