Deep Dive into MySQL Replication: Mechanisms, Performance, and Real‑World Optimizations

This article thoroughly examines MySQL replication, detailing binlog formats, event types, replication workflows, semi‑synchronous and parallel replication techniques, performance benchmarks, and practical implementation steps such as fake‑slave registration and connection‑pool enhancements, while illustrating each concept with concrete examples and code snippets.

Architect

Background

MySQL is widely used in production, and a failure of the primary instance can halt the services that depend on it, so high availability and data reliability are critical. MySQL provides log‑based replication for building one or more replicas, which improves availability, scalability, and load balancing.

Replication Principles

Binlog Introduction

Replication relies on the binary log (binlog), which records every change to the MySQL server. Three binlog formats exist:

Statement: the original SQL statement is sent to the replica.

Row: each row change is recorded and applied on the replica; DDL remains statement‑based.

Mixed: MySQL chooses statement or row format per statement.

Statement mode appeared in MySQL 3.23, row format in 5.1, and mixed format in 5.1.8. Row format is now dominant because it guarantees accurate data changes despite higher resource cost.

Binlog Event Types

Typical events include:

XID_EVENT – marks the end of a transaction.

QUERY_EVENT – carries statement text such as DDL; because DDL commits implicitly, it also ends a transaction.

GTID_EVENT – appears when GTID_MODE is ON (MySQL 5.6 and later).

TABLE_MAP_EVENT – precedes ROW_EVENTs for a given table.

Other events such as ROTATE_EVENT (file split) and FORMAT_DESCRIPTION_EVENT (metadata) also exist.
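To make the event types above concrete, here is a minimal Go sketch that decodes the fixed 19‑byte header found at the start of every binlog event (v4 format) and maps the type code to the names used in this section. The type and function names (EventHeader, ParseEventHeader, EventTypeName) are illustrative and not taken from the article; the numeric codes are the ones defined by the MySQL binlog format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Event type codes from the MySQL binlog format.
const (
	QueryEvent             = 2
	RotateEvent            = 4
	FormatDescriptionEvent = 15
	XidEvent               = 16
	TableMapEvent          = 19
	GtidEvent              = 33
)

// EventHeader is the fixed 19-byte header that precedes every event.
type EventHeader struct {
	Timestamp uint32 // seconds since epoch
	EventType uint8
	ServerID  uint32 // server that originally wrote the event
	EventSize uint32 // header + body, in bytes
	LogPos    uint32 // position of the next event in the file
	Flags     uint16
}

// ParseEventHeader decodes the little-endian header fields.
func ParseEventHeader(b []byte) (EventHeader, error) {
	if len(b) < 19 {
		return EventHeader{}, errors.New("binlog event header needs 19 bytes")
	}
	return EventHeader{
		Timestamp: binary.LittleEndian.Uint32(b[0:4]),
		EventType: b[4],
		ServerID:  binary.LittleEndian.Uint32(b[5:9]),
		EventSize: binary.LittleEndian.Uint32(b[9:13]),
		LogPos:    binary.LittleEndian.Uint32(b[13:17]),
		Flags:     binary.LittleEndian.Uint16(b[17:19]),
	}, nil
}

// EventTypeName returns the name for the handful of types discussed here.
func EventTypeName(t uint8) string {
	switch t {
	case QueryEvent:
		return "QUERY_EVENT"
	case RotateEvent:
		return "ROTATE_EVENT"
	case FormatDescriptionEvent:
		return "FORMAT_DESCRIPTION_EVENT"
	case XidEvent:
		return "XID_EVENT"
	case TableMapEvent:
		return "TABLE_MAP_EVENT"
	case GtidEvent:
		return "GTID_EVENT"
	default:
		return fmt.Sprintf("UNKNOWN(%d)", t)
	}
}

func main() {
	// A hand-built header, for illustration only.
	raw := make([]byte, 19)
	raw[4] = XidEvent
	binary.LittleEndian.PutUint32(raw[5:9], 1001)
	h, err := ParseEventHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(EventTypeName(h.EventType), "from server", h.ServerID)
}
```

A consumer that sees XID_EVENT (or a DDL QUERY_EVENT) can treat it as the transaction boundary and flush whatever row events it has buffered since the last TABLE_MAP_EVENT.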

Binlog Lifecycle

Binlog files are not overwritten; the server creates new files based on a configured size (e.g., 1 GB) and retains them for a period (e.g., 7 days). Consequently, only recent history can be traced unless the files are archived.

Replication Baselines

Two baseline identifiers are used:

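File Position: File: binlog.000001 and Position: 381808617 indicate the exact byte offset in a specific binlog file.

GTID: a global transaction identifier has the form source_uuid:transaction_id and identifies one transaction across all servers; a value like e2e0a733-3478-11eb-90fe-b4055d009f6c:1-753 is a GTID set covering transactions 1 through 753 from that source.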

File Position requires precise values; any mismatch can cause data loss or duplicate execution.
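The GTID baseline is easier to reason about once the set notation is parsed. The following Go sketch (Go 1.18+ for strings.Cut) parses a single‑source GTID set such as the one above and answers whether a given transaction number is already covered, which is the check a replica effectively makes to avoid duplicate execution. GTIDSet, ParseGTIDSet, and Contains are hypothetical names, and the sketch deliberately handles only one source UUID.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Interval is an inclusive range of transaction numbers.
type Interval struct{ Start, End int64 }

// GTIDSet holds the intervals recorded for one source UUID.
type GTIDSet struct {
	UUID      string
	Intervals []Interval
}

// ParseGTIDSet parses "uuid:1-753" or "uuid:1-5:11-18" (single UUID only).
func ParseGTIDSet(s string) (GTIDSet, error) {
	parts := strings.Split(s, ":")
	if len(parts) < 2 || len(parts[0]) != 36 {
		return GTIDSet{}, fmt.Errorf("malformed GTID set %q", s)
	}
	set := GTIDSet{UUID: parts[0]}
	for _, p := range parts[1:] {
		lo, hi, isRange := strings.Cut(p, "-")
		start, err := strconv.ParseInt(lo, 10, 64)
		if err != nil {
			return GTIDSet{}, err
		}
		end := start // a bare number is a one-transaction interval
		if isRange {
			if end, err = strconv.ParseInt(hi, 10, 64); err != nil {
				return GTIDSet{}, err
			}
		}
		if start < 1 || end < start {
			return GTIDSet{}, fmt.Errorf("invalid interval %q", p)
		}
		set.Intervals = append(set.Intervals, Interval{start, end})
	}
	return set, nil
}

// Contains reports whether transaction n from the given source is in the set.
func (g GTIDSet) Contains(uuid string, n int64) bool {
	if uuid != g.UUID {
		return false
	}
	for _, iv := range g.Intervals {
		if n >= iv.Start && n <= iv.End {
			return true
		}
	}
	return false
}

func main() {
	const src = "e2e0a733-3478-11eb-90fe-b4055d009f6c"
	set, err := ParseGTIDSet(src + ":1-753")
	if err != nil {
		panic(err)
	}
	fmt.Println(set.Contains(src, 753)) // already executed
	fmt.Println(set.Contains(src, 754)) // next transaction to fetch
}
```

Unlike a file position, this check does not depend on which binlog file a transaction landed in, which is why GTID baselines tolerate failover better.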

Basic Replication Flow

The replica starts an I/O thread and connects to the primary.

The primary launches a binlog dump thread, which sends binlog events to the replica’s I/O thread; the I/O thread writes them to a Relay Log.

The replica’s SQL thread reads the Relay Log and replays the events.

This results in one thread on the primary and two on the replica.

Relay Log Significance

The Relay Log acts as a buffer separating event fetching from replay, allowing asynchronous processing and improving fault tolerance.
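The three‑thread flow and the buffering role of the Relay Log can be modelled with goroutines and channels. This is only an analogy under stated assumptions: the real Relay Log is a file on disk rather than a bounded in‑memory queue, and the names Event and Replicate are invented for the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// Event stands in for one binlog event.
type Event struct {
	Pos  uint32
	Body string
}

// Replicate pushes events through dump thread -> I/O thread -> relay log -> SQL thread
// and returns the bodies in the order the SQL thread applied them.
func Replicate(primaryBinlog []Event, relayCapacity int) []string {
	dump := make(chan Event)                 // network link from primary to replica
	relay := make(chan Event, relayCapacity) // relay log: decouples fetch from replay
	var applied []string
	var wg sync.WaitGroup
	wg.Add(3)

	go func() { // primary: binlog dump thread
		defer wg.Done()
		for _, ev := range primaryBinlog {
			dump <- ev
		}
		close(dump)
	}()

	go func() { // replica: I/O thread, only fetches and stores
		defer wg.Done()
		for ev := range dump {
			relay <- ev
		}
		close(relay)
	}()

	go func() { // replica: SQL thread, only replays
		defer wg.Done()
		for ev := range relay {
			applied = append(applied, ev.Body)
		}
	}()

	wg.Wait()
	return applied
}

func main() {
	binlog := []Event{
		{Pos: 4, Body: "BEGIN"},
		{Pos: 120, Body: "INSERT row"},
		{Pos: 250, Body: "COMMIT"},
	}
	fmt.Println(Replicate(binlog, 2))
}
```

Because the I/O goroutine never applies anything itself, a slow SQL goroutine does not stall fetching until the buffer fills, which is the decoupling the Relay Log provides.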

Semi‑Synchronous Replication

In purely asynchronous mode, the primary does not wait for replicas, so a primary failure can lose transactions that were committed but not yet received by any replica. Starting with MySQL 5.5, semi‑sync replication makes the primary wait for an ACK from at least one replica before it reports the commit to the client, reducing data‑loss risk with modest performance impact.

Transaction Steps (asynchronous)

InnoDB redo write (prepare).

Binlog file flush & sync.

InnoDB redo commit.

Send binlog to replica.

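Semi‑sync adds an ACK wait after step 3 (after‑commit) or after step 2 (after‑sync), depending on the chosen mode. In either mode the replica can only acknowledge events it has received, so the transfer in step 4 must complete before the wait ends.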
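The difference between the two wait points is easiest to see as an ordered trace. The Go sketch below records the steps a committing session goes through for each mode. The names (WaitPoint, Commit) are hypothetical, and the timeout fallback to asynchronous behaviour is an addition not described in the article; it mirrors what MySQL's rpl_semi_sync_master_timeout setting does.

```go
package main

import (
	"fmt"
	"time"
)

// WaitPoint selects where the primary waits for the replica's ACK.
type WaitPoint int

const (
	AfterSync   WaitPoint = iota // wait after binlog sync, before engine commit
	AfterCommit                  // wait after engine commit
)

// Commit returns the ordered steps for one transaction on the primary.
// ack delivers the replica's acknowledgement; timeoutMs bounds the wait.
func Commit(wp WaitPoint, ack <-chan struct{}, timeoutMs int) []string {
	steps := []string{"redo prepare", "binlog flush+sync"}

	wait := func() {
		select {
		case <-ack:
			steps = append(steps, "replica ACK received")
		case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
			// Assumption: degrade to asynchronous replication on timeout.
			steps = append(steps, "ACK timeout, continue asynchronously")
		}
	}

	if wp == AfterSync {
		wait()
	}
	steps = append(steps, "redo commit")
	if wp == AfterCommit {
		wait()
	}
	return append(steps, "return OK to client")
}

func main() {
	for _, wp := range []WaitPoint{AfterSync, AfterCommit} {
		ack := make(chan struct{}, 1)
		ack <- struct{}{} // pretend a replica has already acknowledged
		fmt.Println(Commit(wp, ack, 1000))
	}
}
```

With the wait placed before the engine commit, other sessions cannot read a transaction that no replica has received, which is the main practical argument for the after‑sync mode.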

Parallel Replication

MySQL 5.6 introduced parallel replication to mitigate the bottleneck of a single SQL thread. Two major approaches are described:

Schema‑level parallelism: different databases (schemas) are replayed concurrently if no cross‑schema dependencies exist.

Group‑commit (logical‑clock) parallelism: transactions sharing the same commit parent or logical timestamp can be applied in parallel, even within the same schema.
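Schema‑level parallelism amounts to routing every transaction for a given schema to the same worker so that per‑schema ordering is preserved. The Go sketch below uses a simple hash for the routing; that is a simplification chosen for the example, not necessarily how MySQL assigns databases to workers, and Txn, WorkerFor, and Dispatch are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Txn is a transaction that touches exactly one schema (assumption of this sketch).
type Txn struct {
	Schema string
	SQL    string
}

// WorkerFor deterministically maps a schema name to one of n workers.
func WorkerFor(schema string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(schema))
	return int(h.Sum32() % uint32(workers))
}

// Dispatch splits the binlog-ordered stream into per-worker queues.
// Order inside each queue matches the original order.
func Dispatch(txns []Txn, workers int) [][]Txn {
	queues := make([][]Txn, workers)
	for _, t := range txns {
		w := WorkerFor(t.Schema, workers)
		queues[w] = append(queues[w], t)
	}
	return queues
}

func main() {
	txns := []Txn{
		{Schema: "shop", SQL: "INSERT INTO orders ..."},
		{Schema: "blog", SQL: "UPDATE posts ..."},
		{Schema: "shop", SQL: "UPDATE orders ..."},
	}
	for w, q := range Dispatch(txns, 4) {
		fmt.Println("worker", w, "has", len(q), "transactions")
	}
}
```

The limitation is visible in the routing function: if all traffic targets one schema, every transaction lands on one worker and nothing runs in parallel, which is what the group‑commit approach addresses.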

Performance tests on a 64‑core, 256 GB machine running MySQL 5.7.29 showed that the I/O thread can pull more than 100 MB/s, while a single SQL thread was limited to 21‑23 MB/s; with parallel replay enabled, the authors report a figure of ~13 MB/s in their experiments.

Implementation Details

To enable parallel execution, the authors modified the consumer of their Data Transfer Service (DTS) to hold a pool of connections sized by the maximum number of concurrent transactions:

se.conn = make([]*Connection, meta.MaxConcurrenceTransaction)

They also introduced logical‑clock based dispatching, using LastCommitted and SequenceNumber from GTID_EVENT to decide which transactions can run concurrently.
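A minimal sketch of logical‑clock dispatching, assuming a batch‑style scheduler: a transaction may join the batch currently being applied only if its LastCommitted is smaller than the lowest SequenceNumber in that batch; otherwise the batch must finish first. The field names follow the ones mentioned above, but Txn and Batches are hypothetical, and a real applier can start transactions more eagerly than whole batches.

```go
package main

import "fmt"

// Txn carries the two logical-clock fields found in GTID_EVENT.
type Txn struct {
	LastCommitted  int64
	SequenceNumber int64
}

// Batches groups binlog-ordered transactions into sets that can be applied
// concurrently. Sequence numbers increase within the input, so the first
// transaction of a batch holds its lowest sequence number.
func Batches(txns []Txn) [][]Txn {
	var out [][]Txn
	var cur []Txn
	for _, t := range txns {
		// t depends on something inside the current batch: close the batch.
		if len(cur) > 0 && t.LastCommitted >= cur[0].SequenceNumber {
			out = append(out, cur)
			cur = nil
		}
		cur = append(cur, t)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	txns := []Txn{
		{LastCommitted: 0, SequenceNumber: 1},
		{LastCommitted: 1, SequenceNumber: 2},
		{LastCommitted: 1, SequenceNumber: 3},
		{LastCommitted: 1, SequenceNumber: 4},
		{LastCommitted: 4, SequenceNumber: 5},
		{LastCommitted: 4, SequenceNumber: 6},
	}
	for i, b := range Batches(txns) {
		fmt.Printf("batch %d: %d transaction(s)\n", i, len(b))
	}
}
```

Each batch can then be fanned out across the connection pool, with the dispatcher waiting for every connection in the batch to commit before it starts the next one.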

Real‑World Application at Vivo

Vivo’s production MySQL service uses a primary‑replica‑offline asynchronous cluster. To enhance HA and data reliability, two extensions are deployed:

HA component + middleware: monitors topology, performs automatic failover, and provides read/write splitting.

Log remote replication: a classic MHA‑style binlog copy that fetches missing binlog files from a failed node and replays them on the candidate primary.

Additional options include centralized BinlogServer storage and switching to semi‑sync replication, each with its own advantages and disadvantages.

Data Transfer Service (DTS) Using Binlog

DTS captures changes by acting as a fake replica. The process consists of three steps:

Register as a slave – send a Command_Register_Slave packet (byte 21) with server_id, hostname, user, password, and port.

Issue binlog dump – send a Command_Binlog_Dump packet (byte 18) containing the start position, dump mode, and server_id.

MySQL then establishes a replication connection, allowing DTS to read binlog events.

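The buffers for the two packets are allocated as follows; each reserves 4 bytes for the packet header and 1 byte for the command:

// Register packet: header, command, server_id, length-prefixed hostname, user and password, port, replication rank, master id
data := make([]byte, 4+1+4+1+len(hostname)+1+len(b.cfg.User)+1+len(b.cfg.Password)+2+4+4)

// Binlog dump packet: header, command, start position, flags, server_id, binlog file name
data := make([]byte, 4+1+4+2+4+len(p.Name))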
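To show how those buffer sizes translate into bytes on the wire, here is a hedged Go sketch that fills both packets. The function names and parameters are invented for illustration, the 4‑byte header is left zeroed on the assumption that the connection layer writes the length and sequence number, and strings longer than 255 bytes are not handled because each carries a 1‑byte length prefix.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	comBinlogDump    = 18 // command byte for the binlog dump request
	comRegisterSlave = 21 // command byte for replica registration
)

// RegisterSlavePacket builds the registration payload after a 4-byte header gap.
func RegisterSlavePacket(serverID uint32, hostname, user, password string, port uint16) []byte {
	data := make([]byte, 4+1+4+1+len(hostname)+1+len(user)+1+len(password)+2+4+4)
	pos := 4 // skip the packet header
	data[pos] = comRegisterSlave
	pos++
	binary.LittleEndian.PutUint32(data[pos:], serverID)
	pos += 4
	for _, s := range []string{hostname, user, password} {
		data[pos] = byte(len(s)) // 1-byte length prefix
		pos++
		pos += copy(data[pos:], s)
	}
	binary.LittleEndian.PutUint16(data[pos:], port)
	// The trailing replication rank and master id (4 bytes each) stay zero.
	return data
}

// BinlogDumpPacket builds the dump request for a file/position baseline.
func BinlogDumpPacket(serverID uint32, file string, position uint32) []byte {
	data := make([]byte, 4+1+4+2+4+len(file))
	pos := 4 // skip the packet header
	data[pos] = comBinlogDump
	pos++
	binary.LittleEndian.PutUint32(data[pos:], position)
	pos += 4
	binary.LittleEndian.PutUint16(data[pos:], 0) // flags: 0 in this sketch
	pos += 2
	binary.LittleEndian.PutUint32(data[pos:], serverID)
	pos += 4
	copy(data[pos:], file)
	return data
}

func main() {
	reg := RegisterSlavePacket(1001, "dts-host", "repl", "secret", 3306)
	dump := BinlogDumpPacket(1001, "binlog.000001", 381808617)
	fmt.Println("register:", len(reg), "bytes, command", reg[4])
	fmt.Println("dump:", len(dump), "bytes, command", dump[4])
}
```

The server_id passed in both packets must be unique in the replication topology, since the primary treats the DTS process as just another replica.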

These commands enable real‑time data flow to downstream systems such as Elasticsearch or Kafka.

Conclusion

MySQL replication not only boosts database availability and reliability but also exposes the binlog as a flexible data interface for cross‑system synchronization. Future work by the storage team focuses on strengthening BinlogServer for security and downstream data pipelines.
