Databases 33 min read

Master‑Slave MySQL Replication: Core Principles, Common Pitfalls, and Advanced Strategies

This article explains MySQL master‑slave replication fundamentals, including basic synchronization, circular replication issues, lag handling, failover strategies, parallel replication techniques, GTID usage, and read‑write splitting, providing practical guidance for reliable high‑availability database deployments.

Java Interview Crash Guide

Jul 16, 2021

Master‑Slave MySQL Replication: Core Principles, Common Pitfalls, and Advanced Strategies

1. MySQL Master‑Slave Basic Principles

In state 1, clients read and write directly to node A, while node B is a replica that continuously synchronizes updates from A, keeping data identical. When a failover is needed, the system switches to state 2, where clients access node B and node A becomes the replica.

Even though node B is not directly accessed in state 1, it is recommended to set it to read‑only for several reasons:

Operational queries can be run on the replica without risking accidental writes.

It prevents failures in the failover logic.

The read‑only flag can be used to identify the role of a node.

Read‑only does not affect the super‑user replication thread, which has the necessary privileges to stay synchronized.

The following diagram shows the complete flow of an UPDATE statement executed on node A and replicated to node B:

Node B maintains a long‑running connection to node A. Node A runs a dedicated thread to serve this connection. The replication process consists of the following steps:

On node B, the CHANGE MASTER command sets the master’s IP, port, user, password, and the starting binlog file and position.

On node B, START SLAVE launches two threads: io_thread (establishes the connection) and sql_thread (applies the binlog).

Node A validates credentials and begins sending binlog entries from the requested position.

Node B writes the received binlog to a local relay log.

The sql_thread reads the relay log, parses the statements, and executes them.

With multi‑threaded replication, the original sql_thread evolves into multiple worker threads.

2. Circular Replication Issue

In a dual‑master ("double‑M") setup, node A and node B are each other's master. When node A updates a row and the resulting binlog is sent to node B, node B also generates a binlog for the same update. If both nodes are replicas of each other, this can cause the binlog to circulate indefinitely.

MySQL records the server‑id of the instance that first executed the statement in the binlog. The following logic breaks the loop:

Ensure the two servers have different server‑ids; identical IDs prevent establishing a master‑slave relationship.

A replica discards any incoming binlog whose server‑id matches its own.

The execution flow in a double‑M configuration is:

Node A writes the transaction with its server‑id.

Node B applies it and generates a binlog that still carries node A’s server‑id.

When the binlog returns to node A, the matching server‑id causes it to be ignored, breaking the cycle.

3. Master‑Slave Lag

3.1 What Is Master‑Slave Lag?

The key timestamps are:

T1 – the master finishes a transaction and writes the binlog.

T2 – the replica receives the binlog.

T3 – the replica finishes executing the transaction.

Lag is defined as T3 - T1. The SHOW SLAVE STATUS output contains Seconds_Behind_Master, which shows the current lag in seconds.

3.2 Causes of Lag

Replica hardware is slower than the master.

High read load on the replica consumes CPU, slowing replay.

Large transactions delay binlog generation and replay.

Mitigation strategies include adding more replicas, offloading analytics to external systems (e.g., Hadoop), and avoiding massive single‑statement operations.

4. Master‑Slave Switch Strategies

4.1 Reliability‑First Strategy

The detailed steps for switching from state 1 to state 2 in a double‑M setup are:

Check Seconds_Behind_Master on the replica; proceed only if it is below a threshold.

Set the master to read‑only.

Wait until the replica’s Seconds_Behind_Master becomes zero.

Set the replica to read‑write.

Redirect client traffic to the replica.

During steps 2–4 the system is unavailable for writes; step 3 (waiting for lag to reach zero) usually dominates the downtime.

4.2 Availability‑First Strategy

By moving steps 4–5 to the beginning, the switch becomes almost instantaneous, but data inconsistency may appear. Example with binlog_format=mixed shows divergent rows after the switch. Using binlog_format=row reduces the inconsistency to a duplicate‑key error, preventing both sides from applying conflicting rows.

mysql> CREATE TABLE `t` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `c` int(11) unsigned DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;

insert into t(c) values(4),(5),(6);

When the master experiences a 5‑second lag and a failover occurs after inserting c=4, the replica may apply the insert later, leading to two different rows on master and replica.

5. Parallel Replication Strategies

Parallel replication focuses on the two black arrows in the diagram: client writes to the master and the replica’s sql_thread applies the relay log. Prior to MySQL 5.6, replication was single‑threaded, causing severe lag under high write throughput.

Multi‑threaded replication splits the original sql_thread into a coordinator and multiple workers. The coordinator reads the relay log and dispatches transactions to workers based on a hashing scheme. The number of workers is controlled by slave_parallel_workers. Two requirements must be met:

Updates to the same row must be dispatched to the same worker to avoid conflicts.

A single transaction must be processed by a single worker.

5.1 MySQL 5.6 – Database‑Level Parallelism

Hashing is performed on the database name, making it effective when many databases have balanced load.

5.2 MariaDB – Write‑Set Parallelism

Transactions that belong to the same commit group share a commit_id. The commit ID is stored in the binlog, and all transactions with the same ID are dispatched to different workers, ensuring no row conflicts.

5.3 MySQL 5.7 – Configurable Parallelism

The slave_parallel_type variable selects the strategy: DATABASE – same as MySQL 5.6. LOGICAL_CLOCK – similar to MariaDB’s write‑set approach, with additional optimizations.

Only transactions that have passed lock conflict checks (i.e., are in the prepare state) may be executed in parallel.

5.4 MySQL 5.7.22 – Write‑Set Parallelism

Introduces binlog_transaction_dependency_tracking with three modes: COMMIT_ORDER – parallelism based on commit order. WRITESET – hashes each modified row (db+table+index+value); transactions with non‑overlapping write‑sets can run concurrently. WRITESET_SESSION – same as WRITESET but preserves the order of transactions from the same session.

Write‑sets are stored directly in the binlog, allowing workers to be assigned without scanning the entire transaction.

6. Handling a Failed Master

A typical one‑master‑multiple‑slaves topology is shown. When the primary fails, the new master (previously a replica) must be promoted, and all other replicas must be re‑pointed to it.

6.1 Position‑Based Failover

On the new master, run SHOW MASTER STATUS to obtain File and Position. Use

CHANGE MASTER TO MASTER_HOST=..., MASTER_LOG_FILE='File', MASTER_LOG_POS=Position

on each replica. Because the exact position is hard to capture, a best‑effort approach is used, possibly requiring manual skipping of conflicting transactions.

SET GLOBAL sql_slave_skip_counter=1; START SLAVE;

6.2 GTID‑Based Failover

GTID (global transaction ID) uniquely identifies each transaction as source_id:transaction_id. Enabling GTID mode ( gtid_mode=ON and enforce_gtid_consistency=ON) allows replicas to synchronize based on GTID sets rather than file/position.

When promoting a replica, use CHANGE MASTER TO ... master_auto_position=1 to indicate GTID‑based replication.

The master and replica exchange their GTID sets to determine which transactions still need to be applied.

7. MySQL Read‑Write Splitting

Read‑write splitting reduces load on the master by directing writes to it and reads to replicas. Two common architectures exist:

Client‑side routing, where the application knows the topology and selects the appropriate host.

Proxy‑based routing, where a middleware (e.g., ProxySQL) decides based on query type.

7.1 Strategies to Avoid Stale Reads

Force‑master reads for queries that must see the latest data.

Sleep on the replica (e.g., SELECT SLEEP(1)) assuming lag is usually under one second.

Check Seconds_Behind_Master and wait until it reaches zero before reading.

Compare master and replica positions (log file/position or GTID sets) to ensure they are identical.

Use semi‑synchronous replication so that a transaction is considered committed only after at least one replica acknowledges receipt.

In multi‑replica environments, semi‑sync guarantees freshness only on the acknowledging replica; other replicas may still serve stale data.

7.2 Waiting for a Specific Position

SELECT MASTER_POS_WAIT('binlog.000001', 12345, 5);

This function, executed on a replica, blocks until the specified binlog position has been applied (or timeout occurs).

7.3 Waiting for GTID Execution

SELECT WAIT_FOR_EXECUTED_GTID_SET('source_id:12345', 1);

Returns 0 when the GTID set has been executed, otherwise 1 on timeout.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql Master‑Slave replication Read‑Write Splitting GTID parallel-replication

Written by

Java Interview Crash Guide

Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

1. MySQL Master‑Slave Basic Principles

2. Circular Replication Issue

3. Master‑Slave Lag

3.1 What Is Master‑Slave Lag?

3.2 Causes of Lag

4. Master‑Slave Switch Strategies

4.1 Reliability‑First Strategy

4.2 Availability‑First Strategy

5. Parallel Replication Strategies

5.1 MySQL 5.6 – Database‑Level Parallelism

5.2 MariaDB – Write‑Set Parallelism

5.3 MySQL 5.7 – Configurable Parallelism

5.4 MySQL 5.7.22 – Write‑Set Parallelism

6. Handling a Failed Master

6.1 Position‑Based Failover

6.2 GTID‑Based Failover

7. MySQL Read‑Write Splitting

7.1 Strategies to Avoid Stale Reads

7.2 Waiting for a Specific Position

7.3 Waiting for GTID Execution

Java Interview Crash Guide

How this landed with the community

Was this worth your time?

0 Comments

5.1 MySQL 5.6 – Database‑Level Parallelism

5.3 MySQL 5.7 – Configurable Parallelism

5.4 MySQL 5.7.22 – Write‑Set Parallelism