Fundamentals and Advanced Strategies of MySQL Master‑Slave Replication
This article explains the basic principles of MySQL master‑slave architecture, the role of read‑only replicas, loop‑replication problems, master‑slave delay measurement, various failover strategies, parallel replication mechanisms across MySQL versions and MariaDB, GTID usage, and read‑write splitting techniques, providing practical commands and diagrams for each concept.
1. Basic Principles of MySQL Master‑Slave
In state 1 the client reads and writes only to node A (master) while node B (slave) continuously receives and applies A's updates, keeping data identical. When a switch is needed, the roles are reversed (state 2) so the client accesses B and A becomes the slave.
Even though the slave is not directly accessed, it is recommended to set it to read‑only to prevent accidental writes, avoid bugs during role switches, and to use the read‑only flag for role detection.
Read‑only does not affect the super‑privilege replication thread, which still synchronises data.
The full update flow from A to B involves a long‑lived connection, a change master command on B to specify A's IP, port, credentials and binlog position, followed by start slave which launches io_thread (fetches binlog) and sql_thread (applies it).
On B, execute change master with master details.
Run start slave to start io_thread (connects to master) and sql_thread (executes binlog).
Master validates credentials and streams binlog from the requested position.
Slave writes the received binlog to a relay log.
sql_thread reads the relay log, parses commands and executes them.
With multi‑threaded replication, the original sql_thread is split into several worker threads.
2. Loop Replication Issue
In a dual‑master (双M) setup, A and B are each other's master and slave. After A writes a transaction, its binlog is sent to B; B executes it, generates a new binlog with the same server‑id, and sends it back to A, causing an infinite loop.
MySQL records the server‑id in the binlog, allowing the following solution:
Ensure the two servers have different server‑ids; identical ids forbid master‑slave configuration.
When a slave receives a binlog, it discards it if the server‑id matches its own.
The execution flow in a dual‑master setup therefore stops the loop because A ignores logs it generated itself.
3. Master‑Slave Lag
Three timestamps are defined: T1 – transaction committed on master and written to binlog; T2 – binlog received by slave; T3 – transaction applied on slave. Lag = T3‑T1.
The show slave status command shows seconds_behind_master , which is calculated from the binlog timestamps. Lag sources include slower slave hardware, high read load on the slave, or large transactions on the master.
Mitigation methods:
Deploy multiple slaves (one‑master‑many‑slaves) to spread read load.
Stream binlog to external systems (e.g., Hadoop) for analytics.
Reduce large transactions or split them.
4. Master‑Slave Switch Strategies
4.1 Reliability‑First
Check seconds_behind_master on the slave; proceed only if it is below a threshold.
Set master to read‑only.
Wait until the slave’s lag becomes zero.
Make the slave read‑write.
Redirect client traffic to the new master.
This approach may cause a brief period where both nodes are read‑only.
4.2 Availability‑First
Immediately switch traffic to the slave and make it read‑write, ignoring lag. This reduces downtime but can produce data inconsistency.
CREATE TABLE `t` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `c` int(11) unsigned DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;Example with binlog_format=mixed shows divergent rows after the switch; using binlog_format=row limits inconsistency to a single row because the row‑based binlog records full row values.
5. Parallel Replication Strategies
5.1 MySQL 5.6 (Database‑level)
Parallelism is based on database name hashing; suitable when workload is evenly distributed across databases.
5.2 MariaDB (Logical‑Clock)
Transactions are grouped by a common commit_id written to the binlog; all transactions with the same id are dispatched to multiple workers.
5.3 MySQL 5.7 (slave‑parallel‑type)
DATABASE – same as MySQL 5.6.
LOGICAL_CLOCK – similar to MariaDB but with MySQL‑specific optimisations.
Only transactions that have passed lock conflict checks (prepare state) may be executed in parallel.
5.4 MySQL 5.7.22 (WRITESET)
Three modes control parallelism:
COMMIT_ORDER – parallel if prepare and commit timestamps overlap.
WRITESET – parallel if the writesets (hash of affected rows) do not intersect.
WRITESET_SESSION – same as WRITESET but preserves original session order.
Writesets are stored in the binlog, allowing fast dispatch without scanning the whole transaction.
6. Handling Master Failure
In a one‑master‑many‑slaves topology, a failed master can be replaced by promoting a slave. The CHANGE MASTER TO command with MASTER_HOST , MASTER_PORT , MASTER_USER , MASTER_PASSWORD , MASTER_LOG_FILE , and MASTER_LOG_POS defines the new source.
Because the exact binlog position is hard to pinpoint, a typical procedure is:
Wait for the new master to finish syncing.
Run show master status on the new master to get File/Position.
Determine the failure time T on the old master.
Use mysqlbinlog to locate the binlog position at T and set it as MASTER_LOG_POS .
If a duplicate‑key error occurs after the switch, it can be ignored by setting sql_slave_skip_counter=1 or configuring slave_skip_errors .
GTID‑Based Switch
GTID (global transaction ID) format is source_id:transaction_id . Enabling GTID requires gtid_mode=ON and enforce_gtid_consistency=ON . When using GTID, the master_auto_position=1 option in CHANGE MASTER TO activates GTID‑based replication, simplifying failover because the slave automatically finds the correct position based on GTID sets.
7. MySQL Read‑Write Splitting
Typical architectures:
Client‑side load balancing – the application knows master and slaves and routes reads/writes accordingly.
Proxy‑based routing – a proxy (e.g., MySQL‑Proxy, HAProxy) decides routing based on query type.
Strategies to avoid stale reads:
Force critical reads to the master.
Introduce a short SELECT SLEEP(1) before reading from a slave, assuming lag < 1 s.
Check seconds_behind_master and wait until it reaches zero.
Compare master and relay log positions (or GTID sets) to ensure full sync.
Use semi‑synchronous replication so a transaction is considered committed only after at least one slave acknowledges receipt.
Execute SELECT MASTER_POS_WAIT(file, pos, timeout) on a slave to wait for a specific binlog position before reading.
Use SELECT WAIT_FOR_EXECUTED_GTID_SET(gtid, timeout) to wait for a GTID to be applied.
Each method balances latency, consistency, and complexity.
Conclusion
The article provides a comprehensive guide to MySQL master‑slave fundamentals, common pitfalls such as loop replication and lag, detailed failover procedures (both position‑based and GTID‑based), parallel replication options across MySQL versions and MariaDB, and practical read‑write splitting techniques to achieve high availability and performance.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.