Databases 19 min read

How MySQL 8.0’s WriteSet Parallel Replication Eliminates Replication Lag

This article explains the evolution of MySQL replication, from the classic master‑slave model to schema‑level and group‑commit parallelism in 5.7, and finally the WriteSet‑based parallel replication introduced in MySQL 8.0, including configuration parameters, conflict‑detection logic, and performance test results.

dbaplus Community
dbaplus Community
dbaplus Community
How MySQL 8.0’s WriteSet Parallel Replication Eliminates Replication Lag

1. MySQL Master‑Slave Replication Model

The classic MySQL replication architecture relies on the binary log (binlog). The master writes all changes to the binlog; the slave runs an I/O thread that fetches the binlog into a relay log and an SQL thread that replays the relay log sequentially. Replication delay is defined as the time between a transaction committing on the master and its execution on the slave.

Because the SQL thread replays events in a single thread, heavy write load on the master can cause the slave to fall behind.

2. Parallel Replication in MySQL 5.7

MySQL 5.6 introduced a simple schema‑level parallel replication where the SQL thread becomes a coordinator and multiple worker threads execute transactions in parallel when they belong to different schemas and are not DDL or cross‑schema operations.

MySQL 5.7 added a group‑commit‑based parallelism (controlled by slave_parallel_type). The parameter can be set to DATABASE (schema‑level) or LOGICAL_LOCK (group‑commit based). Transactions that are committed together receive the same last_committed value in the binlog, allowing them to be replayed in parallel.

Example mysqlbinlog output shows groups of transactions sharing the same last_committed value, indicating they can be applied concurrently on the slave.

root@localhost:~# mysqlbinlog mysql-bin.0000006 | grep last_committed
#150520 14:23:11 server id 88 ... GTID last_committed=0 sequence_number=1
#150520 14:23:11 server id 88 ... GTID last_committed=0 sequence_number=2
#150520 14:23:11 server id 88 ... GTID last_committed=6 sequence_number=7
#150520 14:23:11 server id 88 ... GTID last_committed=6 sequence_number=8

3. WriteSet in MySQL Group Replication (MGR)

In MGR, a certification phase uses WriteSet to detect write‑conflicts between transactions. Transactions that belong to the same WriteSet group receive the same last_committed value, enabling parallel replay.

Example commands create a database and table in a MGR cluster and insert three rows. Examining the master’s binlog and the secondary’s relay log shows identical last_committed values for the three inserts, confirming they can be replayed in parallel.

# -- create a group replication cluster.
STOP GROUP_REPLICATION; START GROUP_REPLICATION;
CREATE DATABASE test_ws_mgr;
CREATE TABLE test_ws_mgr.test (id INT PRIMARY KEY AUTO_INCREMENT, str VARCHAR(64) NOT NULL);
INSERT INTO test_ws_mgr.test(str) VALUES ('a');
INSERT INTO test_ws_mgr.test(str) VALUES ('b');
INSERT INTO test_ws_mgr.test(str) VALUES ('c');

The WriteSet mechanism introduced in 5.7 was moved to the binlog generation stage in MySQL 8.0 and is now part of the standard replication path.

4. Parallel Replication in MySQL 8.0

MySQL 8.0 adds the parameter binlog_transaction_dependency_tracking with three options:

COMMIT_ORDER – legacy group‑commit based dependency.

WRITESET – use WriteSet to decide transaction dependencies.

WRITESET_SESSION – WriteSet per session (transactions in the same session keep separate last_committed values).

The implementation stores a vector<uint64> of hash values representing primary‑key, unique‑key, or foreign‑key updates of committed transactions. The vector size is controlled by binlog_transaction_dependency_history_size (default 25000, range 1‑1000000).

The hash algorithm is chosen by transaction_write_set_extraction (OFF|XXHASH64|MURMUR32) and must be enabled when WRITESET is used.

WriteSet detection rules:

If a row has a primary key, hash (db, table, key, key_values) and add to the transaction’s write_set.

If a row has a non‑null unique key, hash the same information and add it.

If a row has a foreign key, hash the foreign‑key information and add it; mark has_related_foreign_key when the table’s primary key is referenced elsewhere.

If no key can be hashed, mark has_missing_key and fall back to COMMIT_ORDER.

During certification, if has_related_foreign_key or has_missing_key is true, the system reverts to COMMIT_ORDER. Otherwise, the write_set of the current transaction is compared with the stored write_sets; if no conflict is found, the transaction shares the previous last_committed value, otherwise it receives a new one and conflicting write_set entries are removed.

When the stored history exceeds the configured size, old entries are cleared. The feature only benefits tables that have a primary key or a non‑null unique index and requires the binlog format to be ROW.

5. Performance Results

High‑availability tests using Sysbench (1 000 000 transactions) on a Xeon E5‑2699‑V3 16‑core server show that WRITESET delivers the highest throughput when the number of client threads is low. With a single connection, WRITESET_SESSION and COMMIT_ORDER perform similarly.

Performance graphs (omitted here) illustrate the throughput differences among the three modes.

Conclusion

WriteSet‑based dependency tracking dramatically improves relay‑log replay speed on slaves, making replication lag independent of the master’s commit parallelism. In many scenarios the slave can achieve higher throughput than the master, though edge cases such as incremental backup may require preserving commit order via slave_preserve_commit_order.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performancemysqlDatabase ReplicationWriteSetparallel replication
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.