How MySQL Parallel Replication Evolved: From 5.5 to 5.7 and Beyond
This article examines the evolution of MySQL parallel replication across versions 5.5, 5.6, MariaDB, and 5.7, comparing their threading strategies, binlog handling, performance trade‑offs, and practical deployment considerations for reducing replication lag.
Background
MySQL replication has traditionally applied binlog events on the replica using a single thread, which limits concurrency and can become a bottleneck for high‑throughput workloads.
Basic Solution
The core idea is to replace the single sql_thread with a dispatcher that hands work to a pool of worker_thread s. The replica always has two threads related to replication: io_thread fetches the binlog from the primary and writes it to the relay log, while sql_thread reads the relay log and executes statements.
All parallel replication implementations share this model; they differ mainly in how the dispatcher groups events for workers. Grouping strategies fall into two categories: using the traditional binlog format or modifying the binlog to add extra metadata.
MySQL 5.5 Analysis
Official MySQL 5.5 does not support parallel replication. Alibaba’s internal version added it without changing the binlog format, using two dispatch strategies: table‑level and row‑level.
Table‑level dispatch : Because row‑format binlogs contain Table_map events, the dispatcher hashes on schema+table. All updates to the same table go to the same worker, preserving order for that table while allowing different tables to execute concurrently. This works well for workloads updating many tables but suffers when a hot table dominates.
Row‑level dispatch : The dispatcher hashes on schema+table+primary_key, ensuring that updates to the same row stay ordered. It provides finer granularity but incurs higher CPU and memory overhead for hash calculation, especially with large transactions.
MySQL 5.6 Analysis
MySQL 5.6 introduced built‑in parallel replication based on database (schema) level grouping. This strategy is simple: the hash key is just the schema name, so moving tables to different schemas can enable parallelism. It offers lower overhead and works with any binlog format, but its concurrency is coarser than table‑ or row‑level approaches.
MariaDB Analysis
MariaDB’s default "CONSERVATIVE" strategy uses a domain_id to identify the source of a transaction in multi‑master setups. Transactions sharing the same commit_id are grouped and can be applied in parallel, assuming no row conflicts. While this can dramatically increase replica throughput, large transactions can become "drag‑behind" bottlenecks that stall subsequent groups.
MySQL 5.7 Analysis
MySQL 5.7 adopts MariaDB’s commit‑id grouping and adds refinements. Instead of treating a transaction as a single atomic unit, it recognizes three phases: prepare, committing, and committed. Transactions in the same "prepare" phase can run concurrently with those already committing, effectively increasing parallelism.
The article illustrates the difference with a sample transaction timeline. In MariaDB’s original strategy, groups execute strictly sequentially, so the total time equals the sum of the longest transaction in each group. MySQL 5.7 allows the next group to start as soon as the first transaction of the previous group finishes, overlapping work and matching the primary’s concurrency, except when a very large transaction still dominates the schedule.
Summary
Parallel replication offers five main strategies:
Database‑level (coarse), table‑level (medium), and row‑level (fine) grouping based on traditional binlog information.
Two commit‑id based strategies (MariaDB and MySQL 5.7) that work with any binlog format and have lower overhead.
Finer granularity yields higher concurrency but adds CPU/memory cost and may be affected by large transactions. MySQL 5.7’s improvements provide the best overall concurrency, yet very large transactions can still become bottlenecks. Choosing a strategy depends on workload characteristics and is a key architectural decision for DBAs.
Q&A
Q1: At what TPS does single‑threaded replication become a bottleneck?
A1: In our tests, around 2 k rows/second (insert/update/delete) saturates a single replication thread, especially when the replica also runs dual‑write workloads.
Q2: Besides the dual‑write parameters, what other methods can improve replica sync speed?
A2: Common techniques include parallel relay‑fetch and merging adjacent transactions into a single commit to reduce overhead.
Q3: How do companies balance the many parallel replication solutions?
A3: Many organizations adopt the version that matches their latency requirements; some prefer sharding or schema splitting instead of relying solely on parallel replication.
Q4: How does MySQL compare to PostgreSQL in typical scenarios?
A4: MySQL’s large user base in China means most problems have existing solutions online, reducing operational risk compared to newer alternatives.
Q5: Why does a large transaction block multi‑source replication in MariaDB but not single‑source?
A5: Large transactions can lock resources; diagnosing requires stack traces (pstack) to see which threads are waiting.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
