Inside MySQL 5.6 Parallel Replication: Code Walkthrough and Design
This article explains how MySQL 5.6 introduced parallel replication to overcome the bottleneck of a single SQL thread, detailing the underlying binlog events, configuration parameters, key data structures, worker coordination, checkpoint mechanisms, and potential limitations, all from a source‑code perspective.
MySQL replication provides higher performance, scalability, and availability; many large sites rely on it to scale beyond the limits of a single instance.
Master‑slave synchronization uses binlog replay on the replica: the I/O thread fetches the master binlog into the relay log, and the SQL thread replays events. A single SQL thread becomes a bottleneck under heavy master load, causing inevitable replica lag.
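On the master, many sessions write concurrently, but on the replica all of their changes funnel through one applier thread. To address this n‑to‑1 mismatch, MySQL 5.6 introduced parallel replication, which allows multiple SQL threads to apply events concurrently.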
Design details can be found in worklogs WL#4648, WL#5563, WL#5569, WL#5754, WL#5599 and earlier monthly reports.
Prerequisite Knowledge
binlog
Binlog records database changes as a sequence of events, e.g.:
Query_log
Table_map
Write/Delete/Update_row_event
Xid
Refer to the official documentation for the meaning of each event.
Configuration
Parallel replication can be tuned via several parameters:
slave_parallel_workers – number of worker threads (0 disables parallel replication)
slave_checkpoint_group – maximum number of transactions processed between checkpoints
slave_checkpoint_period – maximum time between checkpoints, in milliseconds
slave_pending_jobs_size_max – maximum memory, in bytes, for events queued to workers but not yet applied
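As a rough illustration, the four settings could be written into a my.cnf fragment as below. The helper name render_mts_config is hypothetical and the values are only examples, not tuning advice.

```python
# Hypothetical helper: renders the four MTS settings as a my.cnf fragment.
# The values are illustrative only, not recommendations.
MTS_SETTINGS = {
    "slave_parallel_workers": 4,              # number of worker threads (0 disables MTS)
    "slave_checkpoint_group": 512,            # transactions between checkpoints
    "slave_checkpoint_period": 300,           # milliseconds between checkpoints
    "slave_pending_jobs_size_max": 16777216,  # bytes of queued, not-yet-applied events
}


def render_mts_config(settings):
    """Return the settings as text for the [mysqld] section of my.cnf."""
    lines = ["[mysqld]"]
    for name, value in settings.items():
        lines.append(f"{name} = {value}")
    return "\n".join(lines)


print(render_mts_config(MTS_SETTINGS))
```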
Concept Terminology
MTS – Multi‑Threaded Slave (parallel replication)
group – a set of events belonging to one transaction in the binlog
worker – execution thread introduced by MTS
Coordinator – the former SQL thread that now distributes work
checkpoint – point where the Coordinator collects completed work and advances the execution position
B‑event – transaction start (BEGIN or GTID)
G‑event – events containing database information (Table_map, Query)
T‑event – transaction end (COMMIT/ROLLBACK or XID)
Related Source Files
sql/rpl_rli_pdb.h
sql/rpl_rli_pdb.cc
sql/rpl_slave.cc
sql/log_event.cc
sql/rpl_rli.h
Parallel Execution Principles
The model follows a producer‑consumer pattern: the Coordinator (C) inserts events into each worker’s (W) task queue, and workers pull events for execution.
All events of the same group are sent to the same worker to preserve transaction consistency.
Dispatching is based on the database information in G‑events; other events follow the last assigned worker.
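The producer‑consumer model can be sketched in a few lines. This is a toy model under stated assumptions, not the server's code: the names run_coordinator and worker_loop are made up, "applying" an event just records it, and new databases are assigned round‑robin rather than by load.

```python
import queue
import threading

NUM_WORKERS = 2  # illustrative; plays the role of slave_parallel_workers


def worker_loop(jobs, applied):
    # Consumer: pull events from this worker's own queue until told to stop.
    while True:
        event = jobs.get()
        if event is None:  # sentinel: no more work
            break
        applied.append(event)  # stand-in for actually applying the event


def run_coordinator(groups):
    """groups: list of (db_name, [event, ...]) in binlog order."""
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    applied = [[] for _ in range(NUM_WORKERS)]
    threads = [
        threading.Thread(target=worker_loop, args=(queues[i], applied[i]))
        for i in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    db_to_worker = {}
    for db, events in groups:
        # The first time a database is seen it gets the next worker round-robin
        # (a simplification); afterwards it sticks to that worker.
        wid = db_to_worker.setdefault(db, len(db_to_worker) % NUM_WORKERS)
        for event in events:  # the whole group goes to one worker
            queues[wid].put(event)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return applied


groups = [
    ("db1", ["BEGIN", "INSERT a", "COMMIT"]),
    ("db2", ["BEGIN", "UPDATE b", "COMMIT"]),
    ("db1", ["BEGIN", "DELETE a", "COMMIT"]),
]
result = run_coordinator(groups)
```

Both db1 transactions end up on the same worker in their original order, while the db2 transaction can run concurrently on the other worker.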
Important Data Structures
db_worker_hash_entry – maps a database name to a worker; stored in the Coordinator’s hash table, the APH (Assigned Partition Hash).
slave_job_item – an item in a worker’s job queue, containing a binlog event.
circular_buffer_queue – a ring buffer built on a dynamic array; the base class for several queues.
Slave_job_group – tracks a transaction’s metadata (log positions, worker ID, checkpoint info, completion flag, etc.).
Slave_committed_queue – a subclass of circular_buffer_queue that holds Slave_job_group objects; the Coordinator’s instance is the GAQ (Group Assigned Queue).
Slave_jobs_queue – each worker’s task queue, also a subclass of circular_buffer_queue.
Slave_worker – represents a worker thread; contains its job queue, coordinator pointer, and execution state.
Relay_log_info – the Coordinator’s extended structure (formerly the SQL thread’s) that holds mapping tables, the worker array, pending‑job counters, the GAQ, and checkpoint configuration.
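Since several of these structures are built on a ring buffer, a minimal sketch of one may help. The class below is a simplified stand‑in written for this article: the method names echo the source, but the fixed capacity, return values, and field names are assumptions.

```python
class CircularBufferQueue:
    """Fixed-capacity ring buffer; a simplified stand-in for circular_buffer_queue."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0    # index of the oldest item
        self.length = 0  # number of items currently stored

    def en_queue(self, item):
        # Returns the slot index used, or -1 when the queue is full.
        if self.length == self.capacity:
            return -1
        index = (self.head + self.length) % self.capacity
        self.slots[index] = item
        self.length += 1
        return index

    def de_queue(self):
        # Returns the oldest item, or None when the queue is empty.
        if self.length == 0:
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.capacity
        self.length -= 1
        return item


q = CircularBufferQueue(3)
for name in ("g1", "g2", "g3"):
    q.en_queue(name)
overflow = q.en_queue("g4")  # queue is full, so this is rejected
first = q.de_queue()         # frees the slot at the front
wrapped = q.en_queue("g4")   # wraps around and reuses the freed slot
```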
Other Methods
map_db_to_worker() – maps a database to a worker.
get_least_occupied_worker() – selects the least loaded worker.
wait_for_workers_to_finish() – synchronizes workers before switching to serial execution.
append_item_to_jobs() – enqueues an event into a worker’s job queue.
mts_move_temp_table_to_entry() and mts_move_temp_tables_to_thd() – handle temporary table transfer.
Initialization
Compared with single‑threaded replication, MTS initializes additional variables and starts worker threads via slave_start_workers(), which sets up the Coordinator’s hash tables, GAQ, and worker structures, then calls slave_start_single_worker() for each worker. Workers run handle_slave_worker(), repeatedly invoking slave_worker_exec_job() to process assigned events.
Coordinator Dispatch
The Coordinator repeatedly calls exec_relay_log_event(), which reads the next event with next_event() and applies it with apply_event_and_update_pos(). If MTS is enabled, get_slave_worker() determines the target worker.
Event classification:
B‑event – BEGIN/GTID (transaction start)
G‑event – contains database info (Table_map, Query)
P‑event – pre‑G events (int_var, rand, user_var, etc.)
R‑event – row events following G‑event
T‑event – COMMIT/ROLLBACK or XID (transaction end)
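The classification can be expressed as a small lookup. This is a sketch only: the event names are informal labels rather than the exact class names in the source, and in a real binlog BEGIN and COMMIT are themselves Query events, so treating them as separate labels is a simplification.

```python
# Informal event labels, grouped by the categories listed above.
B_EVENTS = {"BEGIN", "GTID"}
G_EVENTS = {"Table_map", "Query"}
P_EVENTS = {"Intvar", "Rand", "User_var"}
R_EVENTS = {"Write_rows", "Update_rows", "Delete_rows"}
T_EVENTS = {"COMMIT", "ROLLBACK", "Xid"}

CATEGORIES = (
    ("B", B_EVENTS),
    ("G", G_EVENTS),
    ("P", P_EVENTS),
    ("R", R_EVENTS),
    ("T", T_EVENTS),
)


def classify(event_type):
    """Return the one-letter category for an event label."""
    for label, members in CATEGORIES:
        if event_type in members:
            return label
    raise ValueError(f"unclassified event type: {event_type}")


group = ["BEGIN", "Table_map", "Write_rows", "Xid"]
labels = [classify(e) for e in group]
```

A typical row‑based transaction therefore looks like B, G, R, T from the Coordinator's point of view.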
Dispatch logic:
B‑event: increment mts_groups_assigned, enqueue a new group in GAQ, store the event in curr_group_da (no DB info yet).
G‑event: use map_db_to_worker() to find or create a mapping; if the mapping already exists and is free, reuse it; otherwise resolve conflicts or create a new entry, possibly evicting unused mappings when the hash exceeds its soft limit.
Other events: use the last assigned worker.
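The dispatch rules above can be sketched as follows. The Dispatcher class and its fields are hypothetical; the sketch leaves out conflict resolution, eviction of unused mappings, and the decrement of load when a group completes.

```python
class Dispatcher:
    """Simplified Coordinator-side mapping of database name to worker id."""

    def __init__(self, num_workers):
        self.db_to_worker = {}         # plays the role of the APH
        self.load = [0] * num_workers  # groups assigned to each worker so far
        self.last_worker = None        # worker chosen for the current group

    def get_least_occupied_worker(self):
        return min(range(len(self.load)), key=lambda w: self.load[w])

    def dispatch(self, kind, db=None):
        """kind is "B", "G", "P", "R" or "T"; returns a worker id or None.

        None means the event is held back because no worker is known yet.
        """
        if kind == "B":
            self.last_worker = None  # a new group starts; no database info yet
            return None
        if kind == "G":
            if db not in self.db_to_worker:
                self.db_to_worker[db] = self.get_least_occupied_worker()
            wid = self.db_to_worker[db]
            if self.last_worker is None:
                self.load[wid] += 1  # count the group once, at its first G-event
            self.last_worker = wid
            return wid
        return self.last_worker      # other events follow the last assigned worker


d = Dispatcher(2)
stream = [
    ("B", None), ("G", "db1"), ("R", None), ("T", None),
    ("B", None), ("G", "db2"), ("R", None), ("T", None),
    ("B", None), ("G", "db1"), ("T", None),
]
trace = [d.dispatch(kind, db) for kind, db in stream]
```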
When to Switch to Serial Execution
If a G‑event references more than MAX_DBS_IN_EVENT_MTS (16) databases or involves tables with foreign‑key dependencies, the group is executed serially on worker 0 after all other workers finish.
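A minimal sketch of that decision, assuming the foreign‑key condition is already known as a boolean. The helper names must_run_serially and choose_worker, and the wait_for_workers callback, are hypothetical.

```python
MAX_DBS_IN_EVENT_MTS = 16  # limit quoted in the text above


def must_run_serially(databases, has_foreign_key_dependency=False):
    """True when the group cannot safely be applied in parallel."""
    too_many_dbs = len(set(databases)) > MAX_DBS_IN_EVENT_MTS
    return too_many_dbs or has_foreign_key_dependency


def choose_worker(databases, mapped_worker, wait_for_workers,
                  has_foreign_key_dependency=False):
    """Return the worker that should apply the group."""
    if must_run_serially(databases, has_foreign_key_dependency):
        wait_for_workers()  # drain all in-flight work before going serial
        return 0            # the serial group runs on worker 0
    return mapped_worker


waits = []
few = choose_worker(["db1", "db2"], 3, lambda: waits.append("few"))
many = choose_worker([f"db{i}" for i in range(17)], 3, lambda: waits.append("many"))
fk = choose_worker(["db1"], 3, lambda: waits.append("fk"),
                   has_foreign_key_dependency=True)
```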
Worker Execution
Workers process jobs via slave_worker_exec_job():
Dequeue an event.
Update worker‑local state (group parts, relay log positions, GAQ index).
Execute the event with do_apply_event_worker(), which ultimately calls each event’s do_apply_event().
If the event is a T‑event, call slave_worker_ends_group() to commit positions, update the corresponding Slave_job_group, and clear the worker’s group parts.
Update Coordinator statistics (pending jobs, memory usage).
Adjust overrun/underrun status.
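The first five steps can be sketched as one function that performs a single loop iteration. All names here are illustrative, "applying" an event only records it, and the overrun/underrun adjustment is left out.

```python
from collections import deque


class Worker:
    def __init__(self):
        self.jobs = deque()   # this worker's job queue
        self.curr_group = []  # events seen so far in the open group
        self.applied = []     # record of applied events, for illustration


def worker_exec_job(worker, stats, groups_done):
    """One iteration of the worker loop; returns False when the queue is empty."""
    if not worker.jobs:
        return False
    group_id, kind, payload = worker.jobs.popleft()  # 1. dequeue an event
    worker.curr_group.append(payload)                # 2. update worker-local state
    worker.applied.append(payload)                   # 3. stand-in for applying it
    if kind == "T":                                  # 4. the group ends here
        groups_done[group_id] = True
        worker.curr_group.clear()
    stats["pending_jobs"] -= 1                       # 5. update Coordinator statistics
    return True


w = Worker()
stats = {"pending_jobs": 0}
groups_done = {7: False}
for job in [(7, "B", "BEGIN"), (7, "G", "Table_map"),
            (7, "R", "Write_rows"), (7, "T", "Xid")]:
    w.jobs.append(job)
    stats["pending_jobs"] += 1

while worker_exec_job(w, stats, groups_done):
    pass
```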
Checkpoint Process
The Coordinator periodically runs mts_checkpoint_routine(), triggered either by elapsed time (slave_checkpoint_period) or by the number of dispatched groups (slave_checkpoint_group). It advances the low‑water mark (lwm) by scanning the GAQ from its head and removing completed groups via Slave_committed_queue::move_queue_head().
[Figure: checkpoint flow]
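The low‑water‑mark scan can be sketched as below. This is a simplified model: the GAQ is a plain deque of dictionaries, and the field names id, pos, and done are assumptions made for the example.

```python
from collections import deque


def move_queue_head(gaq):
    """Pop completed groups from the front of the GAQ.

    Stops at the first group that is not yet done. Returns the last group
    removed (the new low-water mark), or None if nothing could be removed.
    """
    lwm = None
    while gaq and gaq[0]["done"]:
        lwm = gaq.popleft()
    return lwm


gaq = deque([
    {"id": 1, "pos": 120, "done": True},
    {"id": 2, "pos": 260, "done": True},
    {"id": 3, "pos": 410, "done": False},  # still running on some worker
    {"id": 4, "pos": 530, "done": True},   # finished out of order
])
lwm = move_queue_head(gaq)
remaining = [g["id"] for g in gaq]
```

Group 4 has finished but stays in the queue, because group 3 ahead of it is still running. The low‑water mark only covers the contiguous prefix of completed groups.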
Stopping the Slave
Executing STOP SLAVE terminates both Coordinator and workers. The Coordinator first calls slave_stop_workers(), which signals each worker to stop, waits for them to finish, performs a final checkpoint, and releases resources (hash tables, GAQ, etc.). Workers stop after completing their current group.
Abnormal Termination
If a worker encounters an error, it signals the Coordinator, clears its job queue, and sets its status to NOT_RUNNING. The Coordinator then stops remaining workers without performing a final checkpoint. If the Coordinator itself is killed, it follows a similar procedure.
Recovery
After a normal or abnormal shutdown, the slave restarts by using the Coordinator and each worker’s recorded state to restore a consistent position before resuming parallel execution.
Open Issues
MySQL 5.6 MTS dispatches at the database level, which can limit concurrency when only one database is used. A simple improvement is to dispatch by dbname + tablename. For hot‑spot tables where most events target a single table, a transaction‑level dispatch strategy could further increase parallelism, though it would require more extensive code changes.
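The difference between database‑level and table‑level dispatch keys can be illustrated with a toy assignment. The functions partition_key and assign_workers are hypothetical, and the round‑robin assignment is a simplification. A real change would also need to handle transactions that touch several tables.

```python
def partition_key(db, table, by_table):
    # Hypothetical key function: database-level vs. database+table-level dispatch.
    return (db, table) if by_table else (db,)


def assign_workers(events, num_workers, by_table):
    """Map each distinct partition key to a worker id, round-robin."""
    mapping = {}
    for db, table in events:
        key = partition_key(db, table, by_table)
        if key not in mapping:
            mapping[key] = len(mapping) % num_workers
    return mapping


events = [("shop", "orders"), ("shop", "users"), ("shop", "orders")]
by_db = assign_workers(events, 2, by_table=False)
by_tbl = assign_workers(events, 2, by_table=True)
```

With a single database, database‑level keys put everything on one worker, while table‑level keys spread the two tables across both workers.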
Source: Database Kernel Monthly Report
Original: http://mysql.taobao.org/monthly/2015/08/09/
