Understanding MySQL 5.6 Parallel Replication (MTS) Architecture and Implementation
This article explains the design, configuration parameters, core data structures, initialization, coordinator distribution, worker execution, checkpointing, and shutdown procedures of MySQL 5.6's Multi‑Threaded Slave (MTS) parallel replication, providing a code‑level walkthrough for developers and DBAs.
Background – MySQL replication traditionally uses a single SQL thread on the slave, which can become a bottleneck under heavy write load. MySQL 5.6 introduced Multi‑Threaded Slave (MTS) to allow parallel execution of binlog events, reducing replication lag.
Prerequisite Knowledge
Binlog events for a transaction are recorded as a sequence such as `Query_log_event` (BEGIN) → `Table_map_log_event` → `Write/Update/Delete_rows_log_event` → `Xid_log_event` (COMMIT). Several configuration variables control MTS behavior, e.g., `slave_parallel_workers`, `slave_checkpoint_group`, `slave_checkpoint_period`, and `slave_pending_jobs_size_max`.
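A minimal my.cnf sketch enabling MTS on a slave; the values shown are illustrative defaults-range numbers, not tuning recommendations:

```ini
[mysqld]
# Number of worker threads; 0 keeps the traditional single SQL thread
slave_parallel_workers      = 8
# Force a checkpoint after this many dispatched groups
slave_checkpoint_group      = 512
# ...or after this many milliseconds have elapsed
slave_checkpoint_period     = 300
# Cap (in bytes) on the total size of events queued to all workers
slave_pending_jobs_size_max = 16777216
```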
Key terminology includes:
MTS – Multi‑Threaded Slave.
Group – a transaction’s series of events in the binlog.
Worker – a thread that executes events.
Coordinator – the original SQL thread that now distributes events.
Checkpoint – a synchronization point that advances the replication position.
Important Data Structures
`db_worker_hash_entry` – maps a database name to a worker.
`slave_job_item` – represents a single binlog event in a worker's queue.
`circular_buffer_queue` – generic ring buffer used for several queues.
`Slave_job_group` – tracks the state of a transaction (group) across workers.
`Slave_committed_queue` – holds committed groups for checkpoint processing.
`Slave_jobs_queue` – per-worker job queue.
`Slave_worker` – the worker thread object.
`Relay_log_info` – the coordinator's (C) runtime state, extended with MTS-specific fields such as `mapping_db_to_worker`, `workers`, `gaq` (the group assignment queue), and checkpoint counters.
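Several of the structures above (`Slave_jobs_queue`, `Slave_committed_queue`) are built on `circular_buffer_queue`. A minimal sketch of that ring-buffer idea, with our own names rather than MySQL's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer, analogous in spirit to circular_buffer_queue.
// en_queue/de_queue in the server map onto push/pop here.
template <typename T>
class RingQueue {
 public:
  explicit RingQueue(std::size_t capacity)
      : buf_(capacity), head_(0), len_(0) {}

  bool push(const T &item) {               // fails when the queue is full
    if (len_ == buf_.size()) return false;
    buf_[(head_ + len_) % buf_.size()] = item;
    ++len_;
    return true;
  }

  bool pop(T *out) {                       // fails when the queue is empty
    if (len_ == 0) return false;
    *out = buf_[head_];
    head_ = (head_ + 1) % buf_.size();
    --len_;
    return true;
  }

  std::size_t size() const { return len_; }

 private:
  std::vector<T> buf_;
  std::size_t head_;  // index of the oldest element
  std::size_t len_;   // number of queued elements
};
```

A bounded queue like this is what lets `slave_pending_jobs_size_max`-style backpressure work: when a worker's queue is full, the coordinator must wait rather than grow memory without limit.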
Initialization – The function `slave_start_workers()` creates the coordinator's MTS structures, builds the `mapping_db_to_worker` hash via `init_hash_workers()`, and launches each worker with `slave_start_single_worker()`. Each worker runs `handle_slave_worker()`, which repeatedly calls `slave_worker_exec_job()`.
Coordinator Distribution Logic – The coordinator repeatedly invokes `exec_relay_log_event()`, which calls `next_event()` to read from the relay log and `apply_event_and_update_pos()` to dispatch. Events are classified as B-event (BEGIN), G-event (contains DB info), P-event, R-event, and T-event (COMMIT/ROLLBACK). A B-event starts a new group; a G-event triggers `map_db_to_worker()` to select a worker based on the database name, using the APH hash and the current group's assigned parts. If the DB-to-worker mapping conflicts with a group still in flight, the coordinator waits for the previous group to finish before reassigning.
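The core of this dispatch rule is a sticky database-to-worker assignment: the worker that first touches a database keeps owning it, so successive groups on the same database stay ordered. A simplified single-threaded sketch of that idea (names are ours, not the server's; the real `map_db_to_worker()` also handles usage counters and conflict waits):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Sticky DB-to-worker assignment, analogous to the APH lookup in
// map_db_to_worker(): known databases reuse their owner, unseen ones
// are placed round-robin.
struct Dispatcher {
  std::size_t n_workers;
  std::map<std::string, std::size_t> db_to_worker;  // APH analogue
  std::size_t next;  // round-robin cursor for unseen databases

  std::size_t assign(const std::string &db) {
    auto it = db_to_worker.find(db);
    if (it != db_to_worker.end()) return it->second;  // existing owner
    std::size_t w = next++ % n_workers;               // new database
    db_to_worker[db] = w;
    return w;
  }
};
```

This also makes the scheme's limitation visible: every group for one database lands on one worker, which is exactly why a single-database workload gains nothing from MTS in 5.6.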
When Serial Execution Is Required – If a G-event references more than `MAX_DBS_IN_EVENT_MTS` (16) databases or involves foreign-key-dependent tables, the group is forced to run serially on worker 0.
Worker Execution – Each worker extracts a `slave_job_item` from its queue, updates its local state (e.g., `curr_group_exec_parts`), and executes the event via `do_apply_event_worker()` → `do_apply_event()`. Upon encountering a T-event, the worker calls `slave_worker_ends_group()`, which commits positions, updates the corresponding `Slave_job_group`, and clears the group's mapping references.
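A stripped-down sketch of that worker loop, under the assumption that each job is one event and the terminal event closes the group (all names here are illustrative, not MySQL's):

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <string>

// One binlog event as a worker's job item; db is empty for events that
// carry no database info (e.g., a COMMIT).
struct Event {
  std::string db;
  bool is_terminal;  // T-event: COMMIT/ROLLBACK
};

struct Worker {
  std::queue<Event> jobs;                  // Slave_jobs_queue analogue
  std::set<std::string> curr_group_parts;  // curr_group_exec_parts analogue
  int groups_done = 0;

  // slave_worker_exec_job() analogue; caller must ensure jobs is non-empty.
  void run_one() {
    Event ev = jobs.front();
    jobs.pop();
    if (!ev.db.empty()) curr_group_parts.insert(ev.db);
    // ... apply the event to the database here ...
    if (ev.is_terminal) {        // slave_worker_ends_group() analogue:
      curr_group_parts.clear();  // release the group's db->worker refs
      ++groups_done;             // group is now visible to checkpointing
    }
  }
};
```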
Checkpoint Process – The coordinator periodically runs `mts_checkpoint_routine()`, triggered either by elapsed time (`slave_checkpoint_period`) or by the number of dispatched groups (`slave_checkpoint_group`). It advances the low-water mark by scanning the `Slave_committed_queue` and removing completed groups via `move_queue_head()`.
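The essential property of that scan is that the low-water mark advances only over a contiguous prefix of finished groups, so it can never skip past a group that is still executing. A `move_queue_head()`-style helper, sketched with our own types:

```cpp
#include <cassert>
#include <deque>

struct Group {
  long seqno;  // position of the group in the binlog order
  bool done;   // set by the worker when the group committed
};

// Pop finished groups from the head of the committed-group queue and
// return the new low-water mark: the last position known safe to
// restart from. A gap (an unfinished group) stops the advance.
long advance_lwm(std::deque<Group> &gaq, long lwm) {
  while (!gaq.empty() && gaq.front().done) {
    lwm = gaq.front().seqno;
    gaq.pop_front();
  }
  return lwm;  // groups behind an unfinished one remain queued
}
```

This is why out-of-order commits across workers are still crash-safe: recovery restarts from the low-water mark, and only the (bounded) window after it may need gap handling.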
Slave Stop and Exception Handling – `STOP SLAVE` triggers `slave_stop_workers()`, which signals each worker to stop, waits for them to finish, performs a final checkpoint, and releases resources. If a worker or the coordinator aborts on an error, similar cleanup occurs but without the final checkpoint.
Recovery – After a normal or abnormal restart, the coordinator and workers use their persisted state to resume replication at a consistent point before re‑entering parallel mode.
Open Issues – MTS in 5.6 distributes work only at the database level; a single‑DB workload collapses to a single thread. A possible improvement is to hash on dbname+tablename for finer granularity, or even on transaction level, though the latter would require more extensive code changes.
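The proposed finer split can be sketched as a one-line change of the hash key: partition on database plus table instead of database alone. This is purely illustrative of the idea; MySQL 5.6 does not implement it, and it would sacrifice ordering between tables of the same database:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical finer-grained dispatch key: db + table, so a
// single-database workload can still fan out across workers.
std::size_t pick_worker(const std::string &db, const std::string &table,
                        std::size_t n_workers) {
  return std::hash<std::string>{}(db + "." + table) % n_workers;
}
```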