Understanding MySQL Multi‑Threaded Slave (MTS) Checkpoint Mechanism and Event Execution
This article explains how MySQL's Multi-Threaded Slave (MTS) executes events, advances checkpoints, and persists its state using the GAQ, per-worker bitmaps, and system tables. It points to the relevant source functions and to the configuration parameters that govern checkpointing in parallel replication.
This note describes the internal workflow of MySQL's Multi‑Threaded Slave (MTS), focusing on how worker threads execute events and how checkpoints are created and persisted.
1. Worker thread execution of events – After the coordinator thread distributes events to a worker's execution queue, the worker reads events from that queue and waits when the queue is empty. Special events such as XID_EVENT mark the end of a transaction and trigger the worker's in-memory updates for that transaction. Key functions include slave_worker_exec_job_group, Slave_worker::slave_worker_exec_event, and Xid_apply_log_event::do_apply_event_worker. The code at this point updates the worker's current position information, writes checkpoint data, sets the GAQ sequence number, and sets the corresponding bit in the bitmap.
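The worker loop described above can be sketched roughly as follows. This is a single-threaded illustration, not the MySQL implementation: Event, WorkerState, and exec_job_group are hypothetical names, and where the real worker would block on an empty queue this sketch simply returns false.

```cpp
#include <deque>

// Hypothetical event kinds; only Xid matters for this sketch.
enum class EventType { Gtid, Query, Rows, Xid };

struct Event {
  EventType type;
  unsigned long long end_log_pos;  // position just past this event
};

// Hypothetical in-memory state a worker updates as it applies events.
struct WorkerState {
  unsigned long events_applied = 0;
  unsigned long groups_committed = 0;
  unsigned long long last_committed_pos = 0;
};

// Applies events from the worker's queue until one transaction (group)
// ends with an Xid event. Returns true if a group was committed, false if
// the queue ran dry first (the point where a real worker would wait).
bool exec_job_group(std::deque<Event>& jobs, WorkerState& w) {
  while (!jobs.empty()) {
    Event ev = jobs.front();
    jobs.pop_front();
    ++w.events_applied;
    if (ev.type == EventType::Xid) {
      // End of transaction: publish the committed position.
      w.last_committed_pos = ev.end_log_pos;
      ++w.groups_committed;
      return true;
    }
  }
  return false;
}
```

Treating the Xid event as the only commit point keeps the sketch short; it ignores transactions that end in other ways.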
2. Checkpoint concepts in MTS – A checkpoint marks the low-water mark (LWM) in the GAQ: every transaction up to that point has been applied, with no gaps. The coordinator thread maintains the GAQ (a circular queue of Slave_job_group descriptors) and a checkpoint_seqno counter. Checkpoints are persisted to the slave_relay_log_info table, while each worker's progress is stored in slave_worker_info.
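As a minimal sketch of the low-water-mark idea, the following models the GAQ as a fixed-capacity circular queue. JobGroup, Gaq, and advance_lwm are hypothetical names chosen for illustration; the real Slave_job_group descriptor carries many more fields, and nothing here is taken from the MySQL source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a Slave_job_group descriptor.
struct JobGroup {
  unsigned long long master_log_pos = 0;  // end position of the transaction
  bool done = false;                      // set once a worker committed it
};

// Hypothetical fixed-capacity circular queue modelling the GAQ.
class Gaq {
 public:
  explicit Gaq(std::size_t capacity) : slots_(capacity) {}

  bool full() const { return len_ == slots_.size(); }
  std::size_t size() const { return len_; }
  unsigned long long lwm_pos() const { return lwm_pos_; }

  // Coordinator side: append a group; returns its slot index, or -1 if full.
  int enqueue(unsigned long long master_log_pos) {
    if (full()) return -1;
    std::size_t idx = (head_ + len_) % slots_.size();
    slots_[idx].master_log_pos = master_log_pos;
    slots_[idx].done = false;
    ++len_;
    return static_cast<int>(idx);
  }

  // Worker side: mark the group in a slot as committed.
  void mark_done(int idx) { slots_[static_cast<std::size_t>(idx)].done = true; }

  // Checkpoint: pop committed groups from the head and stop at the first
  // uncommitted one. The last popped group's position becomes the LWM.
  // Returns the number of groups removed.
  std::size_t advance_lwm() {
    std::size_t removed = 0;
    while (len_ > 0 && slots_[head_].done) {
      lwm_pos_ = slots_[head_].master_log_pos;
      head_ = (head_ + 1) % slots_.size();
      --len_;
      ++removed;
    }
    return removed;
  }

 private:
  std::vector<JobGroup> slots_;
  std::size_t head_ = 0;
  std::size_t len_ = 0;
  unsigned long long lwm_pos_ = 0;
};
```

With three groups queued and only the first and third committed, advance_lwm() removes one group and stops: the third stays in the queue until the second commits, which is why the LWM can lag behind the most recent commit.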
3. GAQ and bitmap handling – The size of the GAQ is set by slave_checkpoint_group. Each worker maintains a bitmap in which every bit corresponds to a transaction in the GAQ; a bit is set to 1 when that transaction commits. After a checkpoint removes groups from the head of the GAQ, the bitmap is shifted by the same number of positions so that the bits stay aligned with the remaining groups, using code such as bitmap_set_bit(&group_executed, pos - ptr_g->shifted) and bitmap_fast_test_and_set(groups, j).
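The set-then-shift behaviour can be illustrated with a small sketch. WorkerBitmap and its methods are hypothetical; the code uses std::vector<bool> rather than MySQL's MY_BITMAP, and only mirrors the idea of moving every set bit down by the number of groups a checkpoint removed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-worker bitmap: bit i says whether the i-th group after
// the last checkpoint has been committed by this worker.
class WorkerBitmap {
 public:
  explicit WorkerBitmap(std::size_t nbits) : bits_(nbits, false) {}

  // Called when the worker commits the group with this sequence number.
  void set_executed(std::size_t checkpoint_seqno) {
    if (checkpoint_seqno < bits_.size()) bits_[checkpoint_seqno] = true;
  }

  bool is_executed(std::size_t i) const {
    return i < bits_.size() && bits_[i];
  }

  // Called after a checkpoint removed `shifted` groups from the queue head:
  // bit j moves to j - shifted, and bits below `shifted` are dropped.
  void shift(std::size_t shifted) {
    std::vector<bool> next(bits_.size(), false);
    for (std::size_t j = shifted; j < bits_.size(); ++j) {
      if (bits_[j]) next[j - shifted] = true;
    }
    bits_.swap(next);
  }

 private:
  std::vector<bool> bits_;
};
```

For example, with bits 2 and 5 set, shift(2) leaves bits 0 and 3 set.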
4. Persistence of coordinator and worker information – During a checkpoint the coordinator first updates its in-memory positions (group_master_log_pos, group_relay_log_pos, etc.) and then forces a write to slave_relay_log_info via rli->flush_info(TRUE). Workers write their final state to slave_worker_info, including relay log names, positions, checkpoint identifiers, and the bitmap size.
5. Checkpoint triggering and routine – A checkpoint is taken when the time since the last checkpoint exceeds slave_checkpoint_period, when the GAQ is full, or when the slave is stopped. The routine mts_checkpoint_routine scans the GAQ from its head, stops at the first uncommitted transaction, updates the in-memory positions and the tables, records the timestamp used to compute Seconds_Behind_Master, and adjusts checkpoint_seqno and the bitmap offsets for all workers.
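The three trigger conditions can be written as a single predicate. This is a sketch under stated assumptions: checkpoint_due and its parameters are hypothetical names, and treating an elapsed time exactly equal to the period as due is a choice made here, not something the text above specifies.

```cpp
#include <chrono>

// Hypothetical predicate: should the coordinator run a checkpoint now?
//   elapsed  - time since the previous checkpoint
//   period   - configured interval (the role of slave_checkpoint_period)
//   gaq_full - no free slot is left in the GAQ
//   stopping - the slave is being stopped
bool checkpoint_due(std::chrono::milliseconds elapsed,
                    std::chrono::milliseconds period,
                    bool gaq_full,
                    bool stopping) {
  // A full queue or a stop request forces a checkpoint regardless of time.
  if (stopping || gaq_full) return true;
  // Assumption: elapsed == period counts as due.
  return elapsed >= period;
}
```

Keeping the forced cases separate from the timed case shows that a full GAQ cannot wait for the timer: the coordinator has no slot for the next transaction until a checkpoint frees one.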
Overall, the article provides a step‑by‑step explanation of MTS checkpoint creation, the data structures involved, and the configuration parameters that influence parallel replication reliability.
Aikesheng Open Source Community