Databases 17 min read

Analysis of Inconsistencies in MySQL Replication after Slave Crash and Configuration Recommendations

This article examines how slave crashes in MySQL replication can cause inconsistencies between master info and relay log positions, leading to duplicate events and errors such as 1062 and 1032, analyzes underlying mechanisms, presents case studies, and proposes configuration settings to achieve server‑crash‑safe replication.

Tencent Database Technology

Apr 7, 2021

Analysis of Inconsistencies in MySQL Replication after Slave Crash and Configuration Recommendations

Part 1 – Background

When a MySQL slave crashes due to configuration issues, the subsequent re‑establishment of replication may raise error 1062 or 1032. Local reproduction shows that some binlog events are duplicated before and after the crash, suggesting that the IO thread may have copied certain events twice.

Two pieces of state are relevant to replication: master info , which stores the persisted relay‑log position of the slave IO thread, and relay‑log info , which stores the position that the slave SQL thread has already applied. Their storage format is controlled by master_info_repository and relay_log_info_repository, while the sync frequency depends on sync_master_info and sync_relay_log_info.

Stored Information

master info

relay log info

Stored position information

Relay‑log position persisted by the slave IO thread

Relay‑log position already applied by the slave SQL thread

Storage‑format control variable

master_info_repository

relay_log_info_repository

Sync‑frequency variable

sync_master_info

sync_relay_log_info

The article focuses on two inconsistency scenarios that may appear after a slave crash and the problems they cause for master‑slave synchronization:

During sync_master_info, the persisted relay‑log position in master info may diverge from the actual relay‑log position.

During sync_relay_log_info, the position recorded in relay‑log info may diverge from the position actually applied in the relay log.

Part 2 – Analysis

After a slave crash, the mismatch of state information can be traced to MySQL’s non‑transactional operations: sync_master_info and sync_relay_log_info. The following sections analyze each operation.

2.1 sync_master_info inconsistency

Default configuration:

# default configuration
relay_log_recovery = OFF
sync_master_info=10000
sync_relay_log_info=10000
slave_parallel_type=DATABASE

Call stack for sync_master_info:

# stack‑sync_master_info
/-- handle_slave_io   # executed for each received event
    /-- flush_master_info
        /-- Relay_log_info::flush_current_log   # writes event to relay log file
        /-- Master_info::flush_info
            /-- Rpl_info_handler::flush_info   # persists the current position to master info

If a crash occurs between writing the relay‑log file ( my_b_flush_io_cache) and the subsequent my_sync, the master info may miss the last position while the relay log already contains the event, causing the slave to re‑pull the same event after restart.

2.1.1 master_info_repository = FILE

When master_info_repository=FILE, the slave stores the position in master.info under the data directory. The flush sequence is:

/-- Rpl_info_handler::flush_info
    /-- Rpl_info_file::do_flush_info
        /-- flush_io_cache
            /-- my_b_flush_io_cache   # writes master.info but does not sync
        /-- my_sync   # executed every <code>sync_master_info</code> calls

A crash between my_b_flush_io_cache and my_sync leaves the file unsynced; a crash between Relay_log_info::flush_current_log and my_b_flush_io_cache can cause the relay log to contain an event that master info does not record, leading to duplicate events after restart.

Two concrete cases observed:

Case ID

Events before crash

master.info position at crash

Relay log persisted at crash

Binlog start point after restart

Replication success

Case 1

130

master-bin.0000001:5479

master-bin.0000001:0~5524

master-bin.0000001:5479

Case 2

131

master-bin.0000001:5524

master-bin.0000001:0~5564

master-bin.0000001:5524

Both cases cause the slave to rotate a new relay‑log file after re‑establishment, and because the start point is earlier than the last persisted position, an event range (e.g., master-bin.0000001:5479~5524) is duplicated. In Case 1 the duplicated event is a Table_map_event without any row‑changing event, so it does not affect data. In Case 2 the duplicated event is a Write_rows_event, but the corresponding Table_map_event is missing in the new relay log, so the row‑change is also ignored.

#210220 14:18:31 server id 1  end_log_pos 5609 CRC32 0x808ed9e0 Table_map: `test`.`t1` mapped to number 221

#210220 14:18:31 server id 1  end_log_pos 5564 CRC32 0x44309595 Write_rows: table id 221 flags: STMT_END_F

If such duplicated events belong to an UPDATE or DELETE transaction, the missing Table_map_event can cause the transaction to be partially applied, leading to error 1032 (record not found) or 1062 (duplicate entry).

2.1.2 master_info_repository = TABLE

When master_info_repository=TABLE, the slave persists the position in the mysql.slave_master_info table. The flush flow is:

/-- Rpl_info_handler::flush_info
    /-- Rpl_info_table::do_flush_info
        /-- handler::ha_update_row   # updates the table; executed every <code>sync_master_info</code> (default 10000) events

Only after sync_master_info events are processed does the table get updated, so events between two sync points are not recorded. After a restart, the slave resumes from the last recorded position, potentially re‑processing a large block of events and causing duplicate execution.

In GTID mode with MASTER_AUTO_POSITION enabled, the slave uses GTID sets (stored in InnoDB tables) instead of master info, which are updated atomically and avoid the above inconsistency.

2.2 sync_relay_log_info inconsistency

When relay_log_info_repository=FILE, the flush sequence during a transaction commit is:

/-- Xid_apply_log_event::do_apply_event   # called for each XID (COMMIT)
    /-- Xid_log_event::do_commit
        /-- Relay_log_info::flush_info
            /-- Rpl_info_handler::flush_info
                /-- Rpl_info_file::do_flush_info
                    /-- flush_io_cache
                        /-- my_b_flush_io_cache   # writes file but does not sync
                    /-- my_sync   # executed every <code>sync_relay_log_info</code> (default 10000) calls

If a crash occurs between Xid_log_event::do_commit and my_b_flush_io_cache, the relay‑log info may not reflect the committed transaction, causing the same transaction to be applied again after restart, which can also trigger 1062/1032 errors.

master-bin.0000001:120   Query (begin)  # trx1
... 
master-bin.0000001:150   Xid (commit)
master-bin.0000001:160   Query (begin)  # trx2
... 
master-bin.0000001:190   Xid (commit)

If the crash happens after the commit of trx2 but before the flush, the slave will re‑apply trx2 on restart, resulting in duplicate execution.

Part 3 – Conclusion

Under certain configurations, the mismatches between master‑info positions, relay‑log persistence, and relay‑log‑info positions can cause events or whole transactions to be executed twice, leading to duplicate‑key errors (1062) or missing‑record errors (1032).

To achieve a server‑crash‑safe replication setup, the following configuration recommendations are provided:

Non‑GTID mode: enable relay_log_info_repository=TABLE and relay_log_recovery=ON.

GTID mode: enable MASTER_AUTO_POSITION=ON and relay_log_recovery=ON (the COM_BINLOG_DUMP_GTID protocol ensures safety).

Tencent Database Technology Team supports internal services such as QQ Space, WeChat Red Packets, Tencent Ads, Tencent Music, Tencent News, and external products on Tencent Cloud like TDSQL‑C (formerly CynosDB), TencentDB for MySQL (CDB), CTSDB, MongoDB, CES, etc. This public account aims to share professional database knowledge with enthusiasts.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

MySQL Relay Log Database Configuration Master Info Server Crash Safe Slave Crash

Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.