How to Diagnose and Fix MySQL Replication Errors in UDB
This guide explains how to collect replication status, analyze MySQL error logs and binary logs, compare master and slave configurations, and troubleshoot common replication failures such as GTID mismatches, missing binlogs, and conflicting statements in UDB environments.
1. Collect Replication Information
When replication anomalies occur, first gather all relevant replication data and error details. The primary sources are SHOW SLAVE STATUS, MySQL error logs, binary log files, and replication‑related system variables.
SHOW SLAVE STATUS : Execute SHOW SLAVE STATUS and note the following fields:
Master_Log_File / Read_Master_Log_Pos – position of the master binlog being read.
Relay_Log_File / Relay_Log_Pos – position of the relay log on the slave.
Relay_Master_Log_File / Exec_Master_Log_Pos – master binlog file and position that the SQL thread has executed.
Slave_IO_Running / Slave_SQL_Running – indicate which thread (IO or SQL) is currently failing.
Retrieved_Gtid_Set / Executed_Gtid_Set – GTID ranges received and executed (relevant only when GTID is enabled).
Last_Errno, Last_IO_Errno, Last_SQL_Errno and their corresponding error messages – describe the last error encountered by each thread.
Check Error Logs : Review the MySQL error log for entries that record the start/stop of replication and any detailed error messages that may be truncated in SHOW SLAVE STATUS. Sample log snippets are shown in the original article.
Inspect Binary Log Files : Identify the master binlog, the slave relay log, and the slave binlog. Use SHOW BINARY LOGS or the mysqlbinlog utility to view events. Important event types include GTID, QUERY, TABLE_MAP, WRITE_ROWS, and XID.
Other Variables : Verify replication‑related variables such as gtid_mode and semi‑synchronous status with commands like SHOW VARIABLES LIKE 'gtid_mode' and SHOW STATUS LIKE 'Rpl_semi_sync_master_status'.
2. Diagnose Errors
After collecting the data, follow these steps to pinpoint the cause:
Identify whether the IO thread or SQL thread is failing via Slave_IO_Running and Slave_SQL_Running. Record the binlog file and position (or GTID) at the point of failure.
Examine the error log for issues not shown in SHOW SLAVE STATUS, such as disk space exhaustion, network interruptions, or manual termination of replication.
Compare the master and slave binary logs to ensure they contain identical events for the same GTID or file/position.
If GTID is enabled, verify that the slave has not executed extra GTIDs that the master lacks.
3. Version and Configuration Differences
The examples target MySQL 5.7; earlier versions (e.g., 5.6) store GTID information differently and may display logs in other formats. Key differences include:
GTID representation: 5.6 uses global system variables, while 5.7 stores GTIDs in the mysql.gtid_executed table.
Binary log format for auto‑increment IDs differs between versions.
4. Common Replication Errors and Analysis
Below are typical error scenarios and how to analyze them.
4.1 Slave Executes Statements Conflicting with Master
When the slave attempts to create a database, table, or insert a duplicate primary key that already exists on the master, replication stops. The error appears in Last_Error and the full message can be found in the error log.
4.2 Master Binlog Missing
If the master’s binlog file has been removed (e.g., via RESET MASTER, PURGE BINARY LOGS, or expire_logs_days), the slave cannot locate the required binlog and reports an error such as “Could not find the required binlog file.”
4.3 Slave Skips Transactions Due to GTID
When a GTID already exists in GTID_EXECUTED on the slave, the SQL thread will ignore the corresponding transaction, leading to data divergence without explicit error messages.
4.4 Non‑Logged Operations on Master
Operations like CREATE SERVER are not written to the binlog, so they must be executed manually on both master and slave to keep data consistent.
4.5 Duplicate Execution of Relay Log (Non‑GTID, Non‑MTS)
If relay_log_info_repository=FILE and the slave crashes before updating relay_log.info, the same transaction may be replayed after restart, causing primary‑key conflicts.
5. Summary
By systematically collecting SHOW SLAVE STATUS output, error‑log entries, and binary‑log events, and then cross‑checking master‑slave consistency, most replication problems can be diagnosed and resolved. Some failures—such as lost binlogs, disabled binlog on the master, or kernel bugs—may be unrecoverable without restoring from backups.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
