Recovering MySQL Replication After a Massive Binlog Overflow
This article explains why a large‑transaction update caused a binlog overflow and slave I/O failure in a remote MySQL master‑slave setup, and provides step‑by‑step commands and a stored‑procedure solution to restore replication safely.
Environment Description
The production environment consists of a geographically distributed master‑slave MySQL cluster (community version 5.6.25) with data volume exceeding 100 GB.
Problem Description
A colleague executed a full‑table update on the master using a SQL script, which succeeded on the master but caused the slave’s I/O thread to stop.
Problem Analysis
Relevant parameters
expire_logs_days = 7 # keep binlog for 7 days
max_binlog_size = 1G # each binlog file up to 1 GB
Table size and SQL
Table v_clda is 5.8 GB. The executed statement was UPDATE v_clda SET uploadtime = NOW();, which succeeded on the master.
Binlog growth on the master
-rw-rw---- 1 mysql mysql 1.1G Mar 16 02:49 mysql-bin.000159
-rw-rw---- 1 mysql mysql 8.0G Mar 16 15:28 mysql-bin.000160
-rw-rw---- 1 mysql mysql 7.4G Mar 16 18:13 mysql-bin.000161
-rw-rw---- 1 mysql mysql 1.1G Mar 16 23:55 mysql-bin.000162
-rw-rw---- 1 mysql mysql 1.1G Mar 17 12:15 mysql-bin.000163
-rw-rw---- 1 mysql mysql 1.1G Mar 18 16:54 mysql-bin.000164Slave error messages
[ERROR] Slave I/O: Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: og_file_name mysql-bin.000160<90>Ó°Y log_pos 121238917, Error_code: 1623
[ERROR] Slave I/O: Relay log write failure: could not queue event from master, Error_code: 1595
[Note] Slave I/O thread exiting, read up to log 'mysql-bin.000160', position 3626103968
[Note] Error reading relay log event: slave SQL thread was killedThe slave could no longer synchronize data because a single transaction was written to a massive binlog file.
Problem Handling
Attempts to restart the slave thread repeatedly failed. The next step was to skip the problematic transaction using GTID.
Commands executed on the slave (GTID‑based):
stop slave;
SET @@SESSION.GTID_NEXT='498815d6-20a9-11e6-a7d6-fa163e5770cc:53'; -- adjust to actual GTID
BEGIN; COMMIT;
SET SESSION GTID_NEXT=AUTOMATIC;
START SLAVE;
show slave status\G; -- replication restoredAfter the slave was back, the same update was applied on the slave with binlog disabled:
SET SESSION sql_log_bin=0;
UPDATE v_clda SET uploadtime = NOW();Solution
To prevent similar issues, disable binary logging for large transactions in advance ( SET SESSION sql_log_bin=0;), especially in remote data centers with limited bandwidth and unstable VPN connections.
If a large transaction has already generated excessive binlog files, use the GTID‑skip method shown above, or as a last resort, rebuild the master‑slave topology (not recommended for large, remote datasets).
A more robust approach is to split large transactions into smaller batches. For example, using a stored procedure that deletes or updates rows in chunks of 10,000:
DELIMITER $$
USE BIGDB$$
DROP PROCEDURE IF EXISTS BIG_table_delete_10k$$
CREATE PROCEDURE BIG_table_delete_10k(IN v_UserId INT)
BEGIN
del_10k: LOOP
DELETE FROM BIGDB.BIGTABLE WHERE UserId = v_UserId LIMIT 10000;
SELECT ROW_COUNT() INTO @count;
IF @count = 0 THEN
SELECT CONCAT('BIGDB.BIGTABLE UserId = ', v_UserId, ' is ', @count, ' rows.') AS BIGTABLE_delete_finish;
LEAVE del_10k;
END IF;
SELECT SLEEP(1);
END LOOP del_10k;
END$$
DELIMITER ;After deploying the procedure, replication remained stable and the issue was resolved.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
