Databases 7 min read

Recovering MySQL Replication After a Massive Binlog Overflow

This article explains why a large‑transaction update caused a binlog overflow and slave I/O failure in a remote MySQL master‑slave setup, and provides step‑by‑step commands and a stored‑procedure solution to restore replication safely.

dbaplus Community
dbaplus Community
dbaplus Community
Recovering MySQL Replication After a Massive Binlog Overflow

Environment Description

The production environment consists of a geographically distributed master‑slave MySQL cluster (community version 5.6.25) with data volume exceeding 100 GB.

Problem Description

A colleague executed a full‑table update on the master using a SQL script, which succeeded on the master but caused the slave’s I/O thread to stop.

Problem Analysis

Relevant parameters

expire_logs_days = 7  # keep binlog for 7 days

max_binlog_size = 1G  # each binlog file up to 1 GB

Table size and SQL

Table v_clda is 5.8 GB. The executed statement was UPDATE v_clda SET uploadtime = NOW();, which succeeded on the master.

Binlog growth on the master

-rw-rw---- 1 mysql mysql 1.1G Mar 16 02:49 mysql-bin.000159
-rw-rw---- 1 mysql mysql 8.0G Mar 16 15:28 mysql-bin.000160
-rw-rw---- 1 mysql mysql 7.4G Mar 16 18:13 mysql-bin.000161
-rw-rw---- 1 mysql mysql 1.1G Mar 16 23:55 mysql-bin.000162
-rw-rw---- 1 mysql mysql 1.1G Mar 17 12:15 mysql-bin.000163
-rw-rw---- 1 mysql mysql 1.1G Mar 18 16:54 mysql-bin.000164

Slave error messages

[ERROR] Slave I/O: Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: og_file_name mysql-bin.000160<90>Ó°Y log_pos 121238917, Error_code: 1623
[ERROR] Slave I/O: Relay log write failure: could not queue event from master, Error_code: 1595
[Note] Slave I/O thread exiting, read up to log 'mysql-bin.000160', position 3626103968
[Note] Error reading relay log event: slave SQL thread was killed

The slave could no longer synchronize data because a single transaction was written to a massive binlog file.

Problem Handling

Attempts to restart the slave thread repeatedly failed. The next step was to skip the problematic transaction using GTID.

Commands executed on the slave (GTID‑based):

stop slave;
SET @@SESSION.GTID_NEXT='498815d6-20a9-11e6-a7d6-fa163e5770cc:53'; -- adjust to actual GTID
BEGIN; COMMIT;
SET SESSION GTID_NEXT=AUTOMATIC;
START SLAVE;
show slave status\G;  -- replication restored

After the slave was back, the same update was applied on the slave with binlog disabled:

SET SESSION sql_log_bin=0;
UPDATE v_clda SET uploadtime = NOW();

Solution

To prevent similar issues, disable binary logging for large transactions in advance ( SET SESSION sql_log_bin=0;), especially in remote data centers with limited bandwidth and unstable VPN connections.

If a large transaction has already generated excessive binlog files, use the GTID‑skip method shown above, or as a last resort, rebuild the master‑slave topology (not recommended for large, remote datasets).

A more robust approach is to split large transactions into smaller batches. For example, using a stored procedure that deletes or updates rows in chunks of 10,000:

DELIMITER $$
USE BIGDB$$
DROP PROCEDURE IF EXISTS BIG_table_delete_10k$$
CREATE PROCEDURE BIG_table_delete_10k(IN v_UserId INT)
BEGIN
  del_10k: LOOP
    DELETE FROM BIGDB.BIGTABLE WHERE UserId = v_UserId LIMIT 10000;
    SELECT ROW_COUNT() INTO @count;
    IF @count = 0 THEN
      SELECT CONCAT('BIGDB.BIGTABLE UserId = ', v_UserId, ' is ', @count, ' rows.') AS BIGTABLE_delete_finish;
      LEAVE del_10k;
    END IF;
    SELECT SLEEP(1);
  END LOOP del_10k;
END$$
DELIMITER ;

After deploying the procedure, replication remained stable and the issue was resolved.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlReplicationGTIDStored Procedure
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.