Databases 14 min read

Understanding MySQL Binlog Structure and Using Python to Perform Flashback, Detect Large Transactions, and Split Binlog Files

This article explains the structure of MySQL ROW‑mode binlogs, demonstrates how to modify binlog events with Python to recover deleted rows, locate oversized transactions, and split large transactions into smaller ones, providing practical scripts and examples for advanced database manipulation.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Understanding MySQL Binlog Structure and Using Python to Perform Flashback, Detect Large Transactions, and Split Binlog Files

MySQL binlogs record every change in the database; understanding their ROW‑mode structure enables parsing, modification, and even "flashback"‑style recovery of mistakenly deleted rows. The article focuses on the default ROW mode used in MySQL 8.

Binlog Structure

Binlogs consist of events (not transactions), each describing a modification. Since MySQL 5, the binlog version is v4 and defines 36 event types, listed in the following enumeration:

enum Log_event_type {
  UNKNOWN_EVENT= 0,
  START_EVENT_V3= 1,
  QUERY_EVENT= 2,
  STOP_EVENT= 3,
  ROTATE_EVENT= 4,
  INTVAR_EVENT= 5,
  LOAD_EVENT= 6,
  SLAVE_EVENT= 7,
  CREATE_FILE_EVENT= 8,
  APPEND_BLOCK_EVENT= 9,
  EXEC_LOAD_EVENT= 10,
  DELETE_FILE_EVENT= 11,
  NEW_LOAD_EVENT= 12,
  RAND_EVENT= 13,
  USER_VAR_EVENT= 14,
  FORMAT_DESCRIPTION_EVENT= 15,
  XID_EVENT= 16,
  BEGIN_LOAD_QUERY_EVENT= 17,
  EXECUTE_LOAD_QUERY_EVENT= 18,
  TABLE_MAP_EVENT = 19,
  PRE_GA_WRITE_ROWS_EVENT = 20,
  PRE_GA_UPDATE_ROWS_EVENT = 21,
  PRE_GA_DELETE_ROWS_EVENT = 22,
  WRITE_ROWS_EVENT = 23,
  UPDATE_ROWS_EVENT = 24,
  DELETE_ROWS_EVENT = 25,
  INCIDENT_EVENT= 26,
  HEARTBEAT_LOG_EVENT= 27,
  IGNORABLE_LOG_EVENT= 28,
  ROWS_QUERY_LOG_EVENT= 29,
  WRITE_ROWS_EVENT = 30,
  UPDATE_ROWS_EVENT = 31,
  DELETE_ROWS_EVENT = 32,
  GTID_LOG_EVENT= 33,
  ANONYMOUS_GTID_LOG_EVENT= 34,
  PREVIOUS_GTIDS_LOG_EVENT= 35,
  ENUM_END_EVENT /* end marker */
};

Each binlog file starts with a Format Description Event and ends with a Rotate Event . An example binlog listing is shown below:

+-------------+-----+----------------+-----------+-------------+--------------------------------------------------------+
| Log_name    | Pos | Event_type     | Server_id | End_log_pos | Info                                                   |
+-------------+-----+----------------+-----------+-------------+--------------------------------------------------------+
| scut.000023 |   4 | Format_desc    |      1024 |         123 | Server ver: 5.7.31-0ubuntu0.16.04.1-log, Binlog ver: 4 |
| scut.000023 | 123 | Previous_gtids |      1024 |         154 |                                                        |
| scut.000023 | 154 | Anonymous_Gtid |      1024 |         219 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'                   |
| scut.000023 | 219 | Query          |      1024 |         291 | BEGIN                                                  |
| scut.000023 | 291 | Rows_query     |      1024 |         330 | # delete from tt1                                      |
| scut.000023 | 330 | Table_map      |      1024 |         378 | table_id: 111 (test.tt1)                               |
| scut.000023 | 378 | Delete_rows    |      1024 |         434 | table_id: 111 flags: STMT_END_F                        |
| scut.000023 | 434 | Xid            |      1024 |         465 | COMMIT /* xid=216 */                                   |
| scut.000023 | 465 | Rotate         |      1024 |         507 | scut.000024;pos=4                                      |
+-------------+-----+----------------+-----------+-------------+--------------------------------------------------------+
9 rows in set (0.00 sec)

Recovering Accidentally Deleted Records

By changing the event type of a DELETE_ROWS_EVENT (type code 32) to WRITE_ROWS_EVENT (type code 30) at the appropriate byte offset, the deleted row can be re‑inserted. The following Python script modifies the 383rd byte (5th byte of the event header) to achieve this:

#! /usr/bin/python3
import sys

if len(sys.argv) != 3:
        print ('Please run chtype.py inputType changedType.')
        sys.exit()

inputType=open(sys.argv[1],"rb")
changedType=open(sys.argv[2],"wb")

changedType.write(inputType.read(382))
changedType.write(chr(30).encode())
inputType.seek(1,1)
while True:
  line = inputType.readline()
  if not line:
        break
  changedType.write(line)

inputType.close()
changedType.close()

Running the original and modified binlogs shows that the record is restored, effectively providing an Oracle‑like flashback capability.

Finding Large Transactions in Binlog

Because ROW‑mode logs each row change, a simple UPDATE without a WHERE clause can generate a massive transaction. The script below parses the output of mysqlbinlog , identifies BEGIN and COMMIT positions, and reports the largest transaction size:

$ cat ./checkBigTran.py 
#! /usr/bin/python3
import sys

position=0
beginPosition=0
endPosition=0
maxSize=0
isEnd=0
for line in sys.stdin:
      if line[: 4]=='# at':
          position=int(line[5:])
          if isEnd:
             endPosition=position
             isEnd=0
      if line[: 5]=='BEGIN':
          beginPosition=position
      if line[: 6]=='COMMIT':
          isEnd=1
      if endPosition-beginPosition>maxSize:
          maxBeginPosition= beginPosition
          maxEndPosition=endPosition
          maxSize=endPosition-beginPosition

print("The largest transaction size is %d, the begion position is %d, the end position is %d." % (maxSize,maxBeginPosition,maxEndPosition))

Example usage:

$ mysqlbinlog binlog1|./checkBigTran.py 
The largest transaction size is 1468183501, the begion position is 5737766, the end position is 1473921267.

Splitting Large Transactions

MySQL splits large transactions into multiple events based on the binlog-row-event-max-size (default 8 KB). By inserting additional events (e.g., Anonymous_Gtid , Query , Xid ) between existing ones, a single large transaction can be divided into two smaller ones. The following Python program performs this operation:

# cat splitTran.py 
#! /usr/bin/python3
import sys

if len(sys.argv) != 3:
    print ('Please run splitTrans.py inputBinlog changedBinlog.')
    sys.exit()

inputBinlog=open(sys.argv[1],"rb")
changedBinlog=open(sys.argv[2],"wb")

changedBinlog.write(inputBinlog.read(429))   # write up to first insert
firstInsert=inputBinlog.tell()
inputBinlog.seek(567,0)  # locate xid event
changedBinlog.write(inputBinlog.read(31))  # write xid event
inputBinlog.seek(154,0)  # locate Anonymous_Gtid and Query events
changedBinlog.write(inputBinlog.read(137))  # write those events
inputBinlog.seek(firstInsert)
while True:
      line = inputBinlog.readline()
      if not line:
           break
      changedBinlog.write(line)

inputBinlog.close()
changedBinlog.close()

After running the script and applying the new binlog, the two INSERT statements are executed in separate transactions, as shown by the resulting show binlog events output.

Conclusion

Understanding the ROW‑mode binlog format, combined with simple Python utilities, enables powerful operations such as data recovery, transaction size analysis, and binlog splitting, which can aid in database migration, auditing, and potentially future flashback features in MySQL.

PythonTransactionDatabaseMySQLbinlogData RecoveryFlashback
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.