Understanding MySQL Binlog Structure and Using Python to Perform Flashback, Detect Large Transactions, and Split Binlog Files
This article explains the structure of MySQL ROW‑mode binlogs, demonstrates how to modify binlog events with Python to recover deleted rows, locate oversized transactions, and split large transactions into smaller ones, providing practical scripts and examples for advanced database manipulation.
MySQL binlogs record every data modification in the database; understanding their ROW‑mode structure makes it possible to parse and modify them, and even perform "flashback"‑style recovery of mistakenly deleted rows. This article focuses on the ROW format, which has been the default binlog format since MySQL 5.7.7.
Binlog Structure
Binlogs consist of events (not transactions), each describing a modification. Since MySQL 5.0, the binlog format is v4, which defines 36 event types (codes 0–35), listed in the following enumeration:
enum Log_event_type {
UNKNOWN_EVENT= 0,
START_EVENT_V3= 1,
QUERY_EVENT= 2,
STOP_EVENT= 3,
ROTATE_EVENT= 4,
INTVAR_EVENT= 5,
LOAD_EVENT= 6,
SLAVE_EVENT= 7,
CREATE_FILE_EVENT= 8,
APPEND_BLOCK_EVENT= 9,
EXEC_LOAD_EVENT= 10,
DELETE_FILE_EVENT= 11,
NEW_LOAD_EVENT= 12,
RAND_EVENT= 13,
USER_VAR_EVENT= 14,
FORMAT_DESCRIPTION_EVENT= 15,
XID_EVENT= 16,
BEGIN_LOAD_QUERY_EVENT= 17,
EXECUTE_LOAD_QUERY_EVENT= 18,
TABLE_MAP_EVENT = 19,
PRE_GA_WRITE_ROWS_EVENT = 20,
PRE_GA_UPDATE_ROWS_EVENT = 21,
PRE_GA_DELETE_ROWS_EVENT = 22,
WRITE_ROWS_EVENT_V1 = 23,
UPDATE_ROWS_EVENT_V1 = 24,
DELETE_ROWS_EVENT_V1 = 25,
INCIDENT_EVENT= 26,
HEARTBEAT_LOG_EVENT= 27,
IGNORABLE_LOG_EVENT= 28,
ROWS_QUERY_LOG_EVENT= 29,
WRITE_ROWS_EVENT = 30,
UPDATE_ROWS_EVENT = 31,
DELETE_ROWS_EVENT = 32,
GTID_LOG_EVENT= 33,
ANONYMOUS_GTID_LOG_EVENT= 34,
PREVIOUS_GTIDS_LOG_EVENT= 35,
ENUM_END_EVENT /* end marker */
};

Each binlog file starts with a Format Description event and ends with a Rotate event. An example binlog listing is shown below:
+-------------+-----+----------------+-----------+-------------+--------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+-------------+-----+----------------+-----------+-------------+--------------------------------------------------------+
| scut.000023 | 4 | Format_desc | 1024 | 123 | Server ver: 5.7.31-0ubuntu0.16.04.1-log, Binlog ver: 4 |
| scut.000023 | 123 | Previous_gtids | 1024 | 154 | |
| scut.000023 | 154 | Anonymous_Gtid | 1024 | 219 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS' |
| scut.000023 | 219 | Query | 1024 | 291 | BEGIN |
| scut.000023 | 291 | Rows_query | 1024 | 330 | # delete from tt1 |
| scut.000023 | 330 | Table_map | 1024 | 378 | table_id: 111 (test.tt1) |
| scut.000023 | 378 | Delete_rows | 1024 | 434 | table_id: 111 flags: STMT_END_F |
| scut.000023 | 434 | Xid | 1024 | 465 | COMMIT /* xid=216 */ |
| scut.000023 | 465 | Rotate | 1024 | 507 | scut.000024;pos=4 |
+-------------+-----+----------------+-----------+-------------+--------------------------------------------------------+
9 rows in set (0.00 sec)

Recovering Accidentally Deleted Records
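Every v4 event begins with a common 19‑byte header: a 4‑byte timestamp, a 1‑byte type code, a 4‑byte server_id, a 4‑byte event size, a 4‑byte end_log_pos, and 2 bytes of flags. The Delete_rows event at position 378 in the listing above therefore has its type code at byte offset 378 + 4 = 382. A minimal sketch that walks these headers (assuming binlog_checksum=NONE; with CRC32 each event carries 4 extra trailing bytes):

```python
#! /usr/bin/python3
# Sketch: walk the common 19-byte v4 event headers of a binlog file.
# Assumes binlog_checksum=NONE.
import struct

HEADER_LEN = 19  # timestamp(4) + type(1) + server_id(4) + event_size(4) + end_log_pos(4) + flags(2)

def walk_events(path):
    """Yield (start_offset, type_code, event_size) for every event in the file."""
    with open(path, "rb") as f:
        if f.read(4) != b'\xfebin':          # every binlog starts with this magic number
            raise ValueError('not a binlog file')
        pos = 4
        while True:
            header = f.read(HEADER_LEN)
            if len(header) < HEADER_LEN:
                break
            _ts, type_code, _server_id, event_size, _end_log_pos, _flags = \
                struct.unpack('<IBIIIH', header)
            # the event-type byte itself sits at file offset pos + 4
            yield pos, type_code, event_size
            pos += event_size
            f.seek(pos)                      # skip the event body

# e.g. for pos, t, size in walk_events('scut.000023'): print(pos, t, size)
```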
By changing the event type byte of the DELETE_ROWS_EVENT (type code 32) to WRITE_ROWS_EVENT (type code 30), the deleted rows are re‑inserted when the binlog is replayed. In the listing above the Delete_rows event starts at position 378, and the type code is the 5th byte of the event header, so the byte to change sits at offset 382 (the 383rd byte of the file). Note that this assumes binlog checksums are disabled (binlog_checksum=NONE); with CRC32 checksums the event's trailing checksum would also have to be recomputed. The following Python script rewrites that byte:
#! /usr/bin/python3
import sys

if len(sys.argv) != 3:
    print('Please run chtype.py inputBinlog changedBinlog.')
    sys.exit(1)

inputBinlog = open(sys.argv[1], "rb")
changedBinlog = open(sys.argv[2], "wb")
changedBinlog.write(inputBinlog.read(382))   # copy everything before the event-type byte
changedBinlog.write(bytes([30]))             # 30 = WRITE_ROWS_EVENT (was 32, DELETE_ROWS_EVENT)
inputBinlog.seek(1, 1)                       # skip the original type byte
while True:
    chunk = inputBinlog.read(65536)          # copy the rest of the file
    if not chunk:
        break
    changedBinlog.write(chunk)
inputBinlog.close()
changedBinlog.close()

Running the original and modified binlogs shows that the record is restored, effectively providing an Oracle‑like flashback capability.
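A quick sanity check before replaying the modified file is to confirm that only the intended byte changed. A minimal sketch that lists the byte offsets at which two files differ (the file names in the example call are hypothetical):

```python
#! /usr/bin/python3
# Sketch: list every byte offset at which two files differ.
# Handy for confirming that the edit touched only offset 382.

def diff_offsets(path_a, path_b, limit=10):
    """Return up to `limit` offsets where the files differ (a length mismatch also counts)."""
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while len(diffs) < limit:
            ba, bb = a.read(1), b.read(1)
            if not ba and not bb:
                break
            if ba != bb:
                diffs.append(offset)
            offset += 1
    return diffs

# e.g. diff_offsets('scut.000023', 'scut.000023.new') should report only [382]
```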
Finding Large Transactions in Binlog
Because ROW‑mode logs each row change, a simple UPDATE without a WHERE clause can generate a massive transaction. The script below parses the output of mysqlbinlog, identifies BEGIN and COMMIT positions, and reports the largest transaction size:
$ cat ./checkBigTran.py
#! /usr/bin/python3
import sys

position = 0
beginPosition = 0
endPosition = 0
maxBeginPosition = 0
maxEndPosition = 0
maxSize = 0
isEnd = 0

for line in sys.stdin:
    if line[:4] == '# at':
        position = int(line[5:])
        if isEnd:                  # the first "# at" after COMMIT marks the transaction end
            endPosition = position
            isEnd = 0
            if endPosition - beginPosition > maxSize:
                maxBeginPosition = beginPosition
                maxEndPosition = endPosition
                maxSize = endPosition - beginPosition
    if line[:5] == 'BEGIN':
        beginPosition = position
    if line[:6] == 'COMMIT':
        isEnd = 1

print("The largest transaction size is %d, the begin position is %d, the end position is %d." % (maxSize, maxBeginPosition, maxEndPosition))

Example usage:
$ mysqlbinlog binlog1 | ./checkBigTran.py
The largest transaction size is 1468183501, the begin position is 5737766, the end position is 1473921267.

Splitting Large Transactions
MySQL already splits the row changes of a large transaction across multiple row events, each limited to binlog-row-event-max-size (8 KB by default), but they still commit as a single transaction. By inserting additional events (Xid to commit the first half, then Anonymous_Gtid and Query/BEGIN to open a new transaction) between existing row events, a single large transaction can be divided into two smaller ones. The following Python program performs this operation on the example file:
# cat splitTran.py
#! /usr/bin/python3
import sys

if len(sys.argv) != 3:
    print('Please run splitTran.py inputBinlog changedBinlog.')
    sys.exit(1)

inputBinlog = open(sys.argv[1], "rb")
changedBinlog = open(sys.argv[2], "wb")
changedBinlog.write(inputBinlog.read(429))   # copy everything up to the end of the first insert
firstInsert = inputBinlog.tell()
inputBinlog.seek(567, 0)                     # locate the Xid event
changedBinlog.write(inputBinlog.read(31))    # write the Xid event (commits the first half)
inputBinlog.seek(154, 0)                     # locate the Anonymous_Gtid and Query (BEGIN) events
changedBinlog.write(inputBinlog.read(137))   # write those events (opens the second transaction)
inputBinlog.seek(firstInsert)                # back to the second insert
while True:
    chunk = inputBinlog.read(65536)          # copy the remaining events
    if not chunk:
        break
    changedBinlog.write(chunk)
inputBinlog.close()
changedBinlog.close()

After running the script and applying the new binlog, the two INSERT statements are executed in separate transactions, as shown by the resulting show binlog events output.
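The offsets 429, 567, and 154 and the lengths 31 and 137 are specific to this example file. Since every event starts with a common 19‑byte header whose 5th byte is the type code and whose bytes 10–13 hold the event size, a small helper can locate the needed events in any binlog; the type codes come from the enumeration above (2 = Query, 16 = Xid, 34 = Anonymous_Gtid). A sketch, again assuming binlog_checksum=NONE:

```python
#! /usr/bin/python3
# Sketch: locate selected event types in a binlog so the hard-coded
# offsets in splitTran.py can be computed instead of read off by hand.
# Assumes binlog_checksum=NONE; type codes: 2=Query, 16=Xid, 34=Anonymous_Gtid.
import struct

def find_events(path, wanted_types):
    """Return {type_code: [(start_offset, event_size), ...]} for the wanted types."""
    found = {t: [] for t in wanted_types}
    with open(path, "rb") as f:
        if f.read(4) != b'\xfebin':          # binlog magic number
            raise ValueError('not a binlog file')
        pos = 4
        while True:
            header = f.read(19)              # common event header
            if len(header) < 19:
                break
            _ts, type_code, _sid, event_size, _elp, _flags = \
                struct.unpack('<IBIIIH', header)
            if type_code in found:
                found[type_code].append((pos, event_size))
            pos += event_size
            f.seek(pos)                      # skip the event body
    return found

# e.g. find_events('binlog1', {2, 16, 34}) maps each type to its (offset, size) list
```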
Conclusion
Understanding the ROW‑mode binlog format, combined with simple Python utilities, enables powerful operations such as data recovery, transaction size analysis, and binlog splitting, which can aid in database migration, auditing, and potentially future flashback features in MySQL.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source tools and services around MySQL, releases a premium open‑source product each year on October 24 ("1024"), and continuously operates and maintains them.