Databases 12 min read

Locating Large Transactions in MySQL Binlog Using GTID and Shell Scripts

This article explains how to identify and extract large MySQL transactions from binary log files by monitoring binlog size, parsing GTID information, and using shell scripts to retrieve transaction details, timestamps, positions, and DML statistics, with optional use of the my2sql tool.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Locating Large Transactions in MySQL Binlog Using GTID and Shell Scripts

1 Introduction

Large transactions are common in MySQL, and the first step to handling them is to locate them. By examining binlog file sizes—especially when a binlog exceeds max_binlog_size —you can suspect the presence of a big transaction.

The official documentation states: A transaction is written in one chunk to the binary log, so it is never split between several binary logs. Therefore, if you have big transactions, you might see binary log files larger than max_binlog_size.

Observing unusually large binlog files allows you to target those files for further parsing, though large transactions can also be hidden in normally sized binlogs.

2 Practice

For instances with GTID enabled, locating the GTID of a large transaction is sufficient. The following example demonstrates parsing a binlog to extract the top N largest transactions.

Environment

Test Environment

binlog format

row

binlog version

v4

GTID

Enabled

Verified MySQL versions

MySQL 5.7.30, MySQL 8.0.28

First, extract the top N transactions by size:

# 为了方便保存为脚本,这里定义几个基本的变量
BINLOG_FILE_NAME=$1                       # binlog文件名
TRANS_NUM=$2                              # 想要获取的事务数量
MYSQL_BIN_DIR='/data/mysql/3306/base/bin' # basedir

# 获取前TRANS_NUM个大事务
${MYSQL_BIN_DIR}/mysqlbinlog ${BINLOG_FILE_NAME} | grep "GTID$(printf '	')last_committed" -B 1  | grep -E '^# at' | awk '{print $3}' | awk 'NR==1 {tmp=$1} NR>1 {print ($1-tmp,tmp);tmp=$1}' | sort -n -r -k 1 | head -n ${TRANS_NUM} > binlog_init.tmp

After obtaining the size and POS information, parse each line to retrieve GTID, start/end times, positions, and DML statistics:

while read line
do
    # 事务大小这里取近似值,因为不是通过(TRANS_END_POS-TRANS_START_POS)计算出的
    TRANS_SIZE=$(echo ${line} | awk '{print $1}')
    logWriteWarning "TRANS_SIZE: $(echo | awk -v TRANS_SIZE=${TRANS_SIZE} '{ print (TRANS_SIZE/1024/1024) }')MB"
    FLAG_POS=$(echo ${line} | awk '{print $2}')
    # 获取GTID
    ${MYSQL_BIN_DIR}/mysqlbinlog -vvv --base64-output=decode-rows ${BINLOG_FILE_NAME} | grep -m 1 -A3 -Ei "^# at ${FLAG_POS}" > binlog_parse.tmp
    GTID=$(cat binlog_parse.tmp | grep -i 'SESSION.GTID_NEXT' | awk -F "'" '{print $2}')
    # 通过GTID解析出事务的详细信息
    ${MYSQL_BIN_DIR}/mysqlbinlog --base64-output=decode-rows -vvv --include-gtids="${GTID}" ${BINLOG_FILE_NAME} > binlog_gtid.tmp
    START_TIME=$(grep -Ei '^BEGIN' -m 1 -A 3 binlog_gtid.tmp | grep -i 'server id' | awk '{print $1,$2}' | sed 's/#//g')
    END_TIME=$(grep -Ei '^COMMIT' -m 1 -B 1 binlog_gtid.tmp | head -1 | awk '{print $1,$2}' | sed 's/#//g')
    TRANS_START_POS=$(grep -Ei 'SESSION.GTID_NEXT' -m 1 -A 1 binlog_gtid.tmp | tail -1 | awk '{print $3}')
    TRANS_END_POS=$(grep -Ei '^COMMIT' -m 1 -B 1 binlog_gtid.tmp | head -1 | awk '{print $7}')
    logWrite "GTID: ${GTID}"
    logWrite "START_TIME: $(date -d "${START_TIME}" '+%F %T')"
    logWrite "END_TIME: $(date -d "${END_TIME}" '+%F %T')"
    logWrite "TRANS_START_POS: ${TRANS_START_POS}"
    logWrite "TRANS_END_POS: ${TRANS_END_POS}"
    logWrite "该事务的DML语句及相关表统计:"
    grep -Ei '^### insert' binlog_gtid.tmp | sort | uniq -c
    grep -Ei '^### delete' binlog_gtid.tmp | sort | uniq -c
    grep -Ei '^### update' binlog_gtid.tmp | sort | uniq -c
 done < binlog_init.tmp

These steps produce GTID, transaction timestamps, start/end positions, and a count of DML statements per table. The script can be packaged as a daily operations tool.

Final Output Example

[root@localhost ~]$ sh parse_binlog.sh /opt/sandboxes/rsandbox_5_7_35/master/data/mysql-bin.000003 2
2023-12-12 15:15:40 [WARNING] 开始解析BINLOG: /opt/sandboxes/rsandbox_5_7_35/master/data/mysql-bin.000003 
2023-12-12 15:15:53 [WARNING] TRANS_SIZE: 0.00161743MB 
2023-12-12 15:16:06 [INFO] GTID: 00020236-1111-1111-1111-111111111111:362779 
2023-12-12 15:16:06 [INFO] START_TIME: 2023-12-12 15:14:35 
2023-12-12 15:16:06 [INFO] END_TIME: 2023-12-12 15:14:35 
2023-12-12 15:16:06 [INFO] TRANS_START_POS: 362096066 
2023-12-12 15:16:06 [INFO] TRANS_END_POS: 362097697 
2023-12-12 15:16:06 [INFO] 该事务的DML语句及相关表统计: 
      1 ### INSERT INTO `sbtest`.`sbtest100`
      1 ### DELETE FROM `sbtest`.`sbtest100`
      2 ### UPDATE `sbtest`.`sbtest100`
2023-12-12 15:16:06 [WARNING] TRANS_SIZE: 0.00161648MB 
2023-12-12 15:16:25 [INFO] GTID: 00020236-1111-1111-1111-111111111111:505503 
2023-12-12 15:16:25 [INFO] START_TIME: 2023-12-12 15:15:36 
2023-12-12 15:16:25 [INFO] END_TIME: 2023-12-12 15:15:36 
2023-12-12 15:16:25 [INFO] TRANS_START_POS: 603539112 
2023-12-12 15:16:25 [INFO] TRANS_END_POS: 603540742 
2023-12-12 15:16:25 [INFO] 该事务的DML语句及相关表统计: 
      1 ### INSERT INTO `sbtest`.`sbtest100`
      1 ### DELETE FROM `sbtest`.`sbtest100`
      1 ### UPDATE `sbtest`.`sbtest100`
      1 ### UPDATE `sbtest`.`sbtest87`

For more advanced analysis, the open‑source tool my2sql can filter by rows or time, and works in file mode with stats work‑type to parse any MySQL instance’s binlog without needing the original schema.

# 统计指定 binlog 中各个表的 DML 操作数量(不加 row 和 time 限制)
[root@localhost ~]$ mkdir tmpdir
[root@localhost ~]$ ./my2sql -user root -password xxx -host 127.0.0.1 -port 3306 -mode file -local-binlog-file ./mysql-bin.005375 -work-type stats -start-file mysql-bin.005375 -output-dir ./tmpdir
# 按照事务的行数倒序排序
[root@localhost ~]$ less tmpdir/biglong_trx.txt | sort -nr -k 6 | less
# 按照事务的执行时间倒序排序
[root@localhost ~]$ less tmpdir/biglong_trx.txt | sort -nr -k 7 | less
# 输出示例(binlog starttime stoptime startpos stoppos rows duration tables)
mysql-bin.005375 2023-12-12_16:04:06 2023-12-12_16:16:59 493014756 582840954 123336 53 [test.t1(inserts=61668, updates=0, deletes=0) test.t2(inserts=61668, updates=0, deletes=0)]
MySQLbinlogshell scriptGTIDDatabase Troubleshootinglarge transaction
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.