Analyzing MySQL Slow Log: Identifying Disk I/O and Semi‑Sync Replication Bottlenecks
This article walks through a systematic analysis of MySQL slow‑log entries, correlating them with error‑log flush events, disk‑I/O monitoring, and semi‑synchronous replication traces to pinpoint performance bottlenecks and provide practical troubleshooting steps.
The author, a DBA specializing in MySQL, shares a real‑world case where a client reported slow INSERT operations; inspection of slow.log revealed widespread latency across multiple SQL types.
Step 1 – Macro statistics from slow.log – By aggregating timestamps, the author identified periods with batch‑slow queries, primarily INSERT, DELETE, UPDATE, and COMMIT statements. Example command used:
[root@big hao]# less slow.log |grep "# Tim"|awk -F "." '{print $1}'|termsql -0 "select COL2 time,COUNT(COL2) sql_count from tbl group by COL2"
... (output omitted for brevity) ...Another aggregation showed the distribution of SQL types:
[root@big hao]# less slow.log |grep "# Time" -A 6 |egrep -i "insert|delete|update|select|commit|show" |awk '{print $1}'|sed -e 's/;/ /g' |termsql -0 "select COL0 sql_stat,count(COL0) sql_count from tbl group by COL0"
sql_stat|sql_count
COMMIT|59
DELETE|11
INSERT|321
SELECT|18
UPDATE|6
...Step 2 – Correlating with error.log flush information – The error log showed numerous flush events whose timestamps matched the slow‑log spikes, indicating disk‑write latency during InnoDB page cleaning.
[root@big hao]# less mysql-error.log |grep "2022-08-30T" |grep flush
2022-08-30T00:05:56.685140+08:00 0 [Note] [MY-011953] [InnoDB] Page cleaner took 5301ms to flush 121 and evict 0 pages
... (additional lines omitted) ...Step 3 – Deploying a custom disk‑I/O monitoring script to capture real‑time I/O metrics, confirming abnormal disk performance during the problematic periods.
//磁盘io监控脚本
#!/bin/bash
while sleep 1;
do
echo "########`date '+%F %H:%M:%S'`" >> /app/monitor/diskmoni$(date "+%Y%m%d").log
sar -d -p 1 10 >> /app/monitor/diskmoni$(date "+%Y%m%d").log
doneAdditional analysis of other timestamps (e.g., 2022‑08‑30T11:04:27) showed that when the master’s disk I/O was normal, the slowdown was linked to semi‑synchronous replication latency rather than I/O.
Step 4 – Examining semi‑sync replication strace logs – By enabling rpl_semi_sync_slave_trace_level , the author captured the delay between the master sending binlog events and receiving ACKs, which could reach several seconds due to network or slave‑disk issues.
//开启半同步strace日志
mysql> show variables like 'rpl_semi_sync_slave_trace_level';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| rpl_semi_sync_slave_trace_level | 32 |
+---------------------------------+-------+
mysql> set global rpl_semi_sync_slave_trace_level = 16;
... (output omitted) ...Network ping logs confirmed stable connectivity, so the primary culprit remained disk I/O on the slave.
Summary
Commit statements dominate the slow‑log entries, followed by UPDATE and INSERT, which are I/O‑intensive.
When a batch contains simple SELECTs, the slowdown is less likely related to semi‑sync replication.
Even administrative commands like SHOW MASTER STATUS can appear slow under heavy I/O pressure.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.