Diagnosing MySQL Replication Lag Caused by Page Compression Using perf and pstack
This article describes how a MySQL 5.6 master‑slave setup developed growing replication lag on an off‑site replica, how the cause was traced to page compression on a large table using SHOW SLAVE STATUS, perf and pstack, and how disabling compression removed the delay.
Background: A core production MySQL 5.6 instance runs with one master, two local slaves and one off‑site slave. Starting on February 14, the off‑site replica began reporting steadily increasing replication lag, which was initially attributed to network fluctuation.
Diagnosis: After logging in to the remote slave, the first step is to determine whether the I/O thread is the one falling behind. Run SHOW SLAVE STATUS on the slave and compare Master_Log_File with the binlog file the master is currently writing. If the two match, the I/O thread is keeping up with the master, so the delay originates in the SQL thread.
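The check above can be sketched as a small helper. This is only an illustration: the function name `locate_lag` and its return labels are hypothetical, the inputs are assumed to be dictionaries built from the columns of SHOW SLAVE STATUS (slave) and SHOW MASTER STATUS (master), and the extra comparison of executed versus read positions is an addition to the file-name check described in the text.

```python
def locate_lag(slave_status, master_status):
    """Guess which replication thread is behind.

    slave_status:  dict of SHOW SLAVE STATUS columns from the slave.
    master_status: dict of SHOW MASTER STATUS columns from the master.
    Return labels are hypothetical names used only in this sketch.
    """
    if slave_status.get("Slave_IO_Running") != "Yes":
        return "io_thread_stopped"
    if slave_status.get("Slave_SQL_Running") != "Yes":
        return "sql_thread_stopped"

    # I/O thread: is the slave still fetching the binlog file
    # that the master is currently writing?
    if slave_status["Master_Log_File"] != master_status["File"]:
        return "io_thread"

    # SQL thread: has it applied everything the I/O thread has fetched?
    same_file = (slave_status["Relay_Master_Log_File"]
                 == slave_status["Master_Log_File"])
    applied = int(slave_status["Exec_Master_Log_Pos"])
    fetched = int(slave_status["Read_Master_Log_Pos"])
    if not same_file or applied < fetched:
        return "sql_thread"
    return "no_lag"
```

Only the file name is compared against the master because the master's position keeps advancing on a busy instance, so an exact position match would be unreliable.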
Next, the MySQL daemon is profiled. With the process ID of mysqld in hand (11029 in this example), the following commands are run:

    perf record -ag -p 11029 -- sleep 10; perf report

Repeated runs consistently show deflate_slow consuming the largest share of CPU time.
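When the profile is captured repeatedly, it can help to rank symbols from a text report rather than reading the interactive view each time. The following is a minimal sketch that assumes the report was saved as plain text (for example with `perf report --stdio`) and that each symbol line has the shape "overhead%, command, shared object, [.] symbol". The function name `top_symbols` and the sample numbers are made up for illustration, and real report layouts vary between perf versions.

```python
import re

# Matches lines such as "  41.50%  mysqld  mysqld  [.] deflate_slow"
SYMBOL_LINE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S.*?)\s*$"
)

def top_symbols(report_text, limit=5):
    """Return up to `limit` (overhead_percent, shared_object, symbol) tuples,
    sorted by overhead in descending order."""
    rows = []
    for line in report_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # header and comment lines
        match = SYMBOL_LINE.match(line)
        if match:
            rows.append((float(match.group(1)), match.group(3), match.group(5)))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]

# Hypothetical report excerpt; the percentages are invented for this sketch.
SAMPLE_REPORT = """\
# Overhead  Command  Shared Object  Symbol
     3.10%  mysqld   mysqld         [.] buf_page_get_gen
    41.50%  mysqld   mysqld         [.] deflate_slow
    12.25%  mysqld   mysqld         [.] longest_match
"""

if __name__ == "__main__":
    for overhead, dso, symbol in top_symbols(SAMPLE_REPORT):
        print(f"{overhead:6.2f}%  {dso}  {symbol}")
```

A compression routine such as deflate_slow staying at the top across several captures is the pattern the article relies on.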
Further investigation with pstack 11029, run several times, repeatedly shows stack traces related to page compression. Screenshots (omitted) confirm that the remote replica has a large table with page compression enabled and its row format set to DYNAMIC.
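Repeated pstack snapshots can be tallied instead of compared by eye. The sketch below assumes the usual pstack layout, in which each thread starts with a "Thread N (...)" line followed by "#k  0xADDR in function (...)" frames. The helper name `count_frames`, the sample stack, and the frame names in it are illustrative and are not taken from the article's screenshots.

```python
import re
from collections import Counter

# Captures the function name from a frame line such as
# "#0  0x00007f3a2b1c2a35 in deflate_slow () from /lib64/libz.so.1"
FRAME_LINE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")

def count_frames(pstack_text):
    """Count, per function name, how many thread stacks contain it.

    Several pstack runs may be concatenated into one string; each
    "Thread ..." header starts a new stack.
    """
    counts = Counter()
    seen_in_thread = set()
    for line in pstack_text.splitlines():
        if line.startswith("Thread "):
            seen_in_thread = set()
            continue
        match = FRAME_LINE.match(line)
        if match and match.group(1) not in seen_in_thread:
            seen_in_thread.add(match.group(1))
            counts[match.group(1)] += 1
    return counts

# Hypothetical snapshot; addresses and frames are invented for this sketch.
SAMPLE_STACK = """\
Thread 2 (Thread 0x7f3a1c0fe700 (LWP 11050)):
#0  0x00007f3a2b1c2a35 in deflate_slow () from /lib64/libz.so.1
#1  0x00007f3a2b1c3b10 in deflate () from /lib64/libz.so.1
#2  0x0000000000a1b2c3 in page_zip_compress(page_zip_des_t*) ()
Thread 1 (Thread 0x7f3a2c9f8740 (LWP 11029)):
#0  0x00007f3a2a5d1c3d in poll () from /lib64/libc.so.6
"""

if __name__ == "__main__":
    for name, hits in count_frames(SAMPLE_STACK * 3).most_common(3):
        print(hits, name)
```

If compression-related names dominate the tally across snapshots taken seconds apart, the replication SQL thread is very likely spending its time there.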
After page compression is disabled on the large table, Seconds_Behind_Master gradually decreases, which indicates that the fix is effective. A second round of perf and pstack shows that the compression‑related function calls no longer appear, confirming the direct link between the compression feature and the replication lag.
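To judge how quickly the replica is recovering, two or more Seconds_Behind_Master readings can be turned into a rough catch-up estimate. This is a simple linear extrapolation under the assumption that the write load on the master stays roughly constant. The function name `estimate_catch_up` and the sample readings are hypothetical.

```python
def estimate_catch_up(samples):
    """Estimate the seconds remaining until lag reaches zero.

    samples: list of (elapsed_seconds, seconds_behind_master) readings,
             ordered by time. Only the first and last reading are used.
    Returns None when the lag is not shrinking.
    """
    (t_first, lag_first), (t_last, lag_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("samples must span a positive time interval")
    # Seconds of lag recovered per wall-clock second.
    recovery_rate = (lag_first - lag_last) / (t_last - t_first)
    if recovery_rate <= 0:
        return None
    return lag_last / recovery_rate

if __name__ == "__main__":
    # Invented readings: lag fell from 3600 s to 3000 s over 10 minutes.
    print(estimate_catch_up([(0, 3600), (600, 3000)]))
```

A result of None means the lag is flat or still growing, in which case the change has not yet taken effect or another bottleneck remains.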
Conclusion: Leveraging perf and pstack allowed rapid identification that page compression on a large table caused SQL thread replication delay; decompressing the table eliminated the problem.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.