Databases 19 min read

How to Diagnose MySQL Lock Waits and Transaction Timeouts with MyAWR

This article explains how commercial banks can analyze MySQL lock‑wait events and transaction timeouts using MDL metadata lock monitoring, innodb_lock_wait_timeout, custom MyAWR collection, SQL de‑parameterization, and post‑mortem queries to pinpoint blocking SQL and its source.

Efficient Ops
Efficient Ops
Efficient Ops
How to Diagnose MySQL Lock Waits and Transaction Timeouts with MyAWR

1. Problem Background

Commercial banks often set transaction timeouts (e.g., 8 s or 4 s). Unoptimized SQL causing large transactions or DDL can hold record‑level or system‑level exclusive locks, blocking online transactions and triggering timeout alerts.

Although most IT operation platforms monitor MDL metadata locks and MySQL lock‑wait timeout parameters, the configured values usually exceed application‑level timeouts, leaving a gap in database‑level monitoring and post‑mortem analysis when many transaction timeouts occur.

2. Current Situation

The center currently provides the following monitoring and governance measures:

MDL metadata lock monitoring with a 15 s threshold;

MySQL

innodb_lock_wait_timeout

parameter set to 10 s;

Large‑transaction kill mechanism;

Slow‑SQL automatic kill mechanism;

The internally developed MyAWR tool can also provide minute‑level MySQL instance runtime information.

3. Problem Analysis and Action Points

Given the existing monitoring, adding extra real‑time lock‑blocking alerts is not considered necessary because:

Existing metadata lock and row‑lock timeout alerts are too short for manual intervention, and automated handling is uncertain.

Automatic handling accuracy is unclear (e.g., can the blocker be safely killed?).

Row‑lock timeout will end the waiting session, causing the application to retry, which aligns with the processing flow.

Collecting finer‑grained data at the second level may impact production performance.

The article therefore focuses on post‑mortem analysis: when many transaction timeouts occur, systematically identify the main sources of lock‑induced blocking.

In OLTP systems, many locks are generated continuously, but only locks that coincide with high concurrency, improper locking, or hot‑resource contention cause blocking. Extreme cases can lead to lock‑induced avalanches, exhausting connection pools or maxing CPU, though existing monitoring reduces this risk.

The primary goal is to use historical run records to quickly pinpoint the blocking SQL statements and their origins.

4. Analysis Method

All examples use MySQL; other databases (GaussDB, PostgreSQL) are analogous.

1. Collect Blocking Information

Based on the MyAWR tool’s lock‑wait record table:

<code>-- myawr_db_innodb_lock_waits definition
CREATE TABLE `myawr_db_innodb_lock_waits` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `snap_id` int(11) NOT NULL,
  `wait_started` varchar(19) DEFAULT NULL,
  `wait_age` varchar(8) DEFAULT NULL,
  `wait_age_secs` bigint(21) DEFAULT NULL,
  `locked_table` varchar(1024) DEFAULT NULL,
  `locked_index` varchar(1024) DEFAULT NULL,
  `locked_type` varchar(32) DEFAULT NULL,
  `waiting_trx_id` varchar(18) DEFAULT NULL,
  `waiting_trx_started` varchar(19) DEFAULT NULL,
  `waiting_trx_age` varchar(8) DEFAULT NULL,
  `waiting_trx_rows_locked` bigint(21) unsigned DEFAULT NULL,
  `waiting_trx_rows_modified` bigint(21) unsigned DEFAULT NULL,
  `waiting_pid` bigint(21) unsigned DEFAULT NULL,
  `waiting_query` varchar(1024) DEFAULT NULL,
  `waiting_lock_id` varchar(81) DEFAULT NULL,
  `waiting_lock_mode` varchar(32) DEFAULT NULL,
  `blocking_trx_id` varchar(18) DEFAULT NULL,
  `blocking_pid` bigint(21) unsigned DEFAULT NULL,
  `blocking_query` varchar(1024) DEFAULT NULL,
  `blocking_lock_id` varchar(81) DEFAULT NULL,
  `blocking_lock_mode` varchar(32) DEFAULT NULL,
  `blocking_trx_started` varchar(19) DEFAULT NULL,
  `blocking_trx_age` varchar(8) DEFAULT NULL,
  `blocking_trx_rows_locked` bigint(21) unsigned DEFAULT NULL,
  `blocking_trx_rows_modified` bigint(21) unsigned DEFAULT NULL,
  `sql_kill_blocking_query` varchar(32) DEFAULT NULL,
  `sql_kill_blocking_connection` varchar(26) DEFAULT NULL,
  `m` int(2) NOT NULL,
  PRIMARY KEY (`id`,`m`),
  KEY `idx_db_innodb_lock_waits` (`snap_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26448 DEFAULT CHARSET=utf8mb4;
</code>

Sample collection query (compatible with MySQL 5.7 and 8.0):

<code>select
  DATE_FORMAT(`wait_started`,'%Y-%m-%d %H:%i:%s'),
  TIME_FORMAT(`wait_age`,'%H:%i:%s'),
  `wait_age_secs`,
  `locked_table`,
  `locked_index`,
  `locked_type`,
  `waiting_trx_id`,
  DATE_FORMAT(`waiting_trx_started`,'%Y-%m-%d %H:%i:%s'),
  TIME_FORMAT(`waiting_trx_age`,'%H:%i:%s'),
  `waiting_trx_rows_locked`,
  `waiting_trx_rows_modified`,
  `waiting_pid`,
  `waiting_query`,
  `waiting_lock_id`,
  `waiting_lock_mode`,
  `blocking_trx_id`,
  `blocking_pid`,
  IFNULL(`blocking_query`,''),
  `blocking_lock_id`,
  `blocking_lock_mode`,
  DATE_FORMAT(`blocking_trx_started`,'%Y-%m-%d %H:%i:%s'),
  TIME_FORMAT(`blocking_trx_age`,'%H:%i:%s'),
  `blocking_trx_rows_locked`,
  `blocking_trx_rows_modified`,
  `sql_kill_blocking_query`,
  `sql_kill_blocking_connection`
from sys.x$innodb_lock_waits
limit 500;
</code>

Note: The tool samples every minute, so second‑level precision may be limited; adjust the

limit

clause if many lock‑wait events occur.

2. SQL De‑parameterization

The collected

waiting_query

and

blocking_query

contain full SQL with concrete parameters. To group similar statements, replace literals with placeholders:

<code>UPDATE warehouse SET w_ytd = w_ytd + 1876.54 WHERE w_id = 1
UPDATE warehouse SET w_ytd = w_ytd + 3384.41 WHERE w_id = 3
</code>

Desired normalized form:

<code>UPDATE warehouse SET w_ytd = w_ytd + ? WHERE w_id = ?
</code>

A simple Go implementation replaces quoted strings and numeric literals with “?”:

<code>func formatSQL(sql string) string {
    sarr := strings.Split(sql, "'")
    for i := 1; i < len(sarr); i += 2 {
        sarr[i] = "?"
    }
    aa := strings.Join(sarr, "'")
    re := regexp.MustCompile(`\b\d+(\.\d+)?\b`)
    formattedSQL := re.ReplaceAllString(aa, "?")
    return formattedSQL
}
</code>

Result illustration:

3. Table‑Level Blocking Analysis

Query the lock‑wait table for a specific time window to aggregate wait time per table and blocking SQL:

<code>select
    w.locked_table,
    sum(w.wait_age_secs) as wait_sum,
    w.blocking_query
from myawr_db_innodb_lock_waits w
where w.wait_started between '2023-08-22 16:41:00' and '2023-08-22 17:00:00'
group by w.locked_table, w.blocking_query
order by wait_sum desc;
</code>

Additional dimensions can be explored using other fields in the lock‑wait detail table.

Limitations:

The collection is a sample; lower sampling rates may miss events, but captured records still indicate important issues.

The

blocking_query

field may not always be the exact statement causing the lock because locks are held per transaction, and the sampled statement could be a later one (e.g., COMMIT).

Summing

wait_age_secs

does not reflect total blocked time accurately due to sampling intervals and possible duplicate records for the same lock.

Thus, the analysis provides qualitative insights to identify the most problematic tables and SQL during a given period.

4. Correlating Blocking SQL with Application Nodes

Using MyAWR’s historical session table, match the blocking SQL prefix to the

user

field to retrieve IP or host information, helping locate the offending application node.

<code>-- mytest.myawr_db_session definition
CREATE TABLE `myawr_db_session` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `snap_id` int(11) NOT NULL,
  `thd_id` bigint(20) unsigned DEFAULT NULL,
  `conn_id` bigint(20) unsigned DEFAULT NULL,
  `user` varchar(128) DEFAULT NULL,
  `db` varchar(64) DEFAULT NULL,
  `command` varchar(16) DEFAULT NULL,
  `state` varchar(64) DEFAULT NULL,
  `time` bigint(20) DEFAULT NULL,
  `current_statement` longtext,
  `statement_latency` bigint(20) unsigned DEFAULT NULL,
  `progress` decimal(26,0) DEFAULT NULL,
  `lock_latency` bigint(20) unsigned DEFAULT NULL,
  `rows_examined` bigint(20) unsigned DEFAULT NULL,
  `rows_sent` bigint(20) unsigned DEFAULT NULL,
  `rows_affected` bigint(20) unsigned DEFAULT NULL,
  `tmp_tables` bigint(20) unsigned DEFAULT NULL,
  `tmp_disk_tables` bigint(20) unsigned DEFAULT NULL,
  `full_scan` varchar(3) DEFAULT NULL,
  `last_statement` longtext,
  `last_statement_latency` bigint(20) unsigned DEFAULT NULL,
  `current_memory` decimal(41,0) DEFAULT NULL,
  `last_wait` varchar(128) DEFAULT NULL,
  `last_wait_latency` varchar(20) DEFAULT NULL,
  `source` varchar(64) DEFAULT NULL,
  `trx_latency` bigint(20) unsigned DEFAULT NULL,
  `trx_state` varchar(20) DEFAULT NULL,
  `trx_autocommit` varchar(20) DEFAULT NULL,
  `pid` varchar(1024) DEFAULT NULL,
  `program_name` varchar(1024) DEFAULT NULL,
  `m` int(2) NOT NULL,
  PRIMARY KEY (`id`,`m`),
  KEY `idx_db_session` (`snap_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6317 DEFAULT CHARSET=utf8mb4;
</code>

5. Conclusion

The article presents a post‑mortem approach for analyzing lock‑blocking events that impact business, focusing on extracting useful insights from historical data rather than real‑time emergency handling. The same methodology can be adapted to other database platforms.

MySQLSQL OptimizationPerformance analysislock waitMyAWRtransaction timeout
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.