Databases 15 min read

Why MySQL Transactions Hang After Network Failure and How to Kill Stuck Sessions

This article explains how MySQL transactions behave when a client’s network connection is abruptly lost, why the server does not automatically abort the transaction, and provides practical scripts and configuration tips to detect and kill long‑running idle transactions.

Programmer DD
Programmer DD
Programmer DD
Why MySQL Transactions Hang After Network Failure and How to Kill Stuck Sessions

1. Introduction

When operating MySQL, various situations such as forced application shutdown, client machine crash, network jitter, or data‑center level outage can cause abnormal session termination between the application and MySQL.

假如强制关闭应用
假如client机器突然崩溃宕机/断电
假如网络发生抖动/网卡发生故障
机房级别断网

The question is how the transaction currently executing on MySQL will behave.

2. Practice

A test case is designed where a client executes a transaction on MySQL and then the client machine is abruptly powered off, simulating a network break.

client 192.168.56.102  
MySQL  192.168.56.101

Client connects to the DB and performs SELECT FOR UPDATE (or UPDATE/DELETE) to lock a record.

After shutting down the client to simulate power loss, the server’s network layer still shows the connection as ESTABLISHED and the transaction remains in a running state.

Opening another session to lock the same table shows it must wait, confirming that the aborted transaction is still active.

3. Analysis

3.1 Why the server does not exit the transaction

MySQL’s ordinary session connections lack a keep‑alive mechanism; no socket options or heartbeat are set. Consequently, when the network disconnects, the server cannot promptly detect the failure. The TCP state remains ESTABLISHED because neither side sends a FIN packet.

3.2 How MySQL handles a transaction after network loss

Transaction still executing

If the SQL statement is still running when the network drops, MySQL rolls back the statement after it finishes because the response cannot be delivered to the client.

if (thd->is_error() || (thd->variables.option_bits & OPTION_MASTER_SQL_ERROR))
    trans_rollback_stmt(thd);

Transaction completed but not committed

When the statement finishes but the transaction is not committed, the transaction remains on the server and must be killed manually.

socket->listen->poll(socketfd)->accept->newthread->poll(newfd,wait_timeout)

The session stays alive for wait_timeout (default 8 hours) or until the TCP keep‑alive timeout expires.

/proc/sys/net/ipv4/tcp_keepalive_time   = 7200   # seconds idle before probes
/proc/sys/net/ipv4/tcp_keepalive_intvl = 75     # interval between probes
/proc/sys/net/ipv4/tcp_keepalive_probes= 9      # number of probes

3.3 When does the transaction finally exit?

If the connection is idle, MySQL will close it after wait_timeout (default 8 h). Otherwise, TCP keep‑alive probes eventually cause the OS to close the socket.

4. Actively Killing Stuck Sessions

In a data‑center level outage we observed many lock waits after reconnection, so we created a script that periodically kills long‑running idle transactions.

# Targeting operations layer
Kill_SQL="select concat('kill ',trx_mysql_thread_id,' ;') from information_schema.processlist a,information_schema.INNODB_TRX b where a.ID=b.trx_mysql_thread_id and trx_started <= SUBDATE(now(),interval 60 second) and STATE='' and COMMAND='Sleep' and TIME>=60"
LOGFILE='/data/logs/zandb_agent/kill_long_trx.log'
run_user=`whoami`

ret_log(){
  msg="$1"
  echo `date +%Y%m%d_%H:%M:%S` "[info] $msg" >> ${LOGFILE}
}

if [ $# -lt 1 ]; then
  ports=`ps -ef | grep mysqld | grep -v mysqld_safe | grep port= | awk -F"port=" '{print $NF}' | awk '{print $1}' | sort`
else
  ports=$1
fi

for port in ${ports}; do
  if [ -f /tmp/long_trx_${port}.lock ]; then
    msg="there is a lock file, so we skip instance ${port}.."
    ret_log "$msg"
    continue
  fi
  if [ -S /srv/my${port}/run/mysql.sock ]; then
    SOCKET="/srv/my${port}/run/mysql.sock"
  else
    msg="socket file does not exist, please check."
    ret_log "$msg"
    continue
  fi
  MYSQL="/opt/mysql/bin/mysql -uroot -S ${SOCKET}"
  ${MYSQL} --skip-column-names -e "$Kill_SQL" > /tmp/kill_trx_${port}.sql.${run_user} 2>/tmp/kill_trx_${port}.log.${run_user}
  num=`grep kill -c /tmp/kill_trx_${port}.sql.${run_user}`
  if [[ ${num} -gt 0 ]]; then
    msg="${port}  ${num} long trx was killed"
    ret_log "$msg"
    ${MYSQL} -e "source /tmp/kill_trx_${port}.sql.${run_user}"
  fi
done

5. Stack Traces

Rollback stack (Stack 1):

(gdb) bt
#0  innobase_rollback (hton=0x2e12440, thd=0x7ffefc000950, rollback_trx=false) at ha_innodb.cc:5452
#1  ha_rollback_low (thd=0x7ffefc000950, all=false) at handler.cc:2019
#2  MYSQL_BIN_LOG::rollback (this=0x2d668a0 <mysql_bin_log>, thd=0x7ffefc000950, all=false) at binlog.cc:2532
#3  ha_rollback_trans (thd=0x7ffefc000950, all=false) at handler.cc:2106
#4  trans_rollback_stmt (thd=0x7ffefc000950) at transaction.cc:515
#5  mysql_execute_command (thd=0x7ffefc000950, first_level=true) at sql_parse.cc:5325
#6  mysql_parse (thd=0x7ffefc000950, parser_state=0x7fffe88824a0, update_userstat=false) at sql_parse.cc:5927
#7  dispatch_command (thd=0x7ffefc000950, com_data=0x7fffe8882c90, command=COM_QUERY) at sql_parse.cc:1539
#8  do_command (thd=0x7ffefc000950) at sql_parse.cc:1060
#9  handle_connection (arg=0x3c09eb0) at connection_handler_per_thread.cc:325
#10 pfs_spawn_thread (arg=0x3b784b0) at pfs.cc:2198
#11 start_thread () from libpthread.so.0
#12 clone () from libc.so.6

Write‑wait timeout (Stack 2):

#0  poll () from libc.so.6
#1  vio_io_wait (vio=0x7ffefc005690, event=VIO_IO_EVENT_WRITE, timeout=60000) at viosocket.c:1173
#2  vio_socket_io_wait (vio=0x7ffefc005690, event=VIO_IO_EVENT_WRITE) at viosocket.c:127
#3  vio_write (vio=0x7ffefc005690, buf=0x7ffefc013838 "\001\061\001\060\004", size=12872) at viosocket.c:260
#4  net_write_raw_loop (net=0x7ffefc002528, buf=0x7ffefc013838 "\001\061\001\060\004", count=12872) at net_serv.cc:522
#5  net_write_packet (net=0x7ffefc002528, packet=0x7ffefc012a80 "\001\061\001\060\004", length=16384) at net_serv.cc:661
#6  net_write_buff (net=0x7ffefc002528, packet=0x7ffefc936f70 "\001\061\001\060", len=4) at net_serv.cc:474
#7  my_net_write (net=0x7ffefc002528, packet=0x7ffefc936f70 "\001\061\001\060", len=4) at net_serv.cc:347
#8  Protocol_classic::end_row (this=0x7ffefc001c68) at protocol_classic.cc:1204
#9  Query_result_send::send_data (this=0x7ffefc0097a0, items=...) at sql_class.cc:2937
#10 end_send (join=0x7ffefc009a70, qep_tab=0x7ffefc93b290, end_of_records=false) at sql_executor.cc:2946
#11 evaluate_join_record (join=0x7ffefc009a70, qep_tab=0x7ffefc93b118) at sql_executor.cc:1652
#12 sub_select (join=0x7ffefc009a70, qep_tab=0x7ffefc93b118, end_of_records=false) at sql_executor.cc:1304
#13 do_select (join=0x7ffefc009a70) at sql_executor.cc:957
#14 JOIN::exec (this=0x7ffefc009a70) at sql_executor.cc:206
#15 handle_query (thd=0x7ffefc000950, lex=0x7ffefc003000, result=0x7ffefc0097a0) at sql_select.cc:192
#16 execute_sqlcom_select (thd=0x7ffefc000950, all_tables=0x7ffefc009160) at sql_parse.cc:5490
#17 mysql_execute_command (thd=0x7ffefc000950, first_level=true) at sql_parse.cc:3016
#18 mysql_parse (thd=0x7ffefc000950, parser_state=0x7fffe88824a0, update_userstat=false) at sql_parse.cc:5927
#19 dispatch_command (thd=0x7ffefc000950, com_data=0x7fffe8882c90, command=COM_QUERY) at sql_parse.cc:1539
#20 do_command (thd=0x7ffefc000950) at sql_parse.cc:1060
#21 handle_connection (arg=0x3c09eb0) at connection_handler_per_thread.cc:325
#22 pfs_spawn_thread (arg=0x3b784b0) at pfs.cc:2198
#23 start_thread () from libpthread.so.0
#24 clone () from libc.so.6

In summary, MySQL does not automatically abort a transaction when the client disappears because the TCP connection stays ESTABLISHED without a FIN packet; the server relies on timeout parameters ( wait_timeout, TCP keep‑alive) or manual intervention to clean up.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

transactionMySQLTimeoutDatabase Operationswait_timeoutkill scriptnetwork failure
Programmer DD
Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.