Why MySQL Transactions Hang After Network Failure and How to Kill Stuck Sessions
This article explains how MySQL transactions behave when a client’s network connection is abruptly lost, why the server does not automatically abort the transaction, and provides practical scripts and configuration tips to detect and kill long‑running idle transactions.
1. Introduction
When operating MySQL, various situations such as forced application shutdown, client machine crash, network jitter, or data‑center level outage can cause abnormal session termination between the application and MySQL.
假如强制关闭应用
假如client机器突然崩溃宕机/断电
假如网络发生抖动/网卡发生故障
机房级别断网The question is how the transaction currently executing on MySQL will behave.
2. Practice
A test case is designed where a client executes a transaction on MySQL and then the client machine is abruptly powered off, simulating a network break.
client 192.168.56.102
MySQL 192.168.56.101Client connects to the DB and performs SELECT FOR UPDATE (or UPDATE/DELETE) to lock a record.
After shutting down the client to simulate power loss, the server’s network layer still shows the connection as ESTABLISHED and the transaction remains in a running state.
Opening another session to lock the same table shows it must wait, confirming that the aborted transaction is still active.
3. Analysis
3.1 Why the server does not exit the transaction
MySQL’s ordinary session connections lack a keep‑alive mechanism; no socket options or heartbeat are set. Consequently, when the network disconnects, the server cannot promptly detect the failure. The TCP state remains ESTABLISHED because neither side sends a FIN packet.
3.2 How MySQL handles a transaction after network loss
Transaction still executing
If the SQL statement is still running when the network drops, MySQL rolls back the statement after it finishes because the response cannot be delivered to the client.
if (thd->is_error() || (thd->variables.option_bits & OPTION_MASTER_SQL_ERROR))
trans_rollback_stmt(thd);Transaction completed but not committed
When the statement finishes but the transaction is not committed, the transaction remains on the server and must be killed manually.
socket->listen->poll(socketfd)->accept->newthread->poll(newfd,wait_timeout)
The session stays alive for wait_timeout (default 8 hours) or until the TCP keep‑alive timeout expires.
/proc/sys/net/ipv4/tcp_keepalive_time = 7200 # seconds idle before probes
/proc/sys/net/ipv4/tcp_keepalive_intvl = 75 # interval between probes
/proc/sys/net/ipv4/tcp_keepalive_probes= 9 # number of probes3.3 When does the transaction finally exit?
If the connection is idle, MySQL will close it after wait_timeout (default 8 h). Otherwise, TCP keep‑alive probes eventually cause the OS to close the socket.
4. Actively Killing Stuck Sessions
In a data‑center level outage we observed many lock waits after reconnection, so we created a script that periodically kills long‑running idle transactions.
# Targeting operations layer
Kill_SQL="select concat('kill ',trx_mysql_thread_id,' ;') from information_schema.processlist a,information_schema.INNODB_TRX b where a.ID=b.trx_mysql_thread_id and trx_started <= SUBDATE(now(),interval 60 second) and STATE='' and COMMAND='Sleep' and TIME>=60"
LOGFILE='/data/logs/zandb_agent/kill_long_trx.log'
run_user=`whoami`
ret_log(){
msg="$1"
echo `date +%Y%m%d_%H:%M:%S` "[info] $msg" >> ${LOGFILE}
}
if [ $# -lt 1 ]; then
ports=`ps -ef | grep mysqld | grep -v mysqld_safe | grep port= | awk -F"port=" '{print $NF}' | awk '{print $1}' | sort`
else
ports=$1
fi
for port in ${ports}; do
if [ -f /tmp/long_trx_${port}.lock ]; then
msg="there is a lock file, so we skip instance ${port}.."
ret_log "$msg"
continue
fi
if [ -S /srv/my${port}/run/mysql.sock ]; then
SOCKET="/srv/my${port}/run/mysql.sock"
else
msg="socket file does not exist, please check."
ret_log "$msg"
continue
fi
MYSQL="/opt/mysql/bin/mysql -uroot -S ${SOCKET}"
${MYSQL} --skip-column-names -e "$Kill_SQL" > /tmp/kill_trx_${port}.sql.${run_user} 2>/tmp/kill_trx_${port}.log.${run_user}
num=`grep kill -c /tmp/kill_trx_${port}.sql.${run_user}`
if [[ ${num} -gt 0 ]]; then
msg="${port} ${num} long trx was killed"
ret_log "$msg"
${MYSQL} -e "source /tmp/kill_trx_${port}.sql.${run_user}"
fi
done5. Stack Traces
Rollback stack (Stack 1):
(gdb) bt
#0 innobase_rollback (hton=0x2e12440, thd=0x7ffefc000950, rollback_trx=false) at ha_innodb.cc:5452
#1 ha_rollback_low (thd=0x7ffefc000950, all=false) at handler.cc:2019
#2 MYSQL_BIN_LOG::rollback (this=0x2d668a0 <mysql_bin_log>, thd=0x7ffefc000950, all=false) at binlog.cc:2532
#3 ha_rollback_trans (thd=0x7ffefc000950, all=false) at handler.cc:2106
#4 trans_rollback_stmt (thd=0x7ffefc000950) at transaction.cc:515
#5 mysql_execute_command (thd=0x7ffefc000950, first_level=true) at sql_parse.cc:5325
#6 mysql_parse (thd=0x7ffefc000950, parser_state=0x7fffe88824a0, update_userstat=false) at sql_parse.cc:5927
#7 dispatch_command (thd=0x7ffefc000950, com_data=0x7fffe8882c90, command=COM_QUERY) at sql_parse.cc:1539
#8 do_command (thd=0x7ffefc000950) at sql_parse.cc:1060
#9 handle_connection (arg=0x3c09eb0) at connection_handler_per_thread.cc:325
#10 pfs_spawn_thread (arg=0x3b784b0) at pfs.cc:2198
#11 start_thread () from libpthread.so.0
#12 clone () from libc.so.6Write‑wait timeout (Stack 2):
#0 poll () from libc.so.6
#1 vio_io_wait (vio=0x7ffefc005690, event=VIO_IO_EVENT_WRITE, timeout=60000) at viosocket.c:1173
#2 vio_socket_io_wait (vio=0x7ffefc005690, event=VIO_IO_EVENT_WRITE) at viosocket.c:127
#3 vio_write (vio=0x7ffefc005690, buf=0x7ffefc013838 "\001\061\001\060\004", size=12872) at viosocket.c:260
#4 net_write_raw_loop (net=0x7ffefc002528, buf=0x7ffefc013838 "\001\061\001\060\004", count=12872) at net_serv.cc:522
#5 net_write_packet (net=0x7ffefc002528, packet=0x7ffefc012a80 "\001\061\001\060\004", length=16384) at net_serv.cc:661
#6 net_write_buff (net=0x7ffefc002528, packet=0x7ffefc936f70 "\001\061\001\060", len=4) at net_serv.cc:474
#7 my_net_write (net=0x7ffefc002528, packet=0x7ffefc936f70 "\001\061\001\060", len=4) at net_serv.cc:347
#8 Protocol_classic::end_row (this=0x7ffefc001c68) at protocol_classic.cc:1204
#9 Query_result_send::send_data (this=0x7ffefc0097a0, items=...) at sql_class.cc:2937
#10 end_send (join=0x7ffefc009a70, qep_tab=0x7ffefc93b290, end_of_records=false) at sql_executor.cc:2946
#11 evaluate_join_record (join=0x7ffefc009a70, qep_tab=0x7ffefc93b118) at sql_executor.cc:1652
#12 sub_select (join=0x7ffefc009a70, qep_tab=0x7ffefc93b118, end_of_records=false) at sql_executor.cc:1304
#13 do_select (join=0x7ffefc009a70) at sql_executor.cc:957
#14 JOIN::exec (this=0x7ffefc009a70) at sql_executor.cc:206
#15 handle_query (thd=0x7ffefc000950, lex=0x7ffefc003000, result=0x7ffefc0097a0) at sql_select.cc:192
#16 execute_sqlcom_select (thd=0x7ffefc000950, all_tables=0x7ffefc009160) at sql_parse.cc:5490
#17 mysql_execute_command (thd=0x7ffefc000950, first_level=true) at sql_parse.cc:3016
#18 mysql_parse (thd=0x7ffefc000950, parser_state=0x7fffe88824a0, update_userstat=false) at sql_parse.cc:5927
#19 dispatch_command (thd=0x7ffefc000950, com_data=0x7fffe8882c90, command=COM_QUERY) at sql_parse.cc:1539
#20 do_command (thd=0x7ffefc000950) at sql_parse.cc:1060
#21 handle_connection (arg=0x3c09eb0) at connection_handler_per_thread.cc:325
#22 pfs_spawn_thread (arg=0x3b784b0) at pfs.cc:2198
#23 start_thread () from libpthread.so.0
#24 clone () from libc.so.6In summary, MySQL does not automatically abort a transaction when the client disappears because the TCP connection stays ESTABLISHED without a FIN packet; the server relies on timeout parameters ( wait_timeout, TCP keep‑alive) or manual intervention to clean up.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
