Analysis and Fix of Tomcat 9.0.26 Deadlock Issue
Tomcat 9.0.26 suffers a high‑concurrency deadlock caused by a lock‑order inversion among NIO poller and executor threads. TPS drops to zero and thousands of sockets are left in CLOSE_WAIT. Downgrading to Tomcat 8 removes the deadlock, and the patch shipped in 9.0.31+, which moves the close operation into a finally block, brings throughput back to around 15K TPS.
In a load test against the /get.do interface, TPS on Tomcat 9.0.26 fell from 10K to 0 after about three minutes because of a thread deadlock under high concurrency.
The issue manifests as a large number of TCP CLOSE_WAIT states (200–20K) on the tested server.
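One way to track the symptom during a test is to count sockets per TCP state from `netstat -ant` style output. The following is a minimal sketch, not part of the original investigation: the class name `SocketStateCounter` is hypothetical, and it assumes each socket line starts with `tcp` and carries the state as its own whitespace-separated field.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: counts sockets in a given TCP state from lines
// captured from a tool such as `netstat -ant`.
final class SocketStateCounter {
    static long count(List<String> netstatLines, String state) {
        return netstatLines.stream()
                .map(String::trim)
                // Skip headers and non-TCP rows; tcp and tcp6 both match.
                .filter(line -> line.startsWith("tcp"))
                // Match the state as a whole field, not as a substring.
                .filter(line -> Arrays.asList(line.split("\\s+")).contains(state))
                .count();
    }
}
```

Calling `SocketStateCounter.count(lines, "CLOSE_WAIT")` at intervals during the test shows whether the count keeps growing once TPS collapses.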
A thread dump taken with jstack reported a Java-level deadlock involving the threads “http-nio-8080-exec-409”, “http-nio-8080-ClientPoller”, “http-nio-8080-exec-205”, “http-nio-8080-BlockPoller”, and “http-nio-8080-exec-380”:
Found one Java-level deadlock:
=============================
"http-nio-8080-exec-409":
waiting to lock monitor 0x00007f064805aa78 (object 0x00000006c0ebf148, a java.util.HashSet),
which is held by "http-nio-8080-ClientPoller"
"http-nio-8080-ClientPoller":
waiting to lock monitor 0x00007f05e8061058 (object 0x00000007bfe40a70, a java.lang.Object),
which is held by "http-nio-8080-exec-205"
"http-nio-8080-exec-205":
waiting to lock monitor 0x00007f0614018448 (object 0x00000006c0e8e088, a java.util.HashSet),
which is held by "http-nio-8080-BlockPoller"
"http-nio-8080-BlockPoller":
waiting to lock monitor 0x0000000001ed06e8 (object 0x00000007bfe110f8, a java.lang.Object),
which is held by "http-nio-8080-exec-380"
"http-nio-8080-exec-380":
waiting to lock monitor 0x00007f064805aa78 (object 0x00000006c0ebf148, a java.util.HashSet),
which is held by "http-nio-8080-ClientPoller"
A quick fix was to downgrade Tomcat from 9.0.26 to 8.0, which eliminated the deadlock in subsequent pressure tests.
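The monitor deadlock that jstack reports can also be detected from inside the JVM with the standard `java.lang.management` API, which is useful for alerting before TPS reaches zero. This is a sketch under assumptions: the class name `DeadlockProbe` and the output format are illustrative, and nothing like it appears in the original investigation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-process probe for monitor deadlocks.
final class DeadlockProbe {
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked on object monitors.
        long[] ids = threads.findMonitorDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(String.format("\"%s\" waiting to lock %s, held by \"%s\"",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName()));
        }
        return report;
    }
}
```

A scheduled task could call `DeadlockProbe.describeDeadlocks()` every few seconds and log any non-empty result.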
Further investigation involved submitting a bug report to the Apache community. Analysis of the stack trace revealed that three types of threads (Poller, exec, BlockPoller) were involved in a lock‑order inversion: Poller.run and Poller.cancelledKey accessed monitors in inconsistent order, causing the deadlock.
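A lock‑order inversion of this kind can be reproduced in isolation with two monitors taken in opposite order. The sketch below is not Tomcat code: the thread names and the class `LockOrderInversionDemo` are made up for illustration, and it uses only two threads, whereas the real cycle involved five threads and four monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo: two daemon threads take the same two monitors in
// opposite order, which deadlocks them deterministically.
final class LockOrderInversionDemo {
    static boolean reproduce(long timeoutMillis) {
        Object first = new Object();
        Object second = new Object();
        CountDownLatch bothHoldOne = new CountDownLatch(2);

        startDaemon("demo-poller", first, second, bothHoldOne);
        startDaemon("demo-exec", second, first, bothHoldOne); // inverted order

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (System.currentTimeMillis() < deadline) {
                if (threads.findMonitorDeadlockedThreads() != null) {
                    return true; // the JVM sees the cycle
                }
                Thread.sleep(20);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    private static void startDaemon(String name, Object outer, Object inner,
                                    CountDownLatch bothHoldOne) {
        Thread t = new Thread(() -> {
            synchronized (outer) {
                bothHoldOne.countDown();
                try {
                    // Wait until the other thread also holds its outer lock.
                    bothHoldOne.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (inner) {
                    // Never reached: the other thread holds this monitor.
                }
            }
        }, name);
        t.setDaemon(true); // stuck threads must not keep the JVM alive
        t.start();
    }
}
```

If both threads took `first` before `second`, the cycle could not form. A consistent acquisition order is the usual remedy for this class of bug.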
Communication with Tomcat developer Remy Maucherat confirmed the issue and led to a fix moving the close operation in Poller.cancelledKey into a finally block, ensuring Poller.run acquires the lock first.
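The general shape of "close in a finally block" can be sketched as follows. This is an illustration of the try/finally pattern only, not the Tomcat change itself: `ConnectionRegistry` and its methods are hypothetical, and releasing the monitor before closing is an assumption of this sketch. The linked commit remains the authority on what was actually changed in Poller.cancelledKey.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry, used only to show the try/finally shape.
final class ConnectionRegistry {
    private final Set<Closeable> open = new HashSet<>();

    void register(Closeable connection) {
        synchronized (open) {
            open.add(connection);
        }
    }

    int size() {
        synchronized (open) {
            return open.size();
        }
    }

    void cancel(Closeable connection) throws IOException {
        try {
            // Bookkeeping happens under the registry monitor.
            synchronized (open) {
                open.remove(connection);
            }
        } finally {
            // Runs after the monitor is released, even if the bookkeeping
            // threw, so the connection is always closed.
            connection.close();
        }
    }
}
```

In this sketch the close never runs while the registry monitor is held, so it cannot take part in a lock cycle with code that holds another lock and is waiting for that monitor.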
The fix was verified using the commit https://github.com/apache/tomcat/commit/9b1a8b67bffe462fc745b19e15ed59c37e2e1dcf. After rebuilding tomcat-embed-core.jar with the patch and retesting, TPS stabilized around 15K.
The fix is scheduled for Tomcat 9.0.31 and later. At the time of writing the latest release was 9.0.30, so users should either wait for 9.0.31+ or use Tomcat 8.
References: OpenJDK source, Tomcat source, Aliyun community article, Jianshu deep dive on Tomcat NIO model.
vivo Internet Technology
