
Diagnosing and Resolving a Thread Leak in a Java Backend System

This article describes how the author investigated recurring failures in a Java-based lecture hall system, traced them to thread pool leaks caused by improper use of AsyncHttpClient, and resolved the issue through command-line diagnostics, a code fix, and ongoing monitoring, restoring stability.

Qunar Tech Salon

1. Problem Background

The lecture hall and points management systems were generally stable but occasionally failed, preventing users from registering for courses. The errors occurred roughly once every two weeks and were temporarily fixed by restarting the service, which suggested a resource leak.

2. Investigation Approach

Since the original developers were unavailable, the investigation started by gathering system information and logs. Two possible leak sources were considered: object (memory) leak and thread leak.

3. Investigation Process

Step 1: Identify Java process ID

sudo -u tomcat jps -lv | grep qtscore

Step 2: Check for memory leaks

GC logs showed normal frequency and no frequent Full GC. Heap histogram was examined:

sudo -u tomcat jmap -F -histo 11035

The top memory consumers were not business code, so a memory leak was largely ruled out.

Step 3: Check for thread leaks

top -H -p 11035
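On Linux, the same thread count can be read without top, since each thread appears as an entry under /proc/<pid>/task. A quick sketch (the PID defaults to the current shell here so the snippet runs anywhere; substitute 11035 in practice):

```shell
# Count threads of a process: each thread is a directory under
# /proc/<pid>/task. Defaults to the current shell's PID.
PID=${1:-$$}
ls "/proc/$PID/task" | wc -l

# ps reports the same figure as NLWP (number of lightweight processes)
ps -o nlwp= -p "$PID"
```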

The process had 4038 threads, far exceeding the default Dubbo (200) and Tomcat (200) thread pool sizes, indicating a thread leak. Stack traces were captured:

sudo -u tomcat jstack -l 11035 > /tmp/qtscore_stack.log

The stack contained many "New I/O boss" Netty threads but no business‑logic code, pointing to a Netty‑based thread‑pool leak. Log analysis revealed frequent errors from a Dubbo service interface.
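Grouping threads in the dump by name prefix makes this kind of pattern obvious at a glance. A typical one-liner over the captured file (thread names are quoted at the start of each stack entry; trailing numeric suffixes like "#12" are stripped before counting):

```shell
# Summarize a jstack dump: extract quoted thread names, drop the
# per-thread numeric suffix, and count occurrences of each prefix.
grep '^"' /tmp/qtscore_stack.log \
  | sed 's/^"\([^"]*\)".*/\1/' \
  | sed 's/[ #-]*[0-9]*$//' \
  | sort | uniq -c | sort -rn | head
```

A leaked pool shows up immediately as one name prefix with an outsized count, such as the "New I/O boss" threads here.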

Further code review uncovered problematic AsyncHttpClient usage.

AsyncHttpClient creates a new Netty thread pool on each instantiation. The instance should be reused instead. The code was corrected to reuse a single AsyncHttpClient instance.
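The original snippet is not shown, but the failure mode is easy to reproduce with a small stand-in. The LeakyClient class below is hypothetical; it simulates AsyncHttpClient's behavior of owning a worker pool per instance, and contrasts the buggy per-call instantiation with the corrected shared-instance pattern:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLeakDemo {
    // Stand-in for AsyncHttpClient: each instance owns a worker pool,
    // just as AsyncHttpClient spins up its own Netty threads.
    static class LeakyClient {
        final ExecutorService workers = Executors.newFixedThreadPool(4);

        void request() {
            workers.submit(() -> { /* pretend to do I/O */ });
        }

        // close() exists, but the buggy code never called it
        void close() { workers.shutdown(); }
    }

    public static void main(String[] args) {
        int before = Thread.activeCount();

        // Buggy pattern: a new client (and pool) per request, never closed.
        // Every loop iteration strands at least one pool thread.
        for (int i = 0; i < 10; i++) {
            new LeakyClient().request();
        }
        System.out.println("threads leaked: " + (Thread.activeCount() - before));

        // Fixed pattern: one shared instance for all requests.
        // Thread count is bounded by the pool size regardless of call volume.
        LeakyClient shared = new LeakyClient();
        int base = Thread.activeCount();
        for (int i = 0; i < 10; i++) {
            shared.request();
        }
        System.out.println("extra threads with reuse: " + (Thread.activeCount() - base));
        shared.close();
    }
}
```

With the per-call pattern the thread count grows with every request and the threads are never reclaimed, which is exactly the slow accumulation the production service exhibited between restarts.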

After redeployment, the thread count stabilized around 350, confirming the fix. Additional log inspection showed that the failures began once the process exceeded the system's per-user thread limit (4096), as revealed by ulimit -a.
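The limit in question can be checked directly. On Linux each Java thread is a lightweight process, so the "max user processes" limit also caps how many threads a user may own:

```shell
# Per-user process limit; on Linux this also bounds thread creation,
# since each thread counts as a lightweight process (RLIMIT_NPROC).
ulimit -u
```

Once the Tomcat user's thread total hit this ceiling, new thread creation failed, producing the recurring errors.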

4. Lessons Learned

Core components like Dubbo are generally reliable; focus on custom code when issues arise.

System logs often point directly to the root cause; thorough log analysis speeds up troubleshooting.

Tags: debugging, Java, Dubbo, Netty, thread leak
Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
