Why Did My Java ThreadPool Hang? Uncovering a Hidden HashSet Loop in JDK 1.7
A production Java service raised a thread-pool alarm. Two things combined to cause it: an unbounded LinkedBlockingQueue and a shared, non-thread-safe HashSet. Under JDK 1.7 the HashMap backing the HashSet developed a circular linked list, worker threads spun in an infinite loop at high CPU, and the task queue grew until it held most of the heap. The fix was to switch to ConcurrentHashMap and tune the pool.
Background
In the morning I received an alert from the monitoring center that a thread‑pool queue had reached its threshold, causing a service outage. After a quick restart the application recovered, and I began investigating the root cause.
Analysis
The service pulls data from an MQ, then hands it off to a business thread pool for processing. The alarm‑triggering queue is the thread‑pool's task queue.
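ThreadPoolExecutor executor = new ThreadPoolExecutor(coreSize, maxSize, 0L,
        TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>());
put(poolName, executor);
A LinkedBlockingQueue created with the no-argument constructor has a capacity of Integer.MAX_VALUE, so it is effectively unbounded. With such a queue the pool never grows beyond coreSize and never rejects a task, so nothing limits the backlog. That became a problem.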
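To see the effect of that queue choice in isolation, the sketch below checks the capacity of a no-argument LinkedBlockingQueue and submits more slow tasks than there are core threads. The class name, method names, and pool sizes are hypothetical and are not taken from the service.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the sizes used here are illustrative only.
class UnboundedQueueDemo {

    // Capacity of a LinkedBlockingQueue created with the no-argument constructor.
    static int defaultCapacity() {
        return new LinkedBlockingQueue<Runnable>().remainingCapacity();
    }

    // Submits `tasks` blocking tasks and returns {pool size, queued task count}.
    static int[] poolSizeAndBacklog(int coreSize, int maxSize, int tasks) {
        final CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(coreSize, maxSize, 0L,
                TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            release.await(); // stand-in for a slow task
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            // Snapshot taken while every worker is still blocked.
            return new int[] { executor.getPoolSize(), executor.getQueue().size() };
        } finally {
            release.countDown(); // let the tasks finish
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default capacity = " + defaultCapacity());
        int[] result = poolSizeAndBacklog(2, 8, 100);
        System.out.println("pool size = " + result[0] + ", queued = " + result[1]);
    }
}
```

With a core size of 2 and a maximum of 8, the pool stays at 2 threads and the other 98 tasks wait in the queue, because extra threads are only created when the queue refuses a task, which an unbounded queue never does.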
Memory Analysis
Using MAT, the heap dump showed two large objects: the LinkedBlockingQueue and a HashSet. The queue consumed most of the memory.
A queue this large means tasks were being processed far more slowly than they arrived, so the backlog kept growing.
Thread Analysis
Thread dumps (via fastthread.io) revealed many threads stuck in RUNNABLE state, all executing code that queried the HashSet for key existence.
The stack trace pointed to a line in HashMap (JDK 1.7) where a circular linked list could cause an infinite loop.
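To make the failure mode concrete, here is a simplified model of a bucket lookup. It is not the JDK source: the Entry class, the method names, and the step cap are all hypothetical, and the cap exists only so the demo terminates.

```java
// Simplified, hypothetical model of one hash bucket's chain; NOT the JDK implementation.
class CircularBucketDemo {

    static final class Entry {
        final String key;
        Entry next;

        Entry(String key) {
            this.key = key;
        }
    }

    // Builds a two-entry chain A -> B, optionally closing it into a cycle B -> A.
    static Entry chain(boolean circular) {
        Entry a = new Entry("A");
        Entry b = new Entry("B");
        a.next = b;
        if (circular) {
            b.next = a; // the corrupted state: the chain no longer ends in null
        }
        return a;
    }

    // Walks the chain the way a bucket lookup does: follow `next` until the key
    // matches or the chain ends. Returns the number of entries skipped, or -1 if
    // the step cap was hit. A real lookup has no cap and would simply never return.
    static int lookupSteps(Entry head, String key, int maxSteps) {
        int steps = 0;
        for (Entry e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return steps;
            }
            if (++steps >= maxSteps) {
                return -1;
            }
        }
        return steps; // reached the end of the chain: key is absent
    }

    public static void main(String[] args) {
        System.out.println(lookupSteps(chain(false), "C", 1000)); // healthy chain, absent key
        System.out.println(lookupSteps(chain(true), "B", 1000));  // cyclic chain, present key
        System.out.println(lookupSteps(chain(true), "C", 1000));  // cyclic chain, absent key
    }
}
```

In this model a lookup for a key that is present still returns, while a lookup for an absent key in the cyclic chain never reaches the end. That matches the symptom in the thread dump: threads that stay RUNNABLE inside an existence check.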
Root Cause Identification
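The HashSet was shared across worker threads without any synchronization, and HashSet is not thread-safe. It is backed by a HashMap, so concurrent writes can trigger concurrent resizes. In JDK 1.7 a resize moves entries into the new table by inserting at the head of each bucket, which reverses their order; when two threads resize at the same time, a bucket's chain can end up pointing back at itself. Any later lookup that walks that bucket without finding its key then never terminates.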
In addition, every message required a database lookup. With about 70 million rows and high MySQL I/O pressure, query latency slowed each task further. The slow queries combined with the threads trapped in the infinite loop stalled the thread pool, and the unbounded queue kept growing until it hit the alarm threshold.
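The shared-set hazard described above can be avoided with a set view backed by ConcurrentHashMap, which is available on JDK 1.7. A minimal sketch, assuming String keys (the source does not state the key type); the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; String keys are an assumption.
class ConcurrentSetSketch {

    // A thread-safe Set backed by a ConcurrentHashMap, pre-sized to reduce resizing.
    static Set<String> newSharedSet(int expectedSize) {
        return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>(expectedSize));
    }

    // Has several threads add distinct keys at the same time and returns the final size.
    static int concurrentAdds(final int threads, final int keysPerThread) {
        final Set<String> seen = newSharedSet(threads * keysPerThread);
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        start.await(); // line all writers up before they begin
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int k = 0; k < keysPerThread; k++) {
                        seen.add(id + "-" + k);
                    }
                }
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("keys stored = " + concurrentAdds(8, 10000));
    }
}
```

Every key added by every thread is retained, and `contains` and `add` never need external locking. Call sites that only use the Set interface do not have to change.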
Summary and Recommendations
Replace the non‑thread‑safe HashSet with a ConcurrentHashMap (using dummy values to emulate a set).
Initialize the concurrent map with a large capacity to avoid frequent resizing.
Clean up obsolete data in MySQL, apply hot‑cold segregation, and consider sharding.
Cache lookups instead of hitting the DB for every message.
Give each thread pool, and the threads it creates, a meaningful name so thread dumps are easier to read.
Set an explicit queue size and a rejection policy for the thread pool.
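Upgrade the runtime to JDK 1.8 or newer. Its HashMap resize no longer reverses bucket order, so this particular circular list does not form, but HashMap is still not thread-safe.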
Consider phone notifications for critical alerts.
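The pool-related recommendations above (meaningful names, an explicit queue size, a rejection policy) could be combined roughly as follows. This is a sketch under assumptions: the pool name, the sizes, the 60-second keep-alive, and the choice of CallerRunsPolicy are all illustrative and do not come from the original service.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; every name and number here is an assumption.
class BoundedPoolSketch {

    // Names threads "<poolName>-<n>" so they are recognizable in thread dumps.
    static ThreadFactory namedFactory(final String poolName) {
        final AtomicInteger sequence = new AtomicInteger(1);
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, poolName + "-" + sequence.getAndIncrement());
            }
        };
    }

    // A pool with a bounded queue and an explicit rejection policy.
    static ThreadPoolExecutor newBoundedPool(String poolName, int coreSize, int maxSize,
                                             int queueCapacity) {
        return new ThreadPoolExecutor(
                coreSize, maxSize,
                60L, TimeUnit.SECONDS,                        // idle time before extra threads exit
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                namedFactory(poolName),
                new ThreadPoolExecutor.CallerRunsPolicy());   // run in the submitting thread when full
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newBoundedPool("mq-biz", 4, 16, 1000);
        System.out.println("queue capacity = " + pool.getQueue().remainingCapacity());
        pool.shutdown();
    }
}
```

CallerRunsPolicy makes the submitting thread run the task itself once the queue is full, which slows the producer down instead of dropping work. Whether that is acceptable for an MQ consumer depends on the broker's timeout and redelivery behavior; AbortPolicy with explicit error handling is the other common choice.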
The incident shows how an unbounded queue and an unsynchronized set can combine to produce a rare infinite loop under JDK 1.7, confirming Murphy's law in production systems.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
