Why Are Your Java Threads Stuck? Decoding WAITING and TIMED_WAITING States
This article explains Java thread run states, analyzes why threads appear in WAITING or TIMED_WAITING during concurrent tasks, shows flame‑graph diagnostics, and offers concrete code‑level optimizations to reduce CPU pressure and improve overall backend performance.
1. Thread Run States
1.1 total
1.2 timed_waiting
From the above image we can see that the top N threads in TIMED_WAITING are querying national subsidy qualifications.
1.3 waiting
From the image we can see that the top N threads in WAITING are querying national subsidy activities.
1.4 Thread Analysis
Below we analyze the two states:
1. WAITING state
Definition: A thread in the WAITING state waits indefinitely for a specific action from another thread (such as a notification or an interrupt) and does not continue execution.
Trigger conditions: A thread commonly enters WAITING by calling Object.wait(), Thread.join(), or LockSupport.park() without a timeout.
Resumption: The thread remains in WAITING until another thread calls notify() or notifyAll() (for Object.wait()), the joined thread terminates (for Thread.join()), another thread calls LockSupport.unpark() (for LockSupport.park()), or the thread is interrupted.
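The WAITING state can be reproduced in isolation. The following is a minimal sketch, not code from the system under analysis: the class name WaitingStateDemo and its helper methods are made up for illustration. It blocks one thread in Object.wait() and another in LockSupport.park(), reads each thread's state from the outside, and then releases it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class WaitingStateDemo {

    /** Polls (for at most two seconds) until the thread reports the expected state. */
    static Thread.State awaitState(Thread thread, Thread.State expected) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (thread.getState() != expected && System.nanoTime() < deadline) {
            LockSupport.parkNanos(5_000_000L); // back off about 5 ms between polls
        }
        return thread.getState();
    }

    /** Blocks a thread in Object.wait() and returns the state observed from outside. */
    static Thread.State stateDuringWait() {
        Object monitor = new Object();
        AtomicBoolean released = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                while (!released.get()) {        // loop guards against spurious wakeups
                    try {
                        monitor.wait();          // no timeout: the thread reports WAITING
                    } catch (InterruptedException e) {
                        return;                  // an interrupt also ends the wait
                    }
                }
            }
        }, "demo-wait");
        waiter.start();
        Thread.State observed = awaitState(waiter, Thread.State.WAITING);
        synchronized (monitor) {
            released.set(true);
            monitor.notifyAll();                 // wakes the waiter
        }
        awaitState(waiter, Thread.State.TERMINATED);
        return observed;
    }

    /** Blocks a thread in LockSupport.park() and returns the state observed from outside. */
    static Thread.State stateDuringPark() {
        AtomicBoolean released = new AtomicBoolean(false);
        Thread parker = new Thread(() -> {
            while (!released.get()) {            // park() may return spuriously, so re-check
                LockSupport.park();              // no deadline: the thread reports WAITING
            }
        }, "demo-park");
        parker.start();
        Thread.State observed = awaitState(parker, Thread.State.WAITING);
        released.set(true);
        LockSupport.unpark(parker);              // the counterpart of park()
        awaitState(parker, Thread.State.TERMINATED);
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("Object.wait():      " + stateDuringWait());
        System.out.println("LockSupport.park(): " + stateDuringPark());
    }
}
```

In a thread dump, the second case is the one that shows LockSupport.park at the top of the stack.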
2. TIMED_WAITING state
Definition: A thread in the TIMED_WAITING state waits for a condition but returns automatically once a specified time has elapsed.
Trigger conditions: Typical causes are Thread.sleep(milliseconds), Object.wait(milliseconds), Thread.join(milliseconds), LockSupport.parkNanos(), or LockSupport.parkUntil().
Resumption: The thread resumes automatically after the timeout, or earlier when another thread calls notify() or notifyAll() (for Object.wait(milliseconds)), calls LockSupport.unpark() (for the park variants), or interrupts it.
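The difference from WAITING is that no other thread has to act. As a minimal sketch (the class TimedWaitingStateDemo and its methods are illustrative names, not part of the analyzed code), the example below parks a thread with a deadline, observes TIMED_WAITING from outside, and then lets the deadline expire without calling unpark().

```java
import java.util.concurrent.locks.LockSupport;

final class TimedWaitingStateDemo {

    /** Polls (for at most two seconds) until the thread reports the expected state. */
    static Thread.State awaitState(Thread thread, Thread.State expected) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (thread.getState() != expected && System.nanoTime() < deadline) {
            LockSupport.parkNanos(5_000_000L); // back off about 5 ms between polls
        }
        return thread.getState();
    }

    /**
     * Parks a thread with a deadline and returns two observations:
     * [0] the state while parked, [1] the state after the deadline passed.
     * parkMillis should stay well below the two-second polling limit above.
     */
    static Thread.State[] statesAroundTimedPark(long parkMillis) {
        Thread parker = new Thread(() -> {
            long deadline = System.nanoTime() + parkMillis * 1_000_000L;
            long remaining;
            // parkNanos may return early, so park again until the deadline is reached.
            while ((remaining = deadline - System.nanoTime()) > 0) {
                LockSupport.parkNanos(remaining); // bounded wait: reports TIMED_WAITING
            }
        }, "demo-timed-park");
        parker.start();
        Thread.State whileParked = awaitState(parker, Thread.State.TIMED_WAITING);
        // Nobody calls unpark(); the thread returns on its own once the deadline passes.
        Thread.State afterTimeout = awaitState(parker, Thread.State.TERMINATED);
        return new Thread.State[] {whileParked, afterTimeout};
    }

    public static void main(String[] args) {
        Thread.State[] states = statesAroundTimedPark(300);
        System.out.println("while parked:  " + states[0]);
        System.out.println("after timeout: " + states[1]);
    }
}
```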
Next, we relate these states to the actual code:
In the code, queryActTp runs getActivityInfo with two sub-tasks, while queryQualityTp runs getQualityInfo with five sub-tasks; both are executed in parallel within queryActAndQualityTp.
Per-second monitoring screenshots for getActivityInfo and getQualityInfo are shown above.
Although the call patterns are the same, the threads appear in different states; in theory, both should be TIMED_WAITING. For queryActTp, however, the stack shows LockSupport.park instead of LockSupport.parkNanos, which requires further investigation.
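One thing worth checking here, offered as a hypothesis and not as a finding from the monitoring data, is whether some code path waits on a future without a timeout. In OpenJDK, a caller blocked in CompletableFuture.get() is parked with no deadline and reports WAITING, while get(timeout, unit) parks with a deadline and reports TIMED_WAITING. The sketch below demonstrates only that difference; FutureWaitStates and its methods are illustrative names, and the never-completed future stands in for a slow sub-task.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

final class FutureWaitStates {

    /** Returns the first blocked state (WAITING or TIMED_WAITING) seen within two seconds. */
    static Thread.State awaitBlockedState(Thread thread) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        Thread.State state = thread.getState();
        while (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING
                && System.nanoTime() < deadline) {
            LockSupport.parkNanos(5_000_000L); // back off about 5 ms between polls
            state = thread.getState();
        }
        return state;
    }

    /** Blocks a caller on a pending future, with or without a timeout, and reports its state. */
    static Thread.State stateWhileBlockedOnGet(boolean withTimeout) {
        CompletableFuture<String> pending = new CompletableFuture<>(); // stand-in for a slow sub-task
        Thread caller = new Thread(() -> {
            try {
                if (withTimeout) {
                    pending.get(5, TimeUnit.SECONDS); // timed wait
                } else {
                    pending.get();                    // untimed wait
                }
            } catch (Exception e) {
                // Ignored: the demo only cares about the state while blocked.
            }
        }, "demo-caller");
        caller.setDaemon(true);
        caller.start();
        Thread.State observed = awaitBlockedState(caller);
        pending.complete("done");                     // release the caller
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("get():              " + stateWhileBlockedOnGet(false));
        System.out.println("get(timeout, unit): " + stateWhileBlockedOnGet(true));
    }
}
```

If the real stack for queryActTp passes through an untimed get() or join(), that alone would account for LockSupport.park; whether it does can only be confirmed from the actual thread dump.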
Another issue: thread pool A invokes pools B and C in parallel, which causes CPU pressure under high load. The planned fix is to refactor the logic so that activity and qualification queries share a single pool; this change is postponed for now.
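A single-pool layout could look roughly like the sketch below. It is an illustration under assumptions, not the postponed refactoring itself: SinglePoolFanOut, fanOut, and the string-returning sub-tasks are hypothetical stand-ins for the two activity and five qualification sub-tasks. Every sub-task is submitted directly to one shared pool and the results are combined with allOf and thenCombine, so no pool thread sits blocked waiting on another pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class SinglePoolFanOut {

    /** Submits each sub-task to the shared pool and gathers results in submission order. */
    static <T> CompletableFuture<List<T>> fanOut(List<Supplier<T>> subTasks, ExecutorService pool) {
        List<CompletableFuture<T>> futures = subTasks.stream()
                .map(task -> CompletableFuture.supplyAsync(task, pool))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // already complete here, so join does not block
                        .collect(Collectors.toList()));
    }

    /** Hypothetical stand-in for the combined activity and qualification query. */
    static List<String> queryActAndQuality(ExecutorService pool, long timeoutMillis) {
        List<Supplier<String>> activityTasks = Arrays.asList(() -> "act-1", () -> "act-2");
        List<Supplier<String>> qualityTasks = Arrays.asList(
                () -> "qual-1", () -> "qual-2", () -> "qual-3", () -> "qual-4", () -> "qual-5");
        CompletableFuture<List<String>> merged = fanOut(activityTasks, pool)
                .thenCombine(fanOut(qualityTasks, pool), (activity, quality) -> {
                    List<String> all = new ArrayList<>(activity);
                    all.addAll(quality);
                    return all;
                });
        try {
            // Only the calling thread waits, and it does so with a deadline.
            return merged.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("query failed or timed out", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(queryActAndQuality(pool, 2000));
        } finally {
            pool.shutdown();
        }
    }
}
```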
2. Flame Graph Analysis
2.1 wait thread
2.2 lock performance
2.3 CPU sampling
2.3.1 getFatherActivity analysis
Q1: The method is called in a loop.
Q2: It deserializes a large JSON payload (≈50 000 characters).
Q3: It creates a new ArrayList.
Q4: Only the first element of each group of objects is used; toMap would be a better fit.
Optimization 1:
We observed multiple stream calls inside the loop; moving the toMap logic outside the loop reduces overhead.
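The shape of this change can be sketched as follows. The real getFatherActivity code is not shown in this article, so the Activity class, its fields, and both method names are assumptions made for illustration. The first method regroups the whole list on every iteration and then uses only the first element of a group; the second builds a toMap index once, keeping the first element per key, and does plain map lookups inside the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FatherActivityLookup {

    /** Hypothetical stand-in for the deserialized activity objects. */
    static final class Activity {
        final String fatherId;
        final String name;

        Activity(String fatherId, String name) {
            this.fatherId = fatherId;
            this.name = name;
        }
    }

    /** Before: a full groupingBy pass per iteration, although only group.get(0) is used. */
    static List<String> groupInsideLoop(List<String> fatherIds, List<Activity> activities) {
        List<String> names = new ArrayList<>();
        for (String id : fatherIds) {
            Map<String, List<Activity>> grouped = activities.stream()
                    .collect(Collectors.groupingBy(a -> a.fatherId));
            List<Activity> group = grouped.get(id);
            if (group != null) {
                names.add(group.get(0).name);
            }
        }
        return names;
    }

    /** After: one toMap pass outside the loop; the merge function keeps the first element per key. */
    static List<String> indexOutsideLoop(List<String> fatherIds, List<Activity> activities) {
        Map<String, Activity> firstByFatherId = activities.stream()
                .collect(Collectors.toMap(a -> a.fatherId, Function.identity(),
                        (first, later) -> first));
        List<String> names = new ArrayList<>();
        for (String id : fatherIds) {
            Activity first = firstByFatherId.get(id);
            if (first != null) {
                names.add(first.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Activity> activities = Arrays.asList(
                new Activity("f1", "A"), new Activity("f2", "B"), new Activity("f1", "C"));
        List<String> ids = Arrays.asList("f1", "f2", "f3");
        System.out.println(groupInsideLoop(ids, activities));
        System.out.println(indexOutsideLoop(ids, activities));
    }
}
```

Both methods return the same names for a sequential stream; the second does the stream work once instead of once per loop iteration.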
Other methods also consume significant CPU; they are left unchanged for now.
Further optimization: a utility class that collects the results of concurrent threads, with two goals:
1. After an allOf exception, cancel all threads to prevent wasted CPU from timeouts.
2. Reduce excessive exception logs by filtering based on exception type.
All waiting in concurrent threads now uses the unified method; the previous WAITING state of queryActTp may be due to missing cancellation. After redeployment, further observation is needed. The TIMED_WAITING state of queryQualityTp likely relates to longer sub-task execution time, as shown in monitoring data.
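The utility class itself is not shown here, so the following is only a sketch of what such a collector might look like; FutureCollector, collect, isRoutine, and the choice of which exception types count as routine are all assumptions. It waits on allOf with a deadline, cancels every future when the wait fails or times out, and logs routine failures as a single line without a stack trace. Note that, per its Javadoc, CompletableFuture.cancel does not interrupt a task that is already running, so a task that must stop mid-flight needs its own cooperative check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureCollector {

    /** Waits for all futures up to the timeout, cancels them on failure, returns what succeeded. */
    static <T> List<T> collect(List<CompletableFuture<T>> futures, long timeout, TimeUnit unit) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        try {
            all.get(timeout, unit);                        // bounded wait for the whole batch
        } catch (TimeoutException | ExecutionException e) {
            futures.forEach(f -> f.cancel(true));          // no-op for futures that already finished
            logFailure(e instanceof ExecutionException && e.getCause() != null ? e.getCause() : e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();            // preserve the interrupt for the caller
            futures.forEach(f -> f.cancel(true));
        }
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            // Cancelled and failed futures both count as completed exceptionally and are skipped.
            if (future.isDone() && !future.isCompletedExceptionally()) {
                results.add(future.join());
            }
        }
        return results;
    }

    /** Assumed policy: timeouts and cancellations are routine and need no stack trace. */
    static boolean isRoutine(Throwable error) {
        return error instanceof TimeoutException || error instanceof CancellationException;
    }

    /** Logs routine failures as one line and everything else with a full stack trace. */
    static void logFailure(Throwable error) {
        if (isRoutine(error)) {
            System.err.println("sub-task wait ended early: " + error.getClass().getSimpleName());
        } else {
            error.printStackTrace();
        }
    }
}
```

A caller would pass the sub-task futures together with the overall deadline, for example collect(futures, 200, TimeUnit.MILLISECONDS), and treat a shorter result list as a partial answer.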
JD Cloud Developers