How to Quickly Pinpoint High CPU Usage in Java Services: A Step‑by‑Step Guide
When a data platform server suddenly shows CPU usage above 90%, this guide walks you through using Linux tools and a custom script to identify the offending Java process, trace the problematic thread, pinpoint the exact code line, and apply a fix that reduces load dramatically.
1. Identify the High‑Load Process
Log in to the affected server and run top to view overall CPU usage and the load average. On an 8‑core machine, a load average that stays above 8 means more tasks are waiting to run than there are cores, which indicates high load. In the process list, note the PID with the highest CPU share (e.g., PID 682).
2. Locate the Abnormal Business Logic
Use the PID to find the process's current working directory with pwdx <PID>. Services are usually started from their own deployment directory, so the path reveals which service the process belongs to; in this case it is the data‑platform web service.
3. Find the Problematic Thread and Code Line
Traditional four‑step method:
Sort processes by load with top (or ps -eo pid,pcpu,cmd --sort=-pcpu) to obtain the max‑load PID.
Show thread‑level CPU usage: top -Hp <PID> and note the thread ID.
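Convert the thread ID to hexadecimal: printf "0x%x\n" <tid>.
Run jstack <PID> | vim +/0x<hex_tid> - (the trailing - makes vim read the dump from standard input) and jump to the thread whose nid matches the hex value to read its stack trace.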
Because this is time‑critical in production, the author recommends using the script show-busy-java-threads.sh (by oldratlee) which automates the above steps.
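The last two manual steps, converting the thread ID and finding it in the dump, can be sketched in Java as follows. This is a minimal illustration, not part of the original article or of show-busy-java-threads.sh: the class name NidLookup, the thread ID 700, and the two-line sample dump are all made up, and it assumes the dump prints nid in hexadecimal.

```java
import java.util.Arrays;
import java.util.Optional;

class NidLookup {
    // Convert the decimal thread ID shown by `top -Hp` into the hex form that follows "nid=".
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    // Return the first line of the dump that carries exactly this nid.
    // Matching on a whole whitespace-separated token avoids hitting nid=0x2bc1 when looking for nid=0x2bc.
    static Optional<String> findThreadHeader(String jstackDump, long tid) {
        String nid = "nid=" + toNid(tid);
        return Arrays.stream(jstackDump.split("\\R"))
                .filter(line -> Arrays.asList(line.split("\\s+")).contains(nid))
                .findFirst();
    }

    public static void main(String[] args) {
        // Fabricated sample shaped like jstack thread header lines (hypothetical names and IDs).
        String dump = "\"worker-1\" #21 prio=5 tid=0x01 nid=0x2bc runnable\n"
                    + "\"worker-2\" #22 prio=5 tid=0x02 nid=0x2bd waiting on condition";
        System.out.println(toNid(700));
        System.out.println(findThreadHeader(dump, 700).orElse("not found"));
    }
}
```

In a real dump, the lines that follow the matching header are the stack frames of the busy thread.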
4. Root Cause Analysis
The investigation revealed that a time‑utility method, which converts timestamps to formatted date strings, was being called excessively. The real‑time reporting module uses it to convert every second elapsed since midnight, and it repeats that work n times per query. For a query at 10 AM, the method therefore runs 10 × 60 × 60 × n = 36,000 × n times, and the count keeps growing linearly until midnight, which exhausted the CPU.
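The original code is not shown in the article, so the following is only a hypothetical reconstruction of the pattern described: one formatted string per second since midnight, collected into a Set. The class name, method name, and date pattern are assumptions chosen for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

class PerSecondFormatting {
    // Assumed output pattern; the article does not say which format the utility used.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Formats every whole second from midnight up to (but not including) `now`.
    // The work grows linearly with the time of day.
    static Set<String> formattedSecondsSinceMidnight(LocalDateTime now) {
        Set<String> formatted = new HashSet<>();
        LocalDateTime t = now.toLocalDate().atStartOfDay();
        while (t.isBefore(now)) {
            formatted.add(FORMAT.format(t));
            t = t.plusSeconds(1);
        }
        return formatted;
    }

    public static void main(String[] args) {
        LocalDateTime tenAm = LocalDateTime.of(2024, 1, 15, 10, 0, 0);
        // 10 hours x 3600 seconds = 36000 formatting calls for a single invocation.
        System.out.println(formattedSecondsSinceMidnight(tenAm).size());
    }
}
```

Calling such a method n times per query multiplies the 36,000 conversions by n, which matches the arithmetic above.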
5. Solution
After confirming that the calling code only needed the size of the returned Set, the logic was simplified to compute currentSeconds - midnightSeconds directly, eliminating the heavy conversion loop. Once the new implementation was deployed, CPU usage dropped by a factor of roughly 30 and the server returned to normal load.
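A minimal sketch of the simplified calculation might look like the code below. The names currentSeconds and midnightSeconds come from the description above; the class name, method signature, and use of ZonedDateTime are assumptions, since the deployed code is not shown.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class SecondsSinceMidnight {
    // Constant-time replacement: subtract the epoch second of today's midnight
    // from the current epoch second instead of building one entry per second.
    static long secondsSinceMidnight(ZonedDateTime now) {
        long currentSeconds = now.toEpochSecond();
        long midnightSeconds = now.toLocalDate().atStartOfDay(now.getZone()).toEpochSecond();
        return currentSeconds - midnightSeconds;
    }

    public static void main(String[] args) {
        ZonedDateTime tenAm = ZonedDateTime.of(2024, 1, 15, 10, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(secondsSinceMidnight(tenAm)); // 36000
    }
}
```

The result equals the number of entries a per‑second Set would have held, but the cost no longer depends on the time of day.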
6. Takeaways
Performance‑critical code should be reviewed for unnecessary computation.
Always verify that the results of a utility method are actually used.
In production, rapid diagnosis tools (e.g., show-busy-java-threads.sh) can save valuable time.
Continuous code review and performance testing are essential to avoid hidden CPU bottlenecks.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, and more.
