How to Quickly Pinpoint High CPU Usage in Java Services on Linux
When a data platform server spikes to 98% CPU despite low traffic, this guide walks you through using Linux tools and Java thread analysis to locate the offending process, identify the problematic code, and apply a targeted fix that drops CPU load dramatically.
Problem Background
An alert showed CPU utilization on a data‑platform server at 98.94%, after staying above 70% for a sustained period. At first glance this looks like a hardware bottleneck, but the business logic is neither high‑concurrency nor CPU‑intensive, which points to a code‑level issue.
Investigation Steps
2.1 Identify High‑Load Process (PID)
Log in to the server and run top to check the current load. The load average confirms that the 8‑core machine is under heavy load, and the process with PID 682 has the highest CPU share.
2.2 Locate the Specific Business Service
Run pwdx 682 to print the working directory of the process, which points to the data‑platform web service.
2.3 Find the Abnormal Thread and Code Line
Traditional four‑step method:
1. Sort processes by CPU usage to find the PID with the highest load: top -o %CPU.
2. Show the threads of that PID and note the busiest thread ID (TID): top -Hp <PID>.
3. Convert the thread ID to hex: printf "0x%x" <TID>.
4. Search for the hex ID in a jstack dump, where it appears as nid=0x<hex>: jstack <PID> | vim +/0x<hex> -.
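Steps 3 and 4 can also be sketched in Java for cases where a dump has already been saved to a file. This is only an illustration: the class name JstackThreadFinder, its helper methods, and the sample dump lines are made up for this example, and the parsing assumes the usual jstack layout of a quoted thread name on the header line, an nid=0x... field, and a blank line between thread entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any JDK tool.
class JstackThreadFinder {

    // Converts a decimal thread ID from `top -Hp` into the hex form jstack prints after "nid=".
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    // True if `line` contains `marker` followed by whitespace or end of line,
    // so that nid=0x2ab does not also match nid=0x2abc.
    static boolean hasNid(String line, String marker) {
        int i = line.indexOf(marker);
        while (i >= 0) {
            int end = i + marker.length();
            if (end == line.length() || Character.isWhitespace(line.charAt(end))) {
                return true;
            }
            i = line.indexOf(marker, i + 1);
        }
        return false;
    }

    // Returns the header and stack lines of the thread whose nid matches `tid`,
    // or an empty list if no thread in the dump matches.
    static List<String> findThreadBlock(String dump, long tid) {
        String marker = "nid=" + toNid(tid);
        List<String> block = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            if (block.isEmpty()) {
                // Thread headers start with the quoted thread name.
                if (line.startsWith("\"") && hasNid(line, marker)) {
                    block.add(line);
                }
            } else if (line.trim().isEmpty()) {
                break; // a blank line ends the current thread entry
            } else {
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) {
        // Made-up sample dump; real jstack output has more fields and frames.
        String sampleDump = String.join("\n",
            "\"http-worker-1\" #21 daemon prio=5 os_prio=0 tid=0x00007f1c2c0a1000 nid=0x2ab runnable [0x00007f1c1a5f4000]",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat com.example.TimeUtil.format(TimeUtil.java:42)",
            "",
            "\"http-worker-2\" #22 daemon prio=5 os_prio=0 tid=0x00007f1c2c0a3000 nid=0x2ac waiting on condition [0x00007f1c1a4f3000]",
            "   java.lang.Thread.State: WAITING (parking)");
        // 683 decimal is 0x2ab, so this prints the http-worker-1 entry.
        findThreadBlock(sampleDump, 683).forEach(System.out::println);
    }
}
```

Matching on the whole nid token rather than a plain substring avoids picking up a longer thread ID that merely starts with the same hex digits.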
Because these manual steps are slow when a production incident is time‑critical, the author recommends the script show-busy-java-threads.sh (by oldratlee of Taobao), which automates the whole process.
The analysis reveals a time‑utility method that consumes excessive CPU.
Root Cause Analysis
The problematic method converts timestamps to formatted date strings. It is invoked repeatedly: for each query the system calculates every second from midnight to the current time, placing the results into a set that is later only used for its size.
Abnormal method: converts a timestamp to a formatted date‑time string.
Upper‑layer call: computes every second from midnight to now, formats each one, and stores the results in a set.
Logic layer: real‑time report queries call the upper layer many times per request, leading to millions of conversions.
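At 10 a.m., a single query performs 10 × 60 × 60 × n = 36,000 × n conversions, where n is the number of times the query triggers the computation. The count grows linearly through the day, approaching 86,400 × n just before midnight, which is what exhausts the CPU.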
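The pattern described above might look roughly like the following. This is a hypothetical reconstruction, not the team's actual code: the class name WastefulSecondCount, the method names, and the date pattern are all assumptions chosen to show why the cost scales with the time of day.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction of the wasteful pattern, for illustration only.
class WastefulSecondCount {

    // Assumed date pattern; the source does not say which format was used.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Stand-in for the time-utility method: one formatting call per second.
    static String format(LocalDateTime t) {
        return t.format(FMT);
    }

    // Formats every second from midnight up to (but not including) `now`,
    // stores each string in a set, and then uses only the set's size.
    static int secondsSoFar(LocalDateTime now) {
        LocalDateTime t = now.toLocalDate().atStartOfDay();
        Set<String> seen = new HashSet<>();
        while (t.isBefore(now)) {
            seen.add(format(t));
            t = t.plusSeconds(1);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // At exactly 10:00:00 this performs 36,000 format calls to produce one number.
        System.out.println(secondsSoFar(LocalDateTime.of(2024, 1, 1, 10, 0, 0)));
    }
}
```

Every call allocates and hashes tens of thousands of strings only to discard them, so the work per request grows with the number of seconds elapsed in the day.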
Solution
After pinpointing the issue, the team reduced the number of calculations by simplifying the method: instead of generating a full set, compute the current second offset from midnight directly. The new implementation replaces the heavy utility call.
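A minimal sketch of the direct computation, assuming the value needed is the number of wall‑clock seconds elapsed since midnight. The class and method names are hypothetical, and the team's actual replacement is not shown in the source.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical sketch of the constant-time replacement.
class SecondOffset {

    // Wall-clock seconds since midnight, with no loop, formatting, or set.
    static int secondsSinceMidnight(LocalDateTime now) {
        return now.toLocalTime().toSecondOfDay();
    }

    // Same idea starting from an epoch-millisecond timestamp in a given zone.
    static int secondsSinceMidnight(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis)
            .atZone(zone)
            .toLocalTime()
            .toSecondOfDay();
    }

    public static void main(String[] args) {
        System.out.println(secondsSinceMidnight(LocalDateTime.of(2024, 1, 1, 10, 0, 0)));
        System.out.println(secondsSinceMidnight(System.currentTimeMillis(), ZoneId.systemDefault()));
    }
}
```

The cost no longer depends on the time of day: each call does a fixed amount of arithmetic instead of up to 86,400 formatting operations.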
Post‑deployment monitoring showed CPU load dropping by a factor of 30, returning to normal levels.
Takeaways
Performance matters as much as functional correctness; efficient code is a core engineering skill.
Conduct thorough code reviews and consider alternative implementations.
Never overlook small details in production incidents; a meticulous mindset drives continuous improvement.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
