
Why Is My Server CPU at 99%? Pinpoint Java Thread Bottlenecks Fast

After an alert showed a data platform server's CPU usage soaring to 98.94%, this article walks through a systematic investigation: spotting the high-load process with top, locating the responsible service with pwdx, tracing the offending Java thread with jstack, and optimizing the time-conversion utility that caused the overload.

Open Source Linux

Problem Background

Yesterday afternoon, an operations alert reported that a data-platform server's CPU utilization had reached 98.94% and had stayed above 70% for some time. The service is neither high-concurrency nor CPU-intensive, so the spike pointed to a code-level issue rather than a hardware bottleneck.

Investigation Steps

1.1 Identify High‑Load Process

Log in to the server and run top to view the current load. Measured against the server's 8 cores, the load average confirmed that the machine was heavily loaded. The process list showed that PID 682 was consuming a large share of the CPU.

[Figure: top output and load average]
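
To make the "load average versus core count" comparison concrete, here is a minimal Java sketch. It is not part of the original investigation, which used top directly. The class name LoadCheck and the helper loadPerCore are hypothetical; the JDK calls are the standard OperatingSystemMXBean methods.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class LoadCheck {
    /** Load average divided by core count; values near or above 1.0 suggest the CPUs are saturated. */
    static double loadPerCore(double loadAverage, int cores) {
        if (cores <= 0) {
            throw new IllegalArgumentException("cores must be positive");
        }
        return loadAverage / cores;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // 1-minute average; negative if unavailable
        int cores = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
            return;
        }
        System.out.printf("load=%.2f cores=%d load/core=%.2f%n",
                load, cores, loadPerCore(load, cores));
    }
}
```

On an 8-core machine, a load average of 8 gives a ratio of 1.0, meaning roughly one runnable task per core. A ratio that stays well above 1.0 means tasks are queuing for CPU time.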

1.2 Locate the Faulty Service

Use pwdx with the PID to find the process’s working directory, which points to the responsible team and project.

[Figure: pwdx output]

The process corresponds to the data‑platform web service.
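
On Linux, pwdx reports the target of the /proc/<pid>/cwd symbolic link. The sketch below does the same lookup in Java to show what the command reads. It assumes a Linux host and Java 9 or later (for ProcessHandle); the class name WorkingDirLookup is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class WorkingDirLookup {
    /** Resolves /proc/<pid>/cwd, the symlink that pwdx reads on Linux. */
    static Optional<Path> workingDirOf(long pid) {
        Path link = Paths.get("/proc", Long.toString(pid), "cwd");
        try {
            return Optional.of(Files.readSymbolicLink(link));
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            // Not Linux, no such process, or insufficient permission
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Defaults to the current JVM's PID when no argument is given
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        String dir = workingDirOf(pid).map(Path::toString).orElse("(unavailable)");
        System.out.println(pid + ": " + dir);
    }
}
```

Reading another user's process usually requires the same privileges that pwdx itself needs, so the lookup may come back empty unless it runs as that user or as root.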

1.3 Find the Problematic Thread and Code Line

The traditional four‑step method is:

1. top, then press P: 1040    // sort by CPU usage to find the PID of the busiest process
2. top -Hp <pid>: 1073    // find the ID of the busiest thread inside that process
3. printf "0x%x\n" <threadPID>: 0x431    // convert the thread ID to hex for the jstack lookup
4. jstack <processPID> | vim +/0x431 -    // open the stack dump at the hex thread ID
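
Steps 3 and 4 can be sketched in Java as follows: convert the decimal thread ID to hex, then pull the matching thread block out of a jstack dump. This is an illustration, not the author's tooling. It assumes the dump prints the native thread ID as nid=0x... in hex, as the steps above expect, and that thread blocks are separated by blank lines. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class JstackThreadFinder {
    /** Converts a decimal OS thread ID (from top -Hp) to the hex form used after nid=. */
    static String toNid(long threadId) {
        return "0x" + Long.toHexString(threadId);
    }

    /** Returns the header and stack frames of the thread whose header line carries the nid. */
    static List<String> findThreadBlock(List<String> dumpLines, long threadId) {
        // Trailing space keeps 0x431 from matching a longer ID such as 0x4312
        String marker = "nid=" + toNid(threadId) + " ";
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : dumpLines) {
            if (!inBlock) {
                if (line.startsWith("\"") && line.contains(marker)) {
                    inBlock = true;
                    block.add(line);
                }
            } else if (line.trim().isEmpty()) {
                break; // a blank line ends the thread block
            } else {
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: jstack <pid> | java JstackThreadFinder.java <threadId>");
            return;
        }
        long threadId = Long.parseLong(args[0]);
        List<String> lines;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            lines = in.lines().collect(Collectors.toList());
        }
        findThreadBlock(lines, threadId).forEach(System.out::println);
    }
}
```

With the example values above, thread ID 1073 becomes 0x431, and the sketch prints only the stack of the thread whose header contains nid=0x431.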

Because online incident response needs speed, the author created a helper script show-busy-java-threads.sh to automate these steps.

[Figure: show-busy-java-threads output]

Root Cause Analysis

The investigation pinpointed a time-utility method that converts a timestamp into a formatted date string. The caller invoked it once for every second from midnight to the current time, added each result to a set, and later used only the set's size. The caller needed the count of elements, not the set itself, and the amount of work grew as the day progressed.

This logic sits in the data platform's real-time reporting query, which runs it many times per request, so a single query could trigger millions of conversions and consume excessive CPU.
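
The article does not show the original code, so the following is a hypothetical reconstruction of the pattern described: one formatted string per elapsed second, collected into a set only to read its size. The class name, method name, and date pattern are all assumptions.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

class SlowSecondCounter {
    // Assumed pattern; the article does not say which format was used
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats one timestamp per elapsed second, then returns how many distinct strings exist. */
    static int secondsSinceMidnight(ZonedDateTime now) {
        ZonedDateTime midnight = now.toLocalDate().atStartOfDay(now.getZone());
        Set<String> formatted = new HashSet<>();
        for (long t = midnight.toEpochSecond(); t < now.toEpochSecond(); t++) {
            ZonedDateTime second = Instant.ofEpochSecond(t).atZone(now.getZone());
            formatted.add(FORMAT.format(second)); // one format call and one allocation per second
        }
        return formatted.size(); // only the count is ever used
    }

    public static void main(String[] args) {
        System.out.println(secondsSinceMidnight(ZonedDateTime.now()));
    }
}
```

Late in the day a single call performs up to 86,400 format operations and set insertions, which is why repeated calls within one query add up so quickly.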

Solution

After locating the issue, the team reduced the number of calculations by simplifying the method: instead of building a set, it computes the number of seconds elapsed since midnight directly. Replacing the old calls with the new implementation cut CPU usage by a factor of 30 and returned the server to normal load.

[Figure: CPU usage before and after]
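
A direct calculation of the kind described in this section might look like the sketch below. It is one possible form, not the team's actual implementation, which the article does not show. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.ZonedDateTime;

class FastSecondCounter {
    /** Seconds elapsed since local midnight, computed in constant time. */
    static long secondsSinceMidnight(ZonedDateTime now) {
        ZonedDateTime midnight = now.toLocalDate().atStartOfDay(now.getZone());
        return Duration.between(midnight, now).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(secondsSinceMidnight(ZonedDateTime.now()));
    }
}
```

The cost no longer depends on the time of day: the method does a fixed amount of arithmetic and allocates no per-second strings or set entries.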

Conclusion

Beyond functional correctness, code performance matters; writing efficient, elegant solutions is a core engineering skill. Regular code reviews and continuous questioning of implementation choices help catch such inefficiencies early. Never overlook small details in production—attention to detail drives growth and excellence.
