Diagnosing and Resolving Extreme CPU Usage in a Java Data Platform
This guide walks through a case in which a data platform server's CPU utilization sat near 99% despite modest traffic: pinpointing the offending Java process, tracing the busiest thread, uncovering a time-conversion routine that formats every second of the day, and applying a small fix that cut CPU load by roughly a factor of 30.
Incident Overview
An operations alert showed that a data‑platform server’s CPU usage spiked to 98.94% and stayed above 70% for an extended period. The application is not CPU‑intensive, so the alert suggested a software‑level problem rather than a hardware bottleneck.
Investigation Steps
2.1 Identify the High‑Load Process (PID)
Log into the host and run top. On an 8‑core machine the load average was high and process 682 showed the largest %CPU value.
2.2 Locate the Business Component
Execute pwdx 682 to reveal the working directory of the process. The path points to the data‑platform web service, confirming the offending service.
2.3 Find the Problematic Thread and Code Line
Typical manual debugging involves four steps:
1. Sort processes by CPU (top -b -n 1 -o %CPU) and note the PID with the highest load.
2. List the threads of that PID (top -Hp <PID>) and note the thread ID with the highest %CPU.
3. Convert the thread ID to hexadecimal (printf "0x%x" <tid>), because jstack reports native thread IDs in hex, in the form nid=0x...
4. Run jstack <PID> and search for the hex thread ID to obtain the stack trace.
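Steps 3 and 4 amount to a hex conversion followed by a text search. The sketch below expresses that lookup in Java, the language of the application under investigation. The class name, method names, and the sample dump are all hypothetical: the dump is a hand-written string in jstack's general layout, not real output from the incident.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds one thread's stack inside jstack text.
class JstackThreadLookup {

    // Step 3: convert a decimal thread ID to the hex form used after "nid=".
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    // Step 4: collect the stanza whose header contains nid=<hex>,
    // stopping at the blank line that ends the stanza.
    static List<String> stackFor(String jstackDump, long tid) {
        // The trailing space keeps 0x2bf from matching 0x2bf1.
        String marker = "nid=" + toNid(tid) + " ";
        List<String> stanza = new ArrayList<>();
        boolean inside = false;
        for (String line : jstackDump.split("\n")) {
            if (!inside) {
                inside = line.contains(marker);
            } else if (line.trim().isEmpty()) {
                break;
            }
            if (inside) {
                stanza.add(line);
            }
        }
        return stanza;
    }

    public static void main(String[] args) {
        // Hand-written sample in jstack's layout (assumption, not real output).
        String dump =
              "\"worker-1\" #12 prio=5 os_prio=0 tid=0x00007f0000000001 nid=0x2bf runnable\n"
            + "   java.lang.Thread.State: RUNNABLE\n"
            + "\tat com.example.Report.secondsSoFar(Report.java:42)\n"
            + "\n"
            + "\"worker-2\" #13 prio=5 os_prio=0 tid=0x00007f0000000002 nid=0x2c0 waiting on condition\n"
            + "   java.lang.Thread.State: WAITING\n";
        System.out.println(toNid(703)); // prints 0x2bf
        stackFor(dump, 703).forEach(System.out::println);
    }
}
```

In practice the dump would come from running jstack against the PID found in step 1, and the thread ID from the top -Hp listing in step 2.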
To avoid the repetitive manual work, the show-busy-java-threads.sh script automates these steps: it extracts the top‑CPU Java threads, generates a temporary jstack dump (using sudo when necessary), and prints the stack for each busy thread.
Root‑Cause Analysis
The stack traces pointed to a utility method that converts a timestamp to a formatted date string. The method was invoked in a loop that enumerates every second from midnight to the current time, stores each formatted string in a Set, and then uses only the set's size(). At 10:00, for example, one pass formats 10 × 60 × 60 = 36,000 timestamps, so the real-time reporting query, which runs many times per minute, triggers 10 × 60 × 60 × n formatting calls (where n is the number of times the loop runs per query). The cost of each pass grows linearly with the time elapsed since midnight, so CPU consumption climbs steadily through the day.
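The pattern described above can be reconstructed roughly as follows. This is a minimal sketch, not the platform's actual code: the class name, method name, date pattern, and the use of java.time are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction of the costly pattern; names are illustrative.
class SlowElapsedSeconds {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Formats every second from midnight up to (but excluding) 'now',
    // stores each string in a Set, and then uses only the Set's size.
    static int elapsedSeconds(LocalDateTime now) {
        LocalDateTime cursor = now.toLocalDate().atStartOfDay();
        Set<String> seen = new HashSet<>();
        while (cursor.isBefore(now)) {
            seen.add(cursor.format(FORMAT)); // one formatting call per second
            cursor = cursor.plusSeconds(1);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        LocalDateTime tenAm = LocalDate.of(2024, 1, 15).atTime(10, 0, 0);
        System.out.println(elapsedSeconds(tenAm)); // prints 36000
    }
}
```

Every call allocates and hashes tens of thousands of strings just to produce a number that is already known from the clock, which is why the work scales with the time of day rather than with the data.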
Solution
The unnecessary formatting was removed. Instead of converting each second to a string, the code now subtracts midnight's epoch seconds from the current epoch seconds and uses that integer directly, which eliminates both the per-second formatting calls and the intermediate Set. After redeployment, CPU usage dropped by roughly a factor of 30 and returned to normal levels.
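A minimal sketch of the arithmetic replacement might look like this. The class and method names are hypothetical, and java.time is an assumption; the original code may well have used a different date API.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical sketch of the fix; names are illustrative.
class FastElapsedSeconds {

    // Seconds elapsed since local midnight, computed as a difference of
    // epoch seconds: constant work regardless of the time of day.
    static long elapsedSeconds(ZonedDateTime now) {
        long midnightEpoch =
                now.toLocalDate().atStartOfDay(now.getZone()).toEpochSecond();
        return now.toEpochSecond() - midnightEpoch;
    }

    public static void main(String[] args) {
        ZonedDateTime tenAm =
                ZonedDateTime.of(2024, 1, 15, 10, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(elapsedSeconds(tenAm)); // prints 36000
    }
}
```

The result is a single subtraction with no allocation. One caveat under this sketch's assumptions: in a zone with daylight-saving transitions, the epoch difference on a transition day reflects real elapsed seconds, which can differ by an hour from a count of wall-clock seconds.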
Script: show-busy-java-threads.sh
#!/bin/bash
# Find the highest-CPU Java threads and print their stack traces.

readonly PROG=$(basename "$0")
readonly -a COMMAND_LINE=("$0" "$@")

usage() {
    cat <<EOF
Usage: ${PROG} [OPTION]...
Options:
  -p, --pid    Specify a Java PID (default: all Java processes)
  -c, --count  Number of top threads to display (default: 5)
  -h, --help   Show this help message
EOF
    exit $1
}

# Parse options with getopt; -a also accepts long options with a single dash.
ARGS=$(getopt -n "${PROG}" -a -o c:p:h -l count:,pid:,help -- "$@")
[ $? -ne 0 ] && usage 1
eval set -- "${ARGS}"

while true; do
    case "$1" in
        -c|--count) count="$2"; shift 2;;
        -p|--pid) pid="$2"; shift 2;;
        -h|--help) usage 0;;
        --) shift; break;;
        *) break;;
    esac
done
count=${count:-5}

# Colored output helpers; color is used only when stdout is a character device.
redEcho() { [ -c /dev/stdout ] && echo -e "\033[1;31m$*\033[0m" || echo "$@"; }
yellowEcho() { [ -c /dev/stdout ] && echo -e "\033[1;33m$*\033[0m" || echo "$@"; }
blueEcho() { [ -c /dev/stdout ] && echo -e "\033[1;36m$*\033[0m" || echo "$@"; }

# Ensure jstack is available, falling back to $JAVA_HOME/bin.
if ! which jstack >/dev/null 2>&1; then
    [ -z "$JAVA_HOME" ] && { redEcho "Error: jstack not found on PATH and JAVA_HOME is not set!"; exit 1; }
    [ ! -x "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: $JAVA_HOME/bin/jstack not executable!"; exit 1; }
    export PATH="$JAVA_HOME/bin:$PATH"
fi

# Temporary jstack dumps share a unique prefix and are removed on exit.
readonly uuid=$(date +%s)_${RANDOM}_$$
cleanupWhenExit() { rm -f /tmp/${uuid}_* >/dev/null 2>&1; }
trap "cleanupWhenExit" EXIT
# Read "pid lwp user comm pcpu" lines from stdin and print each thread's stack.
printStackOfThreads() {
    local counter=1
    local line pid threadId threadId0x user pcpu jstackFile
    while IFS=" " read -r -a line; do
        pid=${line[0]}
        threadId=${line[1]}
        # jstack prints native thread IDs in hex (nid=0x...).
        threadId0x="0x$(printf %x "$threadId")"
        user=${line[2]}
        pcpu=${line[4]}
        jstackFile=/tmp/${uuid}_${pid}

        # Dump each process only once; later threads reuse the file.
        if [ ! -f "$jstackFile" ]; then
            if [ "$user" = "$USER" ]; then
                jstack "$pid" > "$jstackFile"
            elif [ "$UID" -eq 0 ]; then
                sudo -u "$user" jstack "$pid" > "$jstackFile"
            else
                redEcho "[$((counter++))] Fail to jstack busy thread (${pcpu}%) under user $user."
                yellowEcho "    sudo ${COMMAND_LINE[*]}"
                continue
            fi
        fi

        blueEcho "[$((counter++))] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process($pid) under user($user):"
        sed -n "/nid=${threadId0x} /,/^$/p" "$jstackFile"
    done
}
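# List all threads, keep Java ones (only those of --pid when given), busiest first.
ps -Leo pid,lwp,user,comm,pcpu --no-headers |
awk -v pid="${pid}" '$4 == "java" && (pid == "" || $1 == pid)' |
sort -k5 -r -n |
head -n "${count}" |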
printStackOfThreads
Key Takeaways
Validate CPU spikes with system tools (top, pwdx) before assuming hardware limits.
Identify the offending process and then drill down to the specific Java thread using jstack or the provided show-busy-java-threads.sh script.
Automating thread‑stack extraction reduces mean‑time‑to‑recovery for production incidents.
Review business logic for unnecessary heavy computations; replace expensive formatting with simple arithmetic when possible.
Liangxu Linux
