How to Diagnose and Fix High CPU Usage in Java Data Platforms
This article walks through a real‑world incident in which a data‑platform server hit nearly 100% CPU usage. It covers the step‑by‑step investigation with top, pwdx, and jstack (plus a helper script that automates those steps), identifies a time‑conversion utility as the root cause, and describes the code change that cut CPU load roughly thirtyfold.
1. Incident Overview
Yesterday afternoon an ops alert indicated the data‑platform server CPU utilization had spiked to 98.94% and stayed above 70% for a while. The service is not high‑concurrency or CPU‑intensive, suggesting the problem lies in the business code rather than hardware.
2. Investigation Steps
2.1 Identify High‑Load Process (PID)
Log into the server and run top to view the current load. The load average, read against the machine's 8 cores, confirms that the load is high, and process ID 682 accounts for a large share of the CPU.
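As a rough way to read a load average against the core count, the sketch below divides one by the other. The function name load_per_core is made up for illustration, and the commented usage line assumes a Linux host with /proc/loadavg and nproc available. A ratio that stays well above 1.0 suggests there is more runnable work than there are cores.

```shell
# Hypothetical helper: ratio of load average to core count.
# $1 = load average (for example the 1-minute value), $2 = number of cores.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux host (assumption), the inputs could come from:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```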
2.2 Locate the Affected Service
Use pwdx 682 to print the process's current working directory. The path shows that the process belongs to the data‑platform web service.
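On Linux, the same information is exposed under /proc, which is handy when pwdx is not installed. The two helpers below are a minimal sketch with made-up names (proc_cwd, proc_exe): the first resolves a process's working directory, the second the binary it is running.

```shell
# Hypothetical helpers; both assume a Linux host with /proc mounted.
# Working directory of a process (the directory pwdx reports).
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Path of the executable the process is running.
proc_exe() {
    readlink "/proc/$1/exe"
}

# Example against the PID from this incident (needs permission to inspect it):
#   proc_cwd 682
#   proc_exe 682
```

Reading another user's process this way usually requires root, the same restriction that applies to pwdx.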
2.3 Find the Problematic Thread and Code Line
A traditional four‑step method is:
1. top -o %CPU – sort processes by CPU usage to find the busiest process (PID)
2. top -Hp <PID> – list the threads of that process
3. printf "0x%x\n" <thread‑ID> – convert the thread ID to hex for the jstack lookup
4. jstack <PID> | vim +/0x<hex> - – open the stack dump in vim and jump to the thread with that hex ID
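Steps 3 and 4 can be sketched as two small helpers. The names tid_to_nid and stack_for_nid are invented for this example, and the thread dump is a made-up sample (thread names, addresses, and the class com.example.TimeUtil are placeholders), so treat this as an illustration of the lookup rather than real output. The sed range is the same pattern the helper script in this article uses.

```shell
# Convert a decimal thread ID (as shown by top -Hp) to the hex "nid" used by jstack.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Read a jstack dump on stdin and print the block for one nid,
# from its header line through the next blank line.
stack_for_nid() {
    sed -n "/nid=$1 /,/^\$/p"
}

# Made-up sample dump, for illustration only.
sample_dump=$(cat <<'EOF'
"http-nio-8080-exec-1" #31 daemon prio=5 os_prio=0 tid=0x00007f1c2c0a1000 nid=0x2ab runnable [0x00007f1c0d5f4000]
   java.lang.Thread.State: RUNNABLE
    at com.example.TimeUtil.toDateStrings(TimeUtil.java:42)

"GC task thread#0" os_prio=0 tid=0x00007f1c2c01f800 nid=0x2ac runnable
EOF
)

# Thread ID 683 is an illustrative value; it maps to nid 0x2ab.
printf '%s\n' "$sample_dump" | stack_for_nid "$(tid_to_nid 683)"
```

Against a live process the input would come from jstack <PID> instead of the sample text.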
Because this is time‑critical in production, the author previously packaged these steps into a helper script, show-busy-java-threads.sh, which quickly pinpoints busy Java threads.
3. Root‑Cause Analysis
The investigation traced the high CPU to a time‑utility method that converts timestamps to formatted date strings. Real‑time report queries invoke this method repeatedly: it runs once for every second elapsed since midnight. Ten hours after midnight, for example, that is 10 × 60 × 60 × n = 36,000 × n calls, and the count keeps growing linearly through the day until it exhausts the CPU.
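The arithmetic is easy to check. The sketch below uses a hypothetical function, calls_so_far, that multiplies the seconds elapsed since midnight by n. The article does not say what n stands for, so it is treated here as an unspecified multiplier (for instance, the number of report queries).

```shell
# Hypothetical helper: calls made so far today.
# $1 = hour, $2 = minute, $3 = second (plain integers, no leading zeros), $4 = n.
calls_so_far() {
    echo $(( ($1 * 3600 + $2 * 60 + $3) * $4 ))
}

calls_so_far 10 0 0 1   # prints 36000
calls_so_far 18 0 0 1   # prints 64800
```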
4. Solution
After locating the offending method, the team realized that the Set it returned was never used; only its size was needed. They replaced the heavy conversion with a simple calculation, currentSeconds - midnightSeconds. Deploying the change cut CPU load by about 30×, returning the server to normal operation.
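To make the difference concrete, here is a shell sketch of both approaches. It is not the team's code, which the article does not show. count_seconds_naive is a hypothetical stand-in that formats every second in a range as a date string, collects the distinct strings, and returns only their count. count_seconds_fast is the subtraction. The date -d "@N" form assumes GNU date (or a compatible implementation).

```shell
# Hypothetical stand-in for the original utility: one date conversion per
# second in [start, end), then count the distinct strings.
count_seconds_naive() {
    t=$1
    while [ "$t" -lt "$2" ]; do
        date -u -d "@$t" +%Y-%m-%dT%H:%M:%S   # assumes GNU date
        t=$((t + 1))
    done | sort -u | wc -l | tr -d ' '
}

# The replacement: the count is just the difference between two timestamps.
count_seconds_fast() {
    echo $(( $2 - $1 ))
}

# Both give the same count; only the first does per-second work.
count_seconds_naive 0 60
count_seconds_fast 0 60

# With real clock values (GNU date assumed):
#   midnight=$(date -d 'today 00:00:00' +%s)
#   count_seconds_fast "$midnight" "$(date +%s)"
```

The naive version does work proportional to the length of the range, while the subtraction is constant time.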
5. Takeaways
Performance should be considered alongside functional correctness during development.
Regular code reviews help discover more efficient implementations.
Never overlook small details in production incidents; thorough investigation is key to reliability.
show-busy-java-threads.sh Script
<code>#!/bin/bash
# @Function
# Find out the highest cpu consumed threads of java, and print the stack of these threads.
# @Usage
# $ ./show-busy-java-threads.sh
# @author Jerry Lee
readonly PROG=$(basename "$0")
readonly -a COMMAND_LINE=("$0" "$@")
usage(){
    cat <<EOF
Usage: ${PROG} [OPTION]...
Find out the highest cpu consumed threads of java, and print the stack of these threads.
Example: ${PROG} -c 10
Options:
    -p, --pid     find out the highest cpu consumed threads from the specified java process, default from all java processes.
    -c, --count   set the thread count to show, default is 5
    -h, --help    display this help and exit
EOF
    exit $1
}
# Assign first, then mark readonly: "readonly ARGS=$(...)" would mask getopt's exit status.
ARGS=$(getopt -n "$PROG" -a -o c:p:h -l count:,pid:,help -- "$@") || usage 1
readonly ARGS
eval set -- "${ARGS}"
while true; do
    case "$1" in
        -c|--count) count="$2"; shift 2;;
        -p|--pid) pid="$2"; shift 2;;
        -h|--help) usage 0;;
        --) shift; break;;
        *) break;;
    esac
done
count=${count:-5}
redEcho(){
    [ -c /dev/stdout ] && { echo -ne "\033[1;31m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}
yellowEcho(){
    [ -c /dev/stdout ] && { echo -ne "\033[1;33m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}
blueEcho(){
    [ -c /dev/stdout ] && { echo -ne "\033[1;36m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}
# Check the existence of jstack command!
if ! which jstack &>/dev/null; then
    [ -z "$JAVA_HOME" ] && { redEcho "Error: jstack not found on PATH!"; exit 1; }
    [ ! -f "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack does NOT exist!"; exit 1; }
    [ ! -x "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack is NOT executable!"; exit 1; }
    export PATH="$JAVA_HOME/bin:$PATH"
fi
readonly uuid=`date +%s`_${RANDOM}_$$
cleanupWhenExit(){
    rm -f /tmp/${uuid}_* &>/dev/null
}
trap "cleanupWhenExit" EXIT
printStackOfThreads(){
    local line
    local count=1
    while IFS=" " read -a line; do
        local pid=${line[0]}
        local threadId=${line[1]}
        local threadId0x="0x$(printf %x ${threadId})"
        local user=${line[2]}
        local pcpu=${line[4]}
        local jstackFile=/tmp/${uuid}_${pid}
        [ ! -f "${jstackFile}" ] && {
            if [ "${user}" = "${USER}" ]; then
                jstack ${pid} >${jstackFile}
            else
                if [ $UID -eq 0 ]; then
                    sudo -u ${user} jstack ${pid} >${jstackFile}
                else
                    redEcho "[$((count++))] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
                    redEcho "User of java process($user) is not current user($USER), need sudo to run again:"
                    yellowEcho " sudo ${COMMAND_LINE[@]}"
                    echo
                    continue
                fi
            fi
        }
        blueEcho "[$((count++))] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user}):"
        sed -n "/nid=${threadId0x} /,/^$/p" ${jstackFile}
    done
}
ps -Leo pid,lwp,user,comm,pcpu --no-headers |
    { [ -z "${pid}" ] && awk '$4=="java"{print $0}' || awk -v "pid=${pid}" '$1==pid && $4=="java"{print $0}'; } |
    sort -k5 -r -n | head -n "${count}" | printStackOfThreads
</code>