How to Quickly Diagnose and Fix 100% CPU Usage on Linux Servers
When a Linux server's CPU spikes to 100%, this guide walks you through a systematic investigation—from identifying the high‑load process and pinpointing the offending Java thread to applying a streamlined shell script—so you can resolve the issue and restore normal performance.
1. Problem Overview
During a routine operation an alert reported that a data‑platform server’s CPU usage had risen to 98.94% and stayed above 70% for an extended period. Although the service is not CPU‑intensive, the unusually high utilization suggested a code‑level problem rather than a hardware bottleneck.
2. Investigation Steps
2.1 Locate High‑Load Process (PID)
Log in to the server and run top to view the current load. Judging the load average against the server's 8 cores and sorting processes by CPU usage showed that the process with PID 682 was consuming a large share of CPU.
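As a non-interactive complement to top, the same check can be sketched in shell. This is a minimal sketch, assuming the procps version of ps and a Linux /proc filesystem; the helper names top_cpu_processes and load_per_core are hypothetical and not part of the original investigation. Note that ps reports CPU usage averaged over each process's lifetime, so its numbers can differ from the instantaneous figures top displays.

```shell
# List the N processes using the most CPU (default 5), plus the header row.
top_cpu_processes() {
    local n=${1:-5}
    ps -eo pid,user,pcpu,comm --sort=-pcpu | head -n "$((n + 1))"
}

# Divide a load average by the core count; values near or above 1.00
# suggest the CPUs are saturated.
load_per_core() {
    local load=$1 cores=$2
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

top_cpu_processes 5
# The first field of /proc/loadavg is the 1-minute load average.
load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

On an 8-core machine, a 1-minute load average of 8 would print 1.00, meaning every core is busy on average.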
2.2 Identify the Business Component
Use pwdx 682 to retrieve the working directory of the process, which reveals that the high‑load process belongs to the data‑platform web service.
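The same information is also exposed through the Linux /proc filesystem, which can help when pwdx is not installed. The sketch below assumes a Linux host and enough permissions to read the target process's /proc entries (the same user or root); process_cwd and process_cmdline are hypothetical helper names.

```shell
# Print the working directory of a process, as pwdx would report it.
process_cwd() {
    readlink "/proc/$1/cwd"
}

# Print the full command line; arguments are NUL-separated in /proc.
process_cmdline() {
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}

# Demonstration against the current shell; for the incident above the
# argument would be 682.
process_cwd "$$"
process_cmdline "$$"
```

The command line is often as useful as the working directory, since a Java service usually names its main class or jar there.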
2.3 Locate the Problematic Thread and Code Line
The traditional method has four steps. The first, finding the process ID with top, was done in section 2.1; the remaining three are:
Sorting threads by CPU usage with top -Hp <PID> to obtain the thread ID.
Converting the thread ID to hexadecimal using printf "0x%x" <TID>.
Running jstack <PID> and searching for the hex thread ID.
Because this process is time‑consuming, the show-busy-java-threads.sh script (provided below) automates these steps, quickly revealing the busy Java threads.
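For reference, the manual steps listed above can be sketched as two small helpers. This is an illustrative sketch only: tid_to_nid and extract_thread_stack are hypothetical names, the thread ID 1234 and the dump path are made up, and the commented commands need a JDK on PATH and a live Java process.

```shell
# Convert a decimal Linux thread ID to the hex form that jstack
# prints as nid=0x...
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print one thread's stack from a saved jstack dump: everything from
# the line containing its nid up to the next blank line.
extract_thread_stack() {
    local nid=$1 dump_file=$2
    sed -n "/nid=${nid} /,/^$/p" "$dump_file"
}

# Manual walk-through (values other than PID 682 are hypothetical):
#   top -Hp 682                        # note the busiest thread ID, e.g. 1234
#   jstack 682 > /tmp/jstack-682.txt
#   extract_thread_stack "$(tid_to_nid 1234)" /tmp/jstack-682.txt
tid_to_nid 1234
```

Saving the dump to a file first means several threads can be inspected against the same snapshot.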
3. Root Cause Analysis
The investigation traced the high CPU consumption to a time‑utility method that converts timestamps to formatted dates. This method is invoked repeatedly by the real‑time reporting logic, calculating the number of seconds from midnight to the current time for each query. As the day progresses, the number of calculations grows linearly, leading to massive CPU usage.
Faulty method logic: Converts a timestamp to a date‑time string.
Upper‑level call: Computes seconds for every second of the day and stores results in a set.
Logic layer: Real‑time report queries repeatedly call the method, causing thousands of executions per query.
4. Solution
After identifying the method, the code was simplified to compute only the difference between the current second and midnight, eliminating the unnecessary set construction. The revised implementation reduced the per‑query computation dramatically; after deployment, CPU load dropped by a factor of 30, returning to normal levels.
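The article does not show the Java code, so the following is only a hypothetical shell analogue of the arithmetic involved, not the author's implementation. It contrasts a routine that visits every second since midnight with one that takes a single difference; both function names are invented for illustration, and midnight is taken in UTC for simplicity.

```shell
# Naive analogue: walk through every second from midnight to now.
# The work grows linearly with the time of day.
naive_seconds_since_midnight() {
    local now=$1 midnight=$2 t count=0
    t=$midnight
    while [ "$t" -lt "$now" ]; do
        count=$((count + 1))
        t=$((t + 1))
    done
    echo "$count"
}

# Direct analogue: one subtraction, constant work at any time of day.
direct_seconds_since_midnight() {
    local now=$1 midnight=$2
    echo "$((now - midnight))"
}

now=$(date +%s)
midnight=$((now - now % 86400))   # midnight UTC, an assumption of this sketch
direct_seconds_since_midnight "$now" "$midnight"
```

Late in the day the naive version performs tens of thousands of iterations per call, which matches the growth pattern described in the root cause analysis.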
5. Takeaways
Performance matters as much as functional correctness; efficient implementations are a core engineering skill.
Conduct thorough code reviews and consider alternative, more optimal solutions.
Never overlook small details in production incidents; meticulous investigation leads to faster resolution and continuous improvement.
6. Provided Script: show-busy-java-threads.sh
#!/bin/bash
# @Function
# Find out the highest cpu consumed threads of java, and print the stack of these threads.
# @Usage
# $ ./show-busy-java-threads.sh
# @author Jerry Lee
readonly PROG="$(basename "$0")"
readonly -a COMMAND_LINE=("$0" "$@")

usage() {
    cat <<EOF
Usage: ${PROG} [OPTION]...
Find out the highest cpu consumed threads of java, and print the stack of these threads.
Example: ${PROG} -c 10

Options:
    -p, --pid       find out the highest cpu consumed threads from the specified java process,
                    default from all java processes.
    -c, --count     set the thread count to show, default is 5
    -h, --help      display this help and exit
EOF
    exit ${1:-0}
}
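
# Assign first and mark readonly afterwards: with `readonly ARGS=$(...)`, $? would be
# the exit status of readonly itself, so a getopt failure would go unnoticed.
ARGS=$(getopt -n "${PROG}" -a -o c:p:h -l count:,pid:,help -- "$@") || usage 1
readonly ARGS
eval set -- "${ARGS}"

while true; do
    case "$1" in
    -c|--count)
        count="$2"
        shift 2
        ;;
    -p|--pid)
        pid="$2"
        shift 2
        ;;
    -h|--help)
        usage
        ;;
    --)
        shift
        break
        ;;
    esac
done
count=${count:-5}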

# Print in color only when stdout is a character device (a terminal).
redEcho() {
    [ -c /dev/stdout ] && { echo -ne "\033[1;31m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}

yellowEcho() {
    [ -c /dev/stdout ] && { echo -ne "\033[1;33m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}

blueEcho() {
    [ -c /dev/stdout ] && { echo -ne "\033[1;36m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}

# Check the existence of jstack command!
if ! which jstack &>/dev/null; then
    [ -z "$JAVA_HOME" ] && { redEcho "Error: jstack not found on PATH!"; exit 1; }
    ! [ -f "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack file does NOT exist!"; exit 1; }
    ! [ -x "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack is NOT executable!"; exit 1; }
    export PATH="$JAVA_HOME/bin:$PATH"
fi
readonly uuid=`date +%s`_${RANDOM}_$$
cleanupWhenExit() {
    rm /tmp/${uuid}_* &>/dev/null
}
trap "cleanupWhenExit" EXIT

printStackOfThreads() {
    local line
    local count=1
    while IFS=" " read -a line ; do
        local pid=${line[0]}
        local threadId=${line[1]}
        local threadId0x="0x`printf %x ${threadId}`"
        local user=${line[2]}
        local pcpu=${line[4]}
        local jstackFile=/tmp/${uuid}_${pid}

        [ ! -f "${jstackFile}" ] && {
            if [ "${user}" == "${USER}" ]; then
                jstack ${pid} > ${jstackFile}
            else
                if [ $UID == 0 ]; then
                    sudo -u ${user} jstack ${pid} > ${jstackFile}
                else
                    redEcho "[${count}] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
                    yellowEcho "    sudo ${COMMAND_LINE[@]}"
                    echo
                    continue
                fi
            fi
        }

        blueEcho "[${count}] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user}):"
        sed "/nid=${threadId0x} /,/^$/p" -n ${jstackFile}
        count=$((count+1))
    done
}

# Both conditions must hold, so they are joined with &&; a comma between them
# would form an awk range pattern instead of a filter.
ps -Leo pid,lwp,user,comm,pcpu --no-headers | {
    [ -z "${pid}" ] &&
        awk '$4=="java"{print $0}' ||
        awk -v pid="${pid}" '$1==pid && $4=="java"{print $0}'
} | sort -k5 -r -n | head -n "${count}" | printStackOfThreads
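
To see how the thread selection at the end of the script behaves, the filter, sort, and head stages can be tried against canned input. This is a sketch under stated assumptions: select_busy_threads is a hypothetical helper, the sample rows are invented, and the two filter conditions are combined with && so that both the command name and the optional PID must match.

```shell
# Pick the busiest java threads from lines shaped like the output of
# `ps -Leo pid,lwp,user,comm,pcpu --no-headers`, read from stdin.
# $1 = number of threads to show (default 5), $2 = optional PID filter.
select_busy_threads() {
    local count=${1:-5} pid=${2:-}
    awk -v pid="$pid" '$4 == "java" && (pid == "" || $1 == pid)' |
        sort -k5,5 -r -n | head -n "$count"
}

# Invented sample rows: pid lwp user comm pcpu
select_busy_threads 2 682 <<'EOF'
682 700 app java 12.5
682 701 app java 97.3
999 1001 app java 50.0
682 702 app java 40.1
1 1 root systemd 0.0
EOF
```

With the sample rows, the two lines kept are threads 701 and 702 of process 682, in descending order of CPU usage; the thread from process 999 and the non-java row are filtered out.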
