
How to Diagnose and Fix High CPU Usage in Java Data Platforms

This article walks through a real-world incident in which a data-platform server showed near-100% CPU usage. It explains the step-by-step investigation using top, pwdx, and jstack, identifies a time-conversion utility as the root cause, and presents the code fix that cut CPU load roughly thirtyfold, together with a helper script that speeds up the diagnosis.

Efficient Ops

1. Incident Overview

Yesterday afternoon an ops alert reported that CPU utilization on the data-platform server had spiked to 98.94% and then stayed above 70% for some time. The service is neither high-concurrency nor CPU-intensive, which suggests the problem lies in the business code rather than in the hardware.

2. Investigation Steps

2.1 Identify High‑Load Process (PID)

Log in to the server and run

top

to view the current load. Measured against the server's 8 cores, the load average confirms that the machine is heavily loaded, and process ID 682 accounts for a large share of the CPU.
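
The comparison of load average against core count can also be scripted. The following is a minimal sketch, assuming a Linux host with /proc/loadavg and nproc available; is_overloaded is a hypothetical helper name, and "load greater than core count" is only a common rule of thumb, not a threshold taken from the incident.

```shell
#!/bin/bash
# Sketch: judge the 1-minute load average against the number of cores.
# is_overloaded is a hypothetical helper; "load > cores" is only a rule of thumb.

is_overloaded() {
  local load="$1" cores="$2"
  # awk does the floating-point comparison; exit status 0 means load > cores
  awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l > c) }'
}

# Linux-only inputs: /proc/loadavg and nproc (GNU coreutils)
if [ -r /proc/loadavg ] && command -v nproc >/dev/null; then
  load1=$(cut -d' ' -f1 /proc/loadavg)
  cores=$(nproc)
  if is_overloaded "$load1" "$cores"; then
    echo "1-minute load ${load1} exceeds ${cores} cores"
  else
    echo "1-minute load ${load1} is within ${cores} cores"
  fi
fi
```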

2.2 Locate the Affected Service

Use

pwdx 682

to print the process's current working directory, which reveals that the process belongs to the data-platform web service.
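
On Linux, pwdx reports a process's current working directory, which is also exposed as the symlink /proc/<PID>/cwd, while /proc/<PID>/exe points at the running binary. The sketch below reads both; proc_cwd and proc_exe are hypothetical helper names, and the shell's own PID stands in for 682 so the example runs anywhere.

```shell
#!/bin/bash
# Sketch: read what pwdx reports straight from /proc (Linux only).
# proc_cwd and proc_exe are hypothetical helper names.

proc_cwd() { readlink "/proc/$1/cwd"; }   # working directory, as printed by pwdx
proc_exe() { readlink "/proc/$1/exe"; }   # path of the running binary

pid=$$   # stand-in: this shell's own PID; the article's process was 682
echo "cwd: $(proc_cwd "$pid")"
echo "exe: $(proc_exe "$pid")"
```

Reading another user's process this way may require root, just as pwdx does.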

2.3 Find the Problematic Thread and Code Line

A traditional four‑step method is:

1. top -o %CPU: sort processes by CPU usage to find the PID with the highest load.
2. top -Hp <PID>: list the threads of that PID.
3. printf "0x%x\n" <thread-ID>: convert the thread ID to hex for the jstack lookup.
4. jstack <PID> | vim +/0x<hex> - (the trailing - makes vim read the dump from stdin): search the stack trace for that thread.
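
Steps 3 and 4 can be wrapped in two small functions. This is a sketch only: tid_to_nid and thread_stack are hypothetical names, the thread ID 700 is made up for illustration, and the sed range assumes a jstack dump in which each thread's block starts on a line containing nid=0x... and ends at a blank line.

```shell
#!/bin/bash
# Sketch of steps 3 and 4 as functions; both names are hypothetical.

# Convert a decimal thread ID (as shown by top -Hp) to the hex form
# that follows nid= in jstack output
tid_to_nid() { printf '0x%x\n' "$1"; }

# Print one thread's block from a saved jstack dump:
# from its nid= line up to the next blank line
thread_stack() {
  local dump_file="$1" nid="$2"
  sed -n "/nid=${nid} /,/^\$/p" "$dump_file"
}

# Usage, assuming thread 700 of process 682 is the busy one
# (700 is an invented value):
#   jstack 682 > /tmp/jstack-682.txt
#   thread_stack /tmp/jstack-682.txt "$(tid_to_nid 700)"
tid_to_nid 700   # prints 0x2bc
```

Saving the dump to a file first means several busy threads can be looked up against one consistent snapshot.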

Because time is critical in production, the author had previously packaged these steps into a helper script, show-busy-java-threads.sh, which quickly pinpoints busy Java threads.

3. Root‑Cause Analysis

The investigation traced the high CPU usage to a time-utility method that converts timestamps into formatted date strings. Real-time report queries invoke it repeatedly: the method runs once for every second from midnight to the current time. Ten hours after midnight, for example, that is already 10 × 60 × 60 × n = 36,000 × n calls, and the count keeps growing linearly through the day until the CPU is exhausted.
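
To make the growth concrete, the arithmetic can be sketched as a one-line function. calls_since_midnight is a hypothetical name, and n is passed through as the article's multiplier without assuming what it stands for.

```shell
#!/bin/bash
# Sketch of the call-count arithmetic. calls_since_midnight is a hypothetical name;
# n is the article's multiplier, passed through unchanged.

calls_since_midnight() {
  local hours="$1" n="$2"
  echo $(( hours * 60 * 60 * n ))
}

for hours in 1 10 20; do
  echo "after ${hours}h: $(calls_since_midnight "$hours" 1) calls per unit of n"
done
```

The count doubles between hour 10 and hour 20, which matches a load that climbs steadily as the day goes on.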

4. Solution

After locating the offending method, the team realized that the contents of the returned Set were never used; only its size was needed. They replaced the heavy conversion with a simple calculation, currentSeconds - midnightSeconds. Deploying the change reduced CPU load by about 30×, returning the server to normal operation.
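
The production code is Java and is not shown in the article, so the following is only a bash stand-in for the idea, assuming bash 4.2 or later. count_by_formatting mimics the old approach by formatting every second and collecting the strings in a set; count_by_subtraction is the replacement. Both function names and the example timestamps are hypothetical.

```shell
#!/bin/bash
# Sketch contrasting the two ways of getting "seconds elapsed"; names are hypothetical.

# Old shape: format every second into a date string, collect them, return the set size
count_by_formatting() {
  local start="$1" end="$2" ts formatted
  local -A seen=()
  for ((ts = start; ts < end; ts++)); do
    printf -v formatted '%(%Y-%m-%d %H:%M:%S)T' "$ts"   # bash >= 4.2 builtin time format
    seen["$formatted"]=1
  done
  echo "${#seen[@]}"
}

# New shape: the size is just currentSeconds - midnightSeconds
count_by_subtraction() {
  local midnight_seconds="$1" current_seconds="$2"
  echo $(( current_seconds - midnight_seconds ))
}

start=1700000000          # arbitrary epoch second standing in for midnight
end=$(( start + 3600 ))   # one hour later
echo "by formatting:  $(count_by_formatting "$start" "$end")"
echo "by subtraction: $(count_by_subtraction "$start" "$end")"
```

The first function does work proportional to the elapsed time on every call, while the second does a single subtraction regardless of the time of day.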

5. Takeaways

Performance should be considered alongside functional correctness during development.

Regular code reviews help discover more efficient implementations.

Never overlook small details in production incidents; thorough investigation is key to reliability.

The show-busy-java-threads.sh Script

<code>#!/bin/bash
# @Function
# Find out the highest cpu consumed threads of java, and print the stack of these threads.
# @Usage
#   $ ./show-busy-java-threads.sh
# @author Jerry Lee

readonly PROG=$(basename "$0")
readonly -a COMMAND_LINE=("$0" "$@")

usage(){
cat <<EOF
Usage: ${PROG} [OPTION]...
Find out the highest cpu consumed threads of java, and print the stack of these threads.
Example: ${PROG} -c 10

Options:
-p,--pid       find out the highest cpu consumed threads from the specified java process, default from all java processes.
-c,--count     set the thread count to show, default is 5
-h,--help      display this help and exit
EOF
exit $1
}

# Assign first, then mark readonly: "readonly VAR=$(cmd)" would hide getopt's exit status
ARGS=$(getopt -n "$PROG" -a -o c:p:h -l count:,pid:,help -- "$@") || usage 1
readonly ARGS
eval set -- "${ARGS}"

while true; do
  case "$1" in
    -c|--count) count="$2"; shift 2;;
    -p|--pid)   pid="$2";   shift 2;;
    -h|--help)  usage 0;;
    --) shift; break;;
    *) break;;
  esac
done
count=${count:-5}

redEcho(){
  [ -c /dev/stdout ] && { echo -ne "\033[1;31m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}

yellowEcho(){
  [ -c /dev/stdout ] && { echo -ne "\033[1;33m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}

blueEcho(){
  [ -c /dev/stdout ] && { echo -ne "\033[1;36m"; echo -n "$@"; echo -e "\033[0m"; } || echo "$@"
}

# Check the existence of jstack command!
if ! which jstack &>/dev/null; then
  [ -z "$JAVA_HOME" ] && { redEcho "Error: jstack not found on PATH!"; exit 1; }
  [ ! -f "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack does NOT exist!"; exit 1; }
  [ ! -x "$JAVA_HOME/bin/jstack" ] && { redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack is NOT executable!"; exit 1; }
  export PATH="$JAVA_HOME/bin:$PATH"
fi

# Unique prefix for this run's temporary jstack dumps
readonly uuid=$(date +%s)_${RANDOM}_$$

cleanupWhenExit(){
  rm -f "/tmp/${uuid}_"* &>/dev/null
}
trap "cleanupWhenExit" EXIT

printStackOfThreads(){
  local line
  local count=1
  while IFS=" " read -a line; do
    local pid=${line[0]}
    local threadId=${line[1]}
    local threadId0x="0x$(printf %x ${threadId})"
    local user=${line[2]}
    local pcpu=${line[4]}
    local jstackFile=/tmp/${uuid}_${pid}
    [ ! -f "${jstackFile}" ] && {
      if [ "${user}" = "${USER}" ]; then
        jstack "${pid}" >"${jstackFile}"
      else
        if [ $UID -eq 0 ]; then
          sudo -u "${user}" jstack "${pid}" >"${jstackFile}"
        else
          redEcho "[$((count++))] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
          redEcho "User of java process($user) is not current user($USER), need sudo to run again:"
          yellowEcho "    sudo ${COMMAND_LINE[@]}"
          echo
          continue
        fi
      fi
    }
    blueEcho "[$((count++))] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user}):"
    sed -n "/nid=${threadId0x} /,/^$/p" ${jstackFile}
  done
}

ps -Leo pid,lwp,user,comm,pcpu --no-headers |
  { [ -z "${pid}" ] && awk '$4=="java"{print $0}' || awk -v "pid=${pid}" '$1==pid && $4=="java"{print $0}'; } |
  sort -k5 -r -n | head -n "${count}" | printStackOfThreads
</code>
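
The ranking stage at the end of the script can be tried on its own. The sketch below factors it into a function that reads ps -Leo pid,lwp,user,comm,pcpu style lines from stdin; rank_threads is a hypothetical name, and the command name is a parameter so the filter is not limited to java.

```shell
#!/bin/bash
# Sketch: the filter-sort-head stage of the script as a reusable function.
# rank_threads is a hypothetical name. Input columns: pid lwp user comm pcpu.

rank_threads() {
  local count="$1" comm="$2"
  awk -v comm="$comm" '$4 == comm' |   # keep threads of the given command
    sort -k5,5 -n -r |                 # highest %CPU first
    head -n "$count"
}

# Usage on a live host (needs procps ps):
#   ps -Leo pid,lwp,user,comm,pcpu --no-headers | rank_threads 5 java
```

Each output line carries the PID and thread ID that the script then converts to hex and looks up in the jstack dump.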