
Step‑by‑Step CPU Issue Diagnosis for a Java Application in a Kubernetes Pod

This article walks through a real‑world investigation of a pod whose CPU spiked to over 90%, detailing how abnormal JVM garbage‑collection patterns were identified, traced to a specific Java thread, and resolved by fixing an inefficient Excel export routine.


The author received an online alert that a container's CPU usage had surged past 90% and noticed an unusually high number of Young GC events, plus a Full GC, within a two-hour window, prompting a detailed troubleshooting session.

For reference, a normal JVM monitoring curve is shown first, followed by the problematic one, whose frequent GC spikes signal abnormal behavior.
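
Those curves come from a monitoring dashboard; a quick way to confirm the same pattern from inside the pod (not part of the original walkthrough, and assuming the JDK tools are present on the image) is jstat, whose YGC and FGC columns are cumulative Young and Full GC counts:

    # Sample GC counters for PID 1 every second, ten times.
    # YGC/FGC are cumulative Young/Full GC counts; YGCT/FGCT the time spent.
    jstat -gcutil 1 1000 10

A YGC count that climbs on nearly every sample matches the spiky curve described above.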

The investigation began by locating the affected pod, entering it, and running top to observe per-process resource usage; the Java process (PID 1) showed CPU usage around 130% on a multi-core node. top -H -p <pid> then listed the process's threads and revealed the thread ID (tid 746) responsible for the load. Because jstack reports thread IDs in hexadecimal (as nid=0x...), the tid was converted with printf "%x\n" 746, yielding 2ea, and the stack trace was captured with jstack <pid> | grep 2ea > gc.stack.
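
Putting those steps together, a minimal sketch of the whole lookup; the kubectl names are placeholders, and the final step saves the full dump before grepping (a variant of the article's one-liner) so the frames under the matching thread header are kept:

    # Enter the affected pod (pod name and namespace are placeholders).
    kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

    # Process view: the Java process (PID 1) sat around 130% CPU here.
    top

    # Per-thread view of PID 1; note the busiest thread ID (746 in this case).
    top -H -p 1

    # jstack prints thread IDs in hex (nid=0x...), so convert first.
    printf "%x\n" 746        # -> 2ea

    # Save the full dump, then locate the offending thread by its nid.
    jstack 1 > gc.stack
    grep -n "nid=0x2ea" gc.stack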

Because the stack file was large, it was downloaded via a simple HTTP server started in the pod with python -m SimpleHTTPServer 8080, fetched through a jump host with curl -O http://<ip>:8080/gcInfo.stack, and then examined locally to find the stack entry for the problematic thread.
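
The transfer itself is two one-liners; a sketch assuming the pod's IP is reachable from the jump host (SimpleHTTPServer is the Python 2 module; on Python 3 images the equivalent is http.server):

    # In the pod: serve the current directory over HTTP on port 8080.
    python -m SimpleHTTPServer 8080     # Python 3: python3 -m http.server 8080

    # On the jump host: fetch the dump; -O keeps the remote filename.
    curl -O http://<pod-ip>:8080/gcInfo.stack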

Analysis of the stack pinpointed the issue to an asynchronous Excel export function that reused a common list-query API capped at 200 records per page. The export tried to push tens of thousands of records through that API, and the resulting nested loops created a flood of temporary objects, triggering repeated GC and ultimately pod restarts. The code was fixed and redeployed, resolving the CPU spike.

In conclusion, when a production incident occurs, ensure service availability first, then methodically drill down through monitoring data and thread dumps; familiarity with tools such as jstack and Arthas greatly simplifies the diagnosis.
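
As a point of comparison, Arthas collapses the top/printf/jstack sequence into a single command; a sketch assuming the pod can reach the public download URL:

    # Download and attach Arthas, then pick the Java process (PID 1) from its list.
    curl -O https://arthas.aliyun.com/arthas-boot.jar
    java -jar arthas-boot.jar

    # In the Arthas console: show the three busiest threads with their stacks,
    # hex conversion and dump searching handled automatically.
    thread -n 3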

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java, JVM, Kubernetes, Troubleshooting, CPU, GC
Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independent developer of a PMP practice-quiz mini-program.
