Step‑by‑Step CPU Issue Diagnosis for a Java Application in a Kubernetes Pod
This article walks through a real‑world investigation of a pod whose CPU spiked to over 90%, detailing how abnormal JVM garbage‑collection patterns were identified, traced to a specific Java thread, and resolved by fixing an inefficient Excel export routine.
The author received an online alert that a container’s CPU usage had surged to over 90%. The author also noticed that, within two hours, the JVM had run an unusually large number of Young GCs plus one Full GC, which prompted a detailed troubleshooting session.
For reference, the original article first shows a normal JVM monitoring curve, then the problematic one, whose frequent GC spikes indicate abnormal behavior.
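GC curves like these are commonly drawn from the JVM's cumulative collector counters. The article does not say which monitoring system produced its charts, so the following is only a minimal sketch under that assumption. The standard GarbageCollectorMXBean exposes a collection count and an accumulated collection time for each collector, and one dashboard point is the difference between two snapshots. Collector names depend on the GC in use, for example "G1 Young Generation" and "G1 Old Generation" under G1. GcCounters, Counts, snapshot, and delta are hypothetical names. The record requires Java 16 or later.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads the cumulative GC counters that JVM dashboards usually plot. */
final class GcCounters {

    /** Cumulative values since JVM start; the MXBean reports -1 when a value is unavailable. */
    record Counts(long collections, long timeMillis) {}

    // Collector name (for example "G1 Young Generation") -> cumulative counts.
    static Map<String, Counts> snapshot() {
        Map<String, Counts> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), new Counts(gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return result;
    }

    // One dashboard point: the difference between two snapshots taken an interval apart.
    static Map<String, Counts> delta(Map<String, Counts> before, Map<String, Counts> after) {
        Map<String, Counts> result = new LinkedHashMap<>();
        after.forEach((name, a) -> {
            Counts b = before.getOrDefault(name, new Counts(0, 0));
            result.put(name, new Counts(a.collections() - b.collections(), a.timeMillis() - b.timeMillis()));
        });
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print per-second GC activity three times, the way a monitoring agent would sample it.
        Map<String, Counts> previous = snapshot();
        for (int i = 0; i < 3; i++) {
            Thread.sleep(1_000);
            Map<String, Counts> current = snapshot();
            System.out.println(delta(previous, current));
            previous = current;
        }
    }
}
```

A sudden rise in the young collector's per-interval count, as in the problematic curve, shows up directly in these deltas.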
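The investigation began by locating the affected pod, opening a shell in it, and running top to observe per‑process resource usage. The Java process (PID 1) showed CPU usage of around 130%. Because top reports a process’s CPU as a percentage of one core, this figure on a multi‑core node means the process was doing more than one core’s worth of work. The command top -H -p <pid> then listed the process’s threads and revealed the thread ID (tid) responsible for the load.
jstack prints native thread IDs in hexadecimal, as nid=0x<hex>. The tid was therefore converted with printf "%x\n" 746, which gives 2ea. The stack trace was then captured with jstack <pid> | grep 2ea >gc.stack. As written, that grep saves only the matching thread header line, along with any unrelated line that happens to contain 2ea. Running grep -A 30 'nid=0x2ea' instead would also keep the 30 lines after the header, which is where the thread’s stack frames are.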
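The tid‑to‑stack lookup can also be scripted. The following minimal sketch converts the decimal tid from top -H into the nid=0x<hex> form that jstack prints. It then returns that thread's whole entry, header plus frames, up to the blank line that ends it. JstackThreadFinder, hexNid, and entryFor are hypothetical names, and the code needs Java 11 or later. It assumes the usual jstack layout, in which each entry starts with the quoted thread name. As a precaution, it also accepts a decimal nid=<tid>, in case the JDK in use prints nid that way.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts one thread's entry from a jstack dump, given the tid from top -H. */
final class JstackThreadFinder {

    // top -H shows the native thread id in decimal; jstack prints it as nid=0x<hex>.
    static String hexNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    // Returns the thread's header line plus the lines below it, up to the blank line
    // that separates entries in a jstack dump. Returns an empty list if no entry matches.
    static List<String> entryFor(List<String> dumpLines, long tid) {
        String hex = hexNid(tid);
        String dec = "nid=" + tid; // assumption: tolerate a decimal nid as well
        List<String> out = new ArrayList<>();
        boolean inEntry = false;
        for (String line : dumpLines) {
            if (!inEntry) {
                // Entry headers start with the quoted thread name.
                if (line.startsWith("\"") && (hasToken(line, hex) || hasToken(line, dec))) {
                    inEntry = true;
                    out.add(line);
                }
            } else if (line.isBlank()) {
                break; // end of this thread's entry
            } else {
                out.add(line);
            }
        }
        return out;
    }

    // Whole-token match, so nid=0x2ea does not also match nid=0x2eab.
    private static boolean hasToken(String line, String token) {
        for (String part : line.split("\\s+")) {
            if (part.equals(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JstackThreadFinder.java gc.stack 746
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        entryFor(lines, Long.parseLong(args[1])).forEach(System.out::println);
    }
}
```

Matching on the whole nid token, rather than on the bare string 2ea, avoids false hits from memory addresses and from longer IDs that merely contain those digits.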
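Because the stack file was large, it was served from the pod with a simple HTTP server, python -m SimpleHTTPServer 8080, run from the directory containing the file. SimpleHTTPServer is the Python 2 module; Python 3 uses python3 -m http.server 8080. From a jump host, the file was fetched with curl -O http://<ip>:8080/gcInfo.stack. Note that curl -o expects a local file name before the URL, whereas -O saves the file under its remote name. The file was then examined locally to locate the stack entry for the problematic thread.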
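The container image may not ship Python. It evidently includes JDK tools, though, since jstack ran there. As an alternative sketch, assuming JDK 11 or later in the pod, the JDK's built‑in com.sun.net.httpserver package can serve the dump file. DumpFileServer is a hypothetical name, and the file and port are simply whatever is passed on the command line. On JDK 18 and later, the bundled jwebserver tool is another option. Either way, the server listens on all interfaces, so stop it once the download is done.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical one-file download server using the JDK's built-in com.sun.net.httpserver. */
final class DumpFileServer {

    // Serves exactly one file at /<file name> on all interfaces; any other path gets 404.
    static HttpServer start(Path file, int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        String route = "/" + file.getFileName();
        server.createContext("/", exchange -> {
            try {
                if (!exchange.getRequestURI().getPath().equals(route)) {
                    exchange.sendResponseHeaders(404, -1); // -1 means no response body
                    return;
                }
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, Files.size(file));
                try (OutputStream body = exchange.getResponseBody()) {
                    Files.copy(file, body); // streams the file instead of loading it into memory
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Usage inside the pod: java DumpFileServer.java gcInfo.stack 8080
        // Then from the jump host: curl -O http://<ip>:8080/gcInfo.stack
        HttpServer server = start(Path.of(args[0]), Integer.parseInt(args[1]));
        System.out.println("Serving " + args[0] + " on port " + server.getAddress().getPort());
    }
}
```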
Analysis of the stack pinpointed an asynchronous Excel export function. It reused a common list‑query API that returns at most 200 records per page, but the export had to process tens of thousands of records. The resulting nested loops created excessive numbers of objects, which drove repeated GC and ultimately caused the pod to restart. The code was fixed and redeployed, and the CPU spike disappeared.
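The article does not show the corrected export code. The following is therefore only one plausible shape for such an export, not the author's fix. The idea is to page through the 200‑row API once, in order, and hand each row to the writer before fetching the next page. Every row is then handled exactly once, and only one page is live at a time. PagedQuery, RowSink, and exportAll are hypothetical stand‑ins for the shared list API and the Excel writer. If the export uses Apache POI, its streaming SXSSFWorkbook likewise keeps only a window of rows in memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-pass export over a paged list API (200 rows per page, as in the article). */
final class PagedExport {

    static final int PAGE_SIZE = 200;

    /** Stand-in for the shared list-query API; pageNo starts at 1. */
    interface PagedQuery<T> {
        List<T> page(int pageNo, int pageSize);
    }

    /** Stand-in for the Excel row writer. */
    interface RowSink<T> {
        void write(T row);
    }

    // Fetch one page, write its rows, let the page become garbage, repeat.
    // Returns the number of non-empty pages; e.g. 40,000 rows / 200 per page = 200 pages.
    static <T> int exportAll(PagedQuery<T> query, RowSink<T> sink) {
        int pages = 0;
        for (int pageNo = 1; ; pageNo++) {
            List<T> rows = query.page(pageNo, PAGE_SIZE);
            if (rows.isEmpty()) {
                break;
            }
            pages++;
            for (T row : rows) {
                sink.write(row);
            }
            if (rows.size() < PAGE_SIZE) {
                break; // a short page is the last page
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        int total = 40_000; // "tens of thousands" of rows
        // Fake API returning row numbers; page pageNo covers [(pageNo-1)*size, pageNo*size).
        PagedQuery<Integer> fakeApi = (pageNo, pageSize) -> {
            List<Integer> rows = new ArrayList<>();
            for (int i = (pageNo - 1) * pageSize; i < Math.min(pageNo * pageSize, total); i++) {
                rows.add(i);
            }
            return rows;
        };
        int[] written = {0};
        int pages = exportAll(fakeApi, row -> written[0]++);
        System.out.println(pages + " pages, " + written[0] + " rows written");
    }
}
```

Running main prints "200 pages, 40000 rows written". At 200 rows per page, 40,000 rows fill exactly 200 pages, and the 201st call returns an empty page, which ends the loop.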
In conclusion, when a production incident occurs, restore service availability first, then drill down methodically through monitoring data and thread dumps. Familiarity with tools such as jstack and Arthas greatly simplifies the diagnosis.
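Tools such as Arthas automate the top -H plus jstack routine from inside the JVM; for example, its thread -n 3 command lists the busiest threads. The underlying idea can be sketched with the standard ThreadMXBean: read each thread's CPU time, wait, read it again, and rank threads by the difference. This is a minimal illustration, not how Arthas is implemented. HotThreads, busiest, and Sample are hypothetical names, and the record requires Java 16 or later. The sketch reports Java thread IDs, not the native tids shown by top -H, so match threads by name instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-process "busiest threads" sampler, similar in spirit to top -H. */
final class HotThreads {

    /** CPU time one thread consumed during the sampling window. */
    record Sample(long threadId, String name, long cpuNanos) {}

    // Reads per-thread CPU time twice, intervalMillis apart, and returns the n largest deltas.
    // These are Java thread ids (Thread.getId()), not the native tids shown by top -H.
    static List<Sample> busiest(int n, long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported on this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        Thread.sleep(intervalMillis);

        List<Sample> samples = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            Long then = before.get(id);
            long now = mx.getThreadCpuTime(id);     // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has died
            if (then == null || then < 0 || now < 0 || info == null) {
                continue; // thread started, ended, or was unmeasurable during the window
            }
            samples.add(new Sample(id, info.getThreadName(), now - then));
        }
        samples.sort((a, b) -> Long.compare(b.cpuNanos(), a.cpuNanos()));
        return new ArrayList<>(samples.subList(0, Math.min(n, samples.size())));
    }

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x++; // burn CPU so one thread clearly stands out
            }
        }, "busy-spinner");
        spinner.setDaemon(true);
        spinner.start();
        for (Sample s : busiest(3, 500)) {
            System.out.printf("%-20s %6.1f ms CPU%n", s.name(), s.cpuNanos() / 1e6);
        }
        spinner.interrupt();
    }
}
```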
This article has been distilled and summarized from source material, then republished for learning and reference.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independent developer of a PMP practice quiz mini-program.
