Why Did One Pod Trigger 61 Young GCs and a Full GC? A Step‑by‑Step Diagnosis
A developer traced a sudden CPU spike in a single Kubernetes pod to excessive JVM garbage collection. Using Linux monitoring tools, thread‑ID conversion, jstack analysis, and a simple file transfer, they pinpointed a flawed Excel export implementation that built massive in‑memory lists, then fixed it.
Scenario
During a Friday documentation session, an alert indicated CPU usage over 90%. Monitoring showed a pod with 61 young GC events and one full GC within two hours, prompting an urgent investigation.
Normal vs. abnormal GC curves
Typical JVM monitoring shows rare GC activity, while the problematic pod displayed frequent young GCs and a full GC, clearly visible in the provided charts.
Investigation steps
Identify the pod with abnormal GC activity among multiple pods.
Enter the pod and run top to view process resource usage; the Java process (pid 1) showed CPU around 130% on a multi‑core node.
Run top -H -p pid to list threads and find the highest‑CPU thread ID (tid).
Convert the decimal thread ID to hexadecimal using printf "%x\n" 746, which prints 2ea; jstack reports native thread IDs in hexadecimal (the nid field, here nid=0x2ea).
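Extract the stack trace of that thread with jstack pid | grep -A 30 2ea > gc.stack; the -A 30 option keeps the 30 lines after the match, since a bare grep would save only the thread's header line.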
Transfer the generated gc.stack file to a local machine: start a simple HTTP server inside the pod with python -m SimpleHTTPServer 8080 (on Python 3, python3 -m http.server 8080), then download it with curl -o gc.stack http://<IP>:8080/gc.stack.
Search the downloaded stack file for the hexadecimal thread ID, locate the corresponding Java method, and analyze the source code.
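The thread‑ID conversion and stack lookup in the steps above can also be sketched in Java. This is a minimal illustration, not the tooling used in the investigation: the class name JstackThreadFinder and its methods are hypothetical, and it assumes a full jstack dump saved to a file, with thread blocks separated by blank lines as HotSpot normally prints them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds one thread's stack in a saved jstack dump.
final class JstackThreadFinder {

    // Converts the decimal thread ID shown by `top -H` into the hex form
    // that jstack prints in its nid field, e.g. 746 -> "0x2ea".
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    // True when the line is a thread header carrying exactly this nid.
    // Comparing whole tokens avoids matching 0x2ea inside 0x2eab.
    static boolean isHeaderFor(String line, String nid) {
        for (String token : line.trim().split("\\s+")) {
            if (token.equals("nid=" + nid)) {
                return true;
            }
        }
        return false;
    }

    // Returns the header line plus the stack frames that follow it,
    // stopping at the blank line that separates thread blocks.
    static List<String> findThreadBlock(List<String> dumpLines, String nid) {
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : dumpLines) {
            if (!inBlock) {
                if (isHeaderFor(line, nid)) {
                    inBlock = true;
                    block.add(line);
                }
            } else if (line.trim().isEmpty()) {
                break;
            } else {
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java JstackThreadFinder.java <dump-file> <decimal-tid>
        if (args.length != 2) {
            System.err.println("usage: JstackThreadFinder <dump-file> <decimal-tid>");
            return;
        }
        List<String> dump = Files.readAllLines(Path.of(args[0]));
        String nid = toNid(Long.parseLong(args[1]));
        findThreadBlock(dump, nid).forEach(System.out::println);
    }
}
```

Matching on the whole nid token rather than a bare substring is a deliberate choice: a plain search for 2ea can also hit longer IDs or memory addresses that happen to contain the same digits.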
Root cause
The issue stemmed from an asynchronous Excel export feature that reused a common list‑query API limited to 200 items per page. The export demanded tens of thousands of records, causing nested loops to retain large list objects in memory. This led to massive allocations, frequent young GCs, a full GC, and eventually a pod restart.
Resolution
The code was refactored to avoid building huge in‑memory lists during export. The fix was deployed quickly, and the GC spikes disappeared.
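The article does not show the refactored code, so the following is only a sketch of one way to keep an export from accumulating every record: fetch one page at a time and hand each row to a writer immediately, so at most one page is live at once. The names PagedExport, PageSource, exportAll, and fakeSource are hypothetical; only the 200‑row page size comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a page-at-a-time export.
final class PagedExport {

    // Page limit of the shared list-query API, as described in the article.
    static final int PAGE_SIZE = 200;

    // Assumed shape of the paged query; the real API is not shown in the article.
    interface PageSource {
        List<String> fetchPage(int pageIndex, int pageSize);
    }

    // Writes every row to the sink as soon as its page arrives.
    // Only the current page is referenced, so earlier pages can be collected.
    static long exportAll(PageSource source, Consumer<String> sink) {
        long written = 0;
        for (int page = 0; ; page++) {
            List<String> rows = source.fetchPage(page, PAGE_SIZE);
            for (String row : rows) {
                sink.accept(row);
                written++;
            }
            if (rows.size() < PAGE_SIZE) {
                break; // a short or empty page means there is no more data
            }
        }
        return written;
    }

    // In-memory stand-in for the real query, used only for the demo below.
    static PageSource fakeSource(int total) {
        return (pageIndex, pageSize) -> {
            List<String> rows = new ArrayList<>();
            int start = pageIndex * pageSize;
            int end = Math.min(start + pageSize, total);
            for (int i = start; i < end; i++) {
                rows.add("record-" + i);
            }
            return rows;
        };
    }

    public static void main(String[] args) {
        // The sink here just discards rows; a real one would append to the file.
        long written = exportAll(fakeSource(1000), row -> { });
        System.out.println("rows written: " + written);
    }
}
```

In a real export the sink would append each row to the output file rather than discard it. Whether this matches the actual fix is an assumption; the point is only that the working set stays at one page instead of tens of thousands of records.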
Takeaways
When CPU usage and GC activity spike suddenly, first check whether the problem is isolated to a single pod. Then use OS tools (top, top -H) to locate the offending thread, convert its thread ID to hexadecimal, and match it against the jstack output. Also be cautious with large data structures in high‑throughput services, since they can trigger excessive garbage collection.
Java Tech Enthusiast
