
Why Did One Pod Trigger 61 Young GCs and a Full GC? A Step‑by‑Step Diagnosis

A developer traced a sudden CPU spike in a single Kubernetes pod to excessive JVM garbage collection. Using Linux monitoring tools, thread‑ID conversion, jstack analysis, and a simple file transfer, they pinpointed a flawed Excel export implementation that built massive in‑memory lists, and then fixed it.

Java Tech Enthusiast

Scenario

During a Friday documentation session, an alert indicated CPU usage over 90%. Monitoring showed a pod with 61 young GC events and one full GC within two hours, prompting an urgent investigation.

Normal vs. abnormal GC curves

On the monitoring charts, a healthy JVM pod shows only occasional GC activity, whereas the problematic pod showed frequent young GCs plus a full GC over the same period.
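If the charts are not at hand, the same counts can be cross‑checked from inside the pod. The following is a minimal sketch, assuming the image ships a full JDK (jstat is not part of JRE‑only images) and that the Java process is pid 1 as in this incident. The helper name gc_counts is hypothetical; it only parses the text that jstat -gcutil prints.

```shell
#!/bin/sh
# Hypothetical helper: read `jstat -gcutil` output on stdin and print the
# most recent young-GC (YGC) and full-GC (FGC) counts.
# Columns are located by header name because the exact column set
# differs between JDK versions.
gc_counts() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header name -> column index
        NF > 0  { ygc = $(col["YGC"]); fgc = $(col["FGC"]) }     # keep the latest sample
        END     { print "YGC=" ygc, "FGC=" fgc }
    '
}
```

Inside the pod, it could be used as jstat -gcutil 1 1000 5 | gc_counts, which samples pid 1 once per second five times and reports the last sample. The counts are cumulative since JVM start, so two readings taken some time apart are needed to judge the rate.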

Investigation steps

Identify the pod with abnormal GC activity among multiple pods.

Enter the pod and run top to view process resource usage; the Java process (pid 1) showed CPU around 130% on a multi‑core node.

Run top -H -p pid to list threads and find the highest‑CPU thread ID (tid).

Convert the decimal thread ID to hexadecimal using printf "%x\n" 746, which prints 2ea. This is needed because jstack reports native thread IDs in hex, in the nid field (for example nid=0x2ea).

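Extract the stack trace of that thread with jstack pid | grep -A 30 2ea > gc.stack. The -A 30 option keeps the 30 lines after each match; without it, grep saves only the thread's header line and drops the stack frames below it.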

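Transfer the generated gc.stack file to a local machine: start a simple HTTP server inside the pod with python -m SimpleHTTPServer 8080 (Python 2; on Python 3 the equivalent is python3 -m http.server 8080), then download it with curl -o gc.stack http://IP:8080/gc.stack, where IP is the pod's address.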

Search the downloaded stack file for the hexadecimal thread ID, locate the corresponding Java method, and analyze the source code.
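The thread‑matching part of these steps can be sketched as two small shell functions. This is an illustration under assumptions, not the author's exact commands: the function names are hypothetical, and the second function assumes a full thread dump was saved to a file (jstack pid > gc.stack) with thread entries separated by blank lines, which is how jstack normally prints them.

```shell
#!/bin/sh
# Hypothetical helpers for mapping a hot thread from `top -H` to its Java stack.

# Convert a decimal thread ID to the hex form jstack shows in the nid field.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print the whole stack block for one nid from a saved thread dump.
#   $1 = nid such as 0x2ea, $2 = path to the dump file
# RS="" puts awk in paragraph mode, so each blank-line-separated thread
# entry is one record. The trailing character class stops 0x2ea from
# also matching a longer ID such as 0x2ea1.
stack_for_nid() {
    awk -v nid="nid=$1" '
        BEGIN { RS = "" }
        $0 ~ (nid "([^0-9a-f]|$)") { print }
    ' "$2"
}
```

With the dump saved as gc.stack, stack_for_nid "$(tid_to_nid 746)" gc.stack prints only the entry for the thread that top -H flagged. Matching on whole entries avoids guessing how many lines to keep after the header, since stack depth varies from thread to thread.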

Root cause

The issue stemmed from an asynchronous Excel export feature that reused a common list‑query API limited to 200 items per page. The export needed tens of thousands of records, so it looped over the paged API in nested loops and held the results in large list objects in memory. This led to massive allocations, frequent young GCs, a full GC, and eventually a pod restart.

Resolution

The code was refactored to avoid building huge in‑memory lists during export, the fix was deployed quickly, and the GC spikes disappeared.

Takeaways

When CPU usage and GC activity spike suddenly, first check whether the problem is isolated to a single pod. Then use OS tools (top, top -H) to locate the offending thread, convert its thread ID to hexadecimal, and correlate it with the jstack output. Also be cautious with large data structures in high‑throughput services, since they can cause excessive garbage collection.

