Investigating a Sudden CPU Spike Caused by Excessive GC in a Containerized Java Application

The article details a production incident where a containerized Java service experienced a CPU surge due to frequent young and full garbage collections, describing step‑by‑step diagnostics using Linux tools, jstack analysis, and code fixes that ultimately resolved the issue.

Architecture Digest

On a Thursday afternoon, an alert reported that a container's CPU usage had jumped above 90%. JVM monitoring showed that one pod had performed 61 young GCs and one full GC within two hours, which prompted an urgent investigation.

The investigator entered the affected pod and ran top to view process resource usage, identified the Java process (PID 1) with unusually high CPU, and then executed top -H -p pid to locate the thread (tid) consuming the most CPU.
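The two top invocations above can be scripted for repeat use. The following is a minimal sketch, not the investigator's actual commands: the function names are hypothetical, it assumes a procps-style top that supports -b, -n, -d, -H and -p (a BusyBox top may not), and the two-sample, one-second interval is an arbitrary choice so that the reading comes from a recent sample.

```shell
#!/bin/sh
# Read batch-mode top output on stdin and print the ID in the first data row
# of the last frame, i.e. the busiest task in the most recent sample.
# Relies on top's default sort order (highest %CPU first).
first_row_id() {
    awk '$1 == "PID" { want = 1; next }
         want && NF  { id = $1; want = 0 }
         END         { if (id != "") print id }'
}

# Busiest process visible in the pod.
busiest_pid() {
    top -b -n 2 -d 1 | first_row_id
}

# Busiest thread of process $1 (with -H, the PID column holds thread IDs).
busiest_tid() {
    top -H -b -n 2 -d 1 -p "$1" | first_row_id
}
```

With the Java process running as PID 1, busiest_tid 1 would print the decimal thread ID to carry into the next step.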

After noting the thread ID (e.g., 746), the investigator converted it to hexadecimal with printf "%x\n" 746, which prints 2ea, and extracted the stack trace for that thread using jstack pid | grep 2ea > gc.stack. Because the file was large, it was served via a temporary Python HTTP server and downloaded with curl -o gc.stack http://<ip>/gc.stack.
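The conversion and extraction steps can be sketched as two small functions. A plain grep keeps only the matching thread header line unless a context option such as -A is added, so this sketch swaps in awk paragraph mode to keep the whole thread block. The function names are hypothetical, and the sketch assumes a JDK whose jstack prints the native thread ID in the form nid=0x<hex> and separates thread blocks with blank lines.

```shell
#!/bin/sh
# Convert a decimal thread ID (as shown by top -H) to the hex form used by jstack.
tid_to_hex() {
    printf '%x\n' "$1"
}

# Read a jstack dump on stdin and print only the block whose header carries
# the given decimal thread ID as nid=0x<hex>. The trailing space in the
# needle stops 0x2ea from also matching 0x2eab.
thread_stack() {
    hex=$(tid_to_hex "$1")
    awk -v RS='' -v needle="nid=0x${hex} " 'index($0, needle) { print; exit }'
}

# Typical use (commented out so the file can be sourced safely):
#   jstack 1 | thread_stack 746 > gc.stack
#   python3 -m http.server 8000                   # in the pod; port 8000 is an assumption
#   curl -o gc.stack http://<ip>:8000/gc.stack    # from another machine
```

Extracting a single thread block keeps the file small, which may make the HTTP transfer step unnecessary when the block can simply be read in the terminal.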

Analyzing the stack showed that the problem originated in an Excel export feature that reused a shared list query interface. That interface returned at most 200 records per page, but the export attempted to process tens of thousands of records through it. The result was nested loops and massive list allocations, which triggered the repeated garbage collections.

The code was corrected so that the export no longer misused the shared list query, and the fix was deployed urgently. CPU and GC metrics then returned to normal, confirming the resolution.

The post concludes by emphasizing a calm, layered troubleshooting approach for production issues and recommends tools like Arthas to simplify JVM diagnostics.
