How to Diagnose Frequent Full GC in Production Systems? (Second Interview at Taobao)
The article explains why Full GC should be minimized, defines normal versus abnormal GC frequencies, outlines the root causes of Full GC, and provides a step‑by‑step troubleshooting workflow with concrete code snippets, monitoring commands and real‑world examples for Java backend engineers.
Interview Focus Points
Production experience: Explain how to distinguish “normal” and “abnormal” Full GC behavior based on real projects.
GC mechanism: Describe what triggers Full GC (old generation full, Metaspace full, explicit System.gc(), CMS Concurrent Mode Failure).
Troubleshooting ability: Show a systematic approach to investigating frequent Full GC.
Core Answer
Ideally Full GC never occurs, or it happens only every few days or weeks. Frequent Full GC always indicates a problem.
Every few days or longer – Normal. The old generation grows slowly and GC efficiency is high.
Every few hours – Needs attention. Memory may be undersized or there may be a slow leak; watch the trend.
Every few tens of minutes – Abnormal. A leak or misconfiguration is highly likely; investigate immediately.
Every few minutes or seconds – Critical. The system is practically unusable; investigate urgently.
Beyond the absolute interval, the trend matters: a steadily shrinking interval usually points to a memory leak, even if the current frequency still looks acceptable.
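The frequency bands and the trend rule above can be expressed as a small classifier. This is a minimal sketch: the class and method names are hypothetical, the exact cut-offs (1 day, 1 hour, 10 minutes) are assumptions chosen to match the bands, and "trend" is approximated as the last few intervals shrinking monotonically.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical helper mapping Full GC intervals onto the frequency bands (cut-offs are assumptions). */
final class FullGcFrequency {

    enum Level { NORMAL, NEEDS_ATTENTION, ABNORMAL, CRITICAL }

    private FullGcFrequency() {}

    /** Classifies the interval between two consecutive Full GCs. */
    static Level classify(Duration interval) {
        if (interval.compareTo(Duration.ofDays(1)) >= 0) {
            return Level.NORMAL;           // every few days or longer
        }
        if (interval.compareTo(Duration.ofHours(1)) >= 0) {
            return Level.NEEDS_ATTENTION;  // every few hours
        }
        if (interval.compareTo(Duration.ofMinutes(10)) >= 0) {
            return Level.ABNORMAL;         // every few tens of minutes
        }
        return Level.CRITICAL;             // every few minutes or seconds
    }

    /** True when each of the last {@code window} intervals is shorter than the one before it. */
    static boolean isShrinking(List<Duration> intervals, int window) {
        if (window < 2 || intervals.size() < window) {
            return false; // not enough history to call it a trend
        }
        List<Duration> recent = intervals.subList(intervals.size() - window, intervals.size());
        for (int i = 1; i < recent.size(); i++) {
            if (recent.get(i).compareTo(recent.get(i - 1)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative, made-up history: intervals shrinking from 8 h to 1 h.
        List<Duration> history = List.of(
                Duration.ofHours(8), Duration.ofHours(5), Duration.ofHours(2), Duration.ofHours(1));
        Duration latest = history.get(history.size() - 1);
        System.out.println(classify(latest) + ", shrinking=" + isShrinking(history, 3));
        // prints NEEDS_ATTENTION, shrinking=true
    }
}
```

The point of the second method is that a band alone can look "fine" while the trend is already alarming, which is exactly the situation the trend rule warns about.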
Why Full GC Should Be as Rare as Possible
Full GC is a Stop-The-World (STW) event: every application thread is paused until it completes. At 1000 QPS, a 2-second pause leaves about 2000 requests waiting, which users experience as timeouts or stalled loading screens. Keeping Full GC rare is therefore a hard requirement.
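The 2000-request figure is just arrival rate multiplied by pause length. A tiny sketch, assuming a constant arrival rate and that no request completes during the pause; the class and method names are hypothetical.

```java
/** Hypothetical helper: requests that pile up while all application threads are stopped. */
final class StwBacklog {

    private StwBacklog() {}

    /** backlog = QPS × pause in seconds, assuming steady arrivals and zero completions during the pause. */
    static long queuedRequests(long qps, long pauseMillis) {
        return qps * pauseMillis / 1000;
    }

    public static void main(String[] args) {
        System.out.println(queuedRequests(1000, 2000)); // prints 2000: the 2-second Full GC above
        System.out.println(queuedRequests(1000, 30));   // prints 30: a pause in the typical Young GC range
    }
}
```

The same arithmetic explains why a Young GC pause of tens of milliseconds is usually invisible while a multi-second Full GC is not.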
Comparison of GC types:
Young GC: Cleans only the young generation (typically hundreds of MB); pauses of 10–50 ms, usually unnoticeable.
Full GC: Cleans the entire heap plus Metaspace; pauses range from hundreds of milliseconds to several seconds.
Full GC Trigger Conditions
Old generation space exhausted: Objects that survive multiple Young GCs are promoted to the old generation; when it fills up, a Full GC occurs.
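// Large object allocated directly in the old generation
byte[] bigArray = new byte[10 * 1024 * 1024]; // 10 MB
// With Serial/ParNew, objects larger than -XX:PretenureSizeThreshold skip the young generation;
// under G1, objects of at least half a region go into humongous (old) regions instead
// Old generation fills up → Full GC
Metaspace exhausted: When Metaspace usage crosses its GC threshold (initially -XX:MetaspaceSize), the JVM collects to unload classes, which with collectors such as Parallel GC is a Full GC; once -XX:MaxMetaspaceSize is reached and nothing can be unloaded, the JVM throws OutOfMemoryError: Metaspace. Frameworks that generate many classes (dynamic proxies, CGLIB, JSP hot-reload) can hit this limit.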
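Explicit call: Calling System.gc() requests a Full GC; the Java specification does not guarantee it, although HotSpot performs one by default. It can be disabled with -XX:+DisableExplicitGC, but direct (off-heap) NIO buffers rely on it: when -XX:MaxDirectMemorySize is reached, the JDK calls System.gc() so that unreachable DirectByteBuffers release their native memory, and with explicit GC disabled this can end in OutOfMemoryError: Direct buffer memory.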
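// Explicit GC request
System.gc(); // Requests a Full GC (ignored under -XX:+DisableExplicitGC)
CMS Concurrent Mode Failure: With the CMS collector, if the old generation runs out of space for promoted objects before a concurrent cycle finishes, the JVM falls back to a single-threaded, stop-the-world Serial Old collection, a Full GC with very long pauses. (CMS was deprecated in JDK 9 and removed in JDK 14.)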
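At runtime, these trigger conditions show up as the GC cause of each collection. As a sketch, HotSpot's com.sun.management.GarbageCollectionNotificationInfo can report the cause from inside the process. This relies on a HotSpot-specific API; the class name GcCauseLogger is hypothetical, and cause strings such as "Allocation Failure", "Metadata GC Threshold" and "System.gc()" vary by collector and JDK version.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

/** Hypothetical listener that logs collector, action, cause and duration of every GC. */
final class GcCauseLogger {

    private GcCauseLogger() {}

    /** HotSpot labels old-generation / full collections with this action string. */
    static boolean isMajor(String gcAction) {
        return "end of major GC".equals(gcAction);
    }

    static void install() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // On HotSpot, every GarbageCollectorMXBean is also a NotificationEmitter.
            NotificationEmitter emitter = (NotificationEmitter) gc;
            emitter.addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo.from((CompositeData) notification.getUserData());
                System.out.printf("%s | %s | cause=%s | %d ms%n",
                        info.getGcName(), info.getGcAction(), info.getGcCause(),
                        info.getGcInfo().getDuration());
                if (isMajor(info.getGcAction())) {
                    System.out.println("  -> old-generation/full collection");
                }
            }, null, null);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        System.gc();       // explicit trigger: cause is usually reported as "System.gc()"
        Thread.sleep(500); // notifications are delivered asynchronously
    }
}
```

Logging the cause in-process complements the GC log: it lets an application raise its own alert when, for example, "Metadata GC Threshold" starts appearing.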
How to Investigate Frequent Full GC (5‑Step Process)
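1. Examine the GC logs to determine Full GC frequency and pause duration.
2. Identify the trigger from the logs. Since JDK 9, unified logging (-Xlog:gc*) records the cause of each collection, such as Allocation Failure, Metadata GC Threshold or System.gc(); on JDK 8, -XX:+PrintGCDetails shows the same causes.
3. Find out which objects occupy the most memory: run jmap -histo:live <pid> for a class histogram (the :live option forces a Full GC first), or take a heap dump with jmap or Arthas's heapdump command and analyze it in MAT.
4. Locate the root cause: common culprits are memory leaks (e.g., ThreadLocal values never removed on pooled threads, static collections that grow indefinitely) or mis-sized JVM parameters.
5. After the fix, run a load test to verify that Full GC frequency returns to normal.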
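For the ThreadLocal culprit mentioned above, the usual fix is to pair every set() with a remove() in a finally block, so that pooled threads do not keep stale values reachable. A minimal sketch; RequestContext and its byte-array payload are hypothetical stand-ins for whatever a real service stores per request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical per-request state kept in a ThreadLocal on pooled worker threads. */
final class RequestContext {

    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    private RequestContext() {}

    /** Handles one request; the finally block clears the slot before the thread returns to the pool. */
    static int handle(int payloadBytes) {
        BUFFER.set(new byte[payloadBytes]); // per-request scratch buffer
        try {
            return BUFFER.get().length;     // real work would use the buffer here
        } finally {
            // Without remove(), a pooled thread keeps its last buffer reachable for as long as the thread lives.
            BUFFER.remove();
        }
    }

    /** True if the calling thread still holds a value in the ThreadLocal. */
    static boolean hasValue() {
        return BUFFER.get() != null;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Future<Boolean> leftover = pool.submit(() -> {
                handle(1024 * 1024);
                return hasValue();
            });
            System.out.println("value left on worker thread: " + leftover.get()); // prints false
        } finally {
            pool.shutdown();
        }
    }
}
```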
Real-world case: An online service's Full GC frequency grew from 2–3 times per day to once per hour. Heap-dump analysis revealed a HashMap held in a static field that kept growing. Adding cleanup logic eliminated the leak and restored normal GC behavior.
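The case does not say what the cleanup logic was. One common remedy, shown here only as an illustrative sketch, is to replace an ever-growing static HashMap with a size-bounded map that evicts the least recently used entry, which LinkedHashMap supports through its access-order constructor and its removeEldestEntry hook. The class name and the capacity of 10,000 entries are assumptions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded replacement for a static HashMap used as a cache. */
final class BoundedCache {

    private static final int MAX_ENTRIES = 10_000; // assumption: size to the real working set

    // accessOrder = true keeps entries in least-recently-used-first order.
    private static final Map<String, Object> CACHE = Collections.synchronizedMap(
            new LinkedHashMap<String, Object>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                    return size() > MAX_ENTRIES; // evict once the bound is exceeded
                }
            });

    private BoundedCache() {}

    static void put(String key, Object value) { CACHE.put(key, value); }

    static Object get(String key) { return CACHE.get(key); }

    static int size() { return CACHE.size(); }

    public static void main(String[] args) {
        for (int i = 0; i < 50_000; i++) {
            put("key-" + i, i);
        }
        System.out.println(size()); // prints 10000: old entries are evicted instead of accumulating
    }
}
```

For production use a dedicated cache library with time-based expiry may fit better; the point is that memory held by a static structure must have an upper bound.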
How to Reduce Full GC Occurrence
Set appropriate heap size
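-Xms4g -Xmx4g // Same initial and max size to avoid dynamic resizing
Adjust young generation ratio
-XX:NewRatio=2 // Old:Young = 2:1 (default)
// Or set an absolute young generation size (takes precedence over NewRatio)
-Xmn2g
// Under G1, avoid -Xmn and NewRatio: fixing the young size disables G1's adaptive sizing toward its pause-time goal
Configure Metaspace
-XX:MetaspaceSize=256m // First Metaspace GC threshold; raising it avoids early Metadata GC Threshold collections
-XX:MaxMetaspaceSize=512m // Hard cap; exceeding it ends in OutOfMemoryError: Metaspace
Choose a modern collector
G1 (the default collector since JDK 9) and ZGC typically see far fewer Full GCs than CMS. G1 runs Mixed GCs that reclaim old-generation regions incrementally before the old generation fills, which removes most Full GC triggers; ZGC does nearly all of its work concurrently with the application.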
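After changing these flags, it is worth confirming what the running JVM actually uses, since defaults and the collector choice change the young/old split. A sketch using the standard java.lang.management API; the class name is hypothetical, pool and collector names (e.g. "G1 Old Gen", "Metaspace") depend on the collector and JDK version, and a max of -1 means undefined (for example Metaspace without -XX:MaxMetaspaceSize).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical startup check that prints the effective heap, Metaspace and collector settings. */
final class JvmSettingsReport {

    private JvmSettingsReport() {}

    /** Converts bytes to MB, keeping -1 ("undefined") as is. */
    static long toMb(long bytes) {
        return bytes < 0 ? -1 : bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM args: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // May be slightly below -Xmx, e.g. Parallel GC excludes one survivor space.
        System.out.println("Max heap: " + toMb(Runtime.getRuntime().maxMemory()) + " MB");

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            System.out.printf("%-30s %-16s max=%d MB%n",
                    pool.getName(), pool.getType(), toMb(usage.getMax()));
        }
    }
}
```

With -Xmx4g and -XX:NewRatio=2 under a non-G1 collector, for example, the young pools should add up to roughly a third of the heap (about 1.33 GB).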
High‑Frequency Interview Follow‑Ups
What is a normal Young GC interval? Every few seconds to tens of seconds is typical, and each pause should ideally stay under 50 ms.
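How to handle an urgent Full GC-induced outage? If possible, take the instance out of rotation first. Then capture evidence before restarting, because a restart discards it: save the GC logs and take a heap dump (jmap -dump:format=b,file=heap.hprof <pid>). Finally, restart the service to restore availability and analyze the dump offline to fix the leak or adjust parameters. Enabling -XX:+HeapDumpOnOutOfMemoryError in advance ensures a dump is written automatically if the JVM hits an OutOfMemoryError.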
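When jmap is not available (for example in a JRE-only container image), a comparable dump can be taken from inside the process through the HotSpot-specific HotSpotDiagnosticMXBean. A sketch, assuming a HotSpot JVM; the class name is hypothetical, the target file must not already exist, and recent JDKs require the .hprof suffix.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes an .hprof heap dump of the current JVM. */
final class HeapDumper {

    private HeapDumper() {}

    /**
     * Dumps the heap to {@code file}. With live = true only reachable objects are written,
     * which (like jmap -dump:live) runs a Full GC first.
     */
    static Path dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toAbsolutePath().toString(), live); // throws if the file already exists
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path hprof = dump(dir.resolve("heap.hprof"), true);
        System.out.println("Heap dump written: " + hprof + " (" + Files.size(hprof) + " bytes)");
    }
}
```

Dumping pauses the application for a time proportional to heap size, which is another reason to take the instance out of rotation before doing it.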
How to monitor Full GC?
GC logs: -Xlog:gc*:file=gc.log
Command line: jstat -gcutil <pid> 1000 (the FGC and FGCT columns show the Full GC count and total Full GC time)
JMX and visualization: Prometheus + Grafana with jmx_exporter
Set alerts: interval < 1 hour → warning; interval < 10 minutes → critical.
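The alert thresholds can also be checked in-process by sampling collection counts. A minimal sketch under clear assumptions: the class name is hypothetical, and a collector counts as "old/full" when its MXBean name contains "Old" or "MarkSweep". That name heuristic matches common HotSpot collectors (e.g. "G1 Old Generation", "PS MarkSweep"), also counts CMS concurrent cycles, and misses ZGC/Shenandoah; it is not an API contract. In production the same numbers usually come from jmx_exporter.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical in-process Full GC interval alert using the warning/critical thresholds. */
final class FullGcAlert {

    enum Level { OK, WARNING, CRITICAL }

    private long lastCount = -1;  // cumulative count at the previous sample (-1 = no sample yet)
    private Instant lastFullGcAt; // when the previous Full GC was observed (null = none yet)

    /** interval < 10 min → CRITICAL, interval < 1 h → WARNING, otherwise OK. */
    static Level levelFor(Duration interval) {
        if (interval.compareTo(Duration.ofMinutes(10)) < 0) {
            return Level.CRITICAL;
        }
        if (interval.compareTo(Duration.ofHours(1)) < 0) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    /** Name-based heuristic for old-generation / full collectors (an assumption). */
    static boolean isOldGenCollector(String name) {
        return name.contains("Old") || name.contains("MarkSweep");
    }

    /** Cumulative number of collections reported by old-generation collectors. */
    static long oldGenCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined
            if (isOldGenCollector(gc.getName()) && count > 0) {
                total += count;
            }
        }
        return total;
    }

    /**
     * Feeds one sample of the cumulative count. Returns a level when a new Full GC is seen and an
     * earlier one is known, else null. Several Full GCs within one polling period count as one,
     * so the period should stay well below 10 minutes.
     */
    Level onSample(long count, Instant now) {
        Level result = null;
        if (lastCount >= 0 && count > lastCount) {
            if (lastFullGcAt != null) {
                result = levelFor(Duration.between(lastFullGcAt, now));
            }
            lastFullGcAt = now;
        }
        lastCount = count;
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        FullGcAlert tracker = new FullGcAlert();
        for (int i = 0; i < 3; i++) { // a real monitor would poll on a scheduler indefinitely
            Level level = tracker.onSample(oldGenCollections(), Instant.now());
            if (level != null && level != Level.OK) {
                System.out.println("Full GC alert: " + level);
            }
            System.gc();        // demo only: force a Full GC so the counter moves
            Thread.sleep(1000);
        }
    }
}
```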
Summary
There is no absolute standard for Full GC frequency, but the rule is clear: the fewer Full GCs, the better, and the trend matters more than the raw number. A cadence of days is healthy, hours warrants attention, and minutes demands immediate investigation.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Java Architect Handbook
Focused on Java interview questions and practical article sharing, covering algorithms, databases, Spring Boot, microservices, high concurrency, JVM, Docker containers, and ELK-related knowledge. Looking forward to progressing together with you.
