Full GC Root Cause Analysis and Resolution in Java Applications
This article documents a step‑by‑step investigation of a high TP99 caused by frequent Full GC in a Java service, describing the diagnostic mindset, tools used, GC trigger conditions, object promotion mechanisms, the impact of AdaptiveSizePolicy and Metaspace, and the concrete configuration and code changes that eliminated the issue.
The goal is to give newcomers a clear troubleshooting mindset and a set of practical, repeatable steps.
Because most production servers lack SSH access, the investigation relied on platform‑provided tools such as JDOS container monitoring, JDOS process query, SGM container monitoring, and SGM method call query:
JDOS container monitoring: view the container's CPU, memory, disk, and I/O metrics.
JDOS process query: look up the Java process ID and run common Java memory inspection commands.
SGM container monitoring: view the JVM's memory change history.
SGM method call query: view the upstream/downstream dependencies and time breakdown of a given key interface call.
The problem surfaced as occasional interface timeouts around 10 a.m., which were traced back to frequent Full GC events.
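When only platform dashboards are available, the same GC counters they surface can also be read in-process through the standard `GarbageCollectorMXBean` API. A minimal sketch (class name is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Print the collection count and accumulated pause time of every
    // registered collector, e.g. to spot an unusually high Full GC
    // count without SSH access to the host.
    public static void printGcStats() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }

    public static void main(String[] args) {
        printGcStats();
    }
}
```

The collector names reported (e.g. young- vs. old-generation collectors) depend on which GC the JVM is running.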
Full GC Trigger Conditions identified were:
Explicit System.gc() call (not present).
Old generation space shortage.
Metaspace (method area) shortage.
Objects promoted by a Minor GC would exceed the old generation's remaining capacity (space allocation guarantee failure).
Large objects being promoted directly to the old generation.
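The first trigger in the list above can be observed directly: with default JVM settings (i.e. explicit GC not disabled via `-XX:+DisableExplicitGC`), a `System.gc()` call shows up in the collector counters. A small sketch, with illustrative names:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcDemo {
    // Sum the collection counts of all registered collectors.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += gc.getCollectionCount();
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // explicit request: with default settings this runs a collection
        long after = totalCollections();
        System.out.println("collections before=" + before + " after=" + after);
    }
}
```

This is why ruling out (or finding) `System.gc()` calls is usually the cheapest first check in a Full GC investigation.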
Monitoring showed the old generation repeatedly hitting 90% occupancy at the same timestamps as the Full GC spikes.
Object Promotion Scenarios examined included:
Age‑based promotion after surviving 15 Minor GCs (the default -XX:MaxTenuringThreshold).
Large objects exceeding -XX:PretenureSizeThreshold, allocated directly in the old generation.
Dynamic age judgment when the objects of a given age (and younger) in a Survivor space exceed 50% of its capacity.
Space allocation guarantee: when a Survivor space cannot hold all live objects after a Minor GC, the overflow is moved directly to the old generation.
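The dynamic age judgment above is simple arithmetic: HotSpot picks the smallest age n such that the cumulative size of objects aged 1..n exceeds the target fraction (-XX:TargetSurvivorRatio, default 50%) of Survivor capacity, capped at -XX:MaxTenuringThreshold. A simplified illustration of that rule (method and parameter names are hypothetical, not the JVM's internals):

```java
public class DynamicAge {
    /**
     * Simplified sketch of HotSpot's dynamic tenuring-threshold rule:
     * returns the smallest age n such that the cumulative bytes of
     * objects aged 1..n exceed targetSurvivorRatio% of Survivor
     * capacity, capped at maxTenuringThreshold.
     * bytesPerAge[i] holds the live bytes of objects with age i+1.
     */
    public static int tenuringThreshold(long[] bytesPerAge,
                                        long survivorCapacity,
                                        int targetSurvivorRatio,
                                        int maxTenuringThreshold) {
        long limit = survivorCapacity * targetSurvivorRatio / 100;
        long total = 0;
        for (int i = 0; i < bytesPerAge.length; i++) {
            total += bytesPerAge[i];
            if (total > limit) {
                // Everything at or above this age gets promoted.
                return Math.min(i + 1, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold;
    }
}
```

For example, with a 100 MB Survivor space and the default 50% ratio, ages holding 30 MB each cross the 50 MB limit at age 2, so objects aged 2 and older are promoted early, well before the age-15 default.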
Using MAT (Eclipse Memory Analyzer) on heap dumps helped rule out large objects and long‑lived static maps.
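Heap dumps for MAT are typically captured with jmap via the platform's process-query tool, but on a HotSpot JVM they can also be taken programmatically through the diagnostic MXBean. A sketch (the dump path here is illustrative, and the target file must not already exist):

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    // Write a heap dump (live objects only) to the given .hprof path,
    // which can then be loaded into Eclipse MAT.
    public static void dumpHeap(String path) throws Exception {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true); // true = dump only live objects
    }

    public static void main(String[] args) throws Exception {
        String path = File.createTempFile("dump", "").getAbsolutePath() + ".hprof";
        dumpHeap(path);
        System.out.println("heap dump written to " + path);
    }
}
```

Passing `true` for the live-objects flag forces a full GC first, so it should be used deliberately on a struggling service.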
The investigation pinpointed two key causes:
With -XX:+UseAdaptiveSizePolicy enabled, the JVM's automatic resizing of the young generation caused premature promotion of objects to the old generation, accelerating old‑gen growth and triggering Full GC.
Metaspace growth (up to ~300 MB) also forced Full GC.
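Metaspace growth of this kind can be watched from inside the process as well, via the standard memory-pool beans (the pool is named "Metaspace" on HotSpot). A minimal sketch with an illustrative class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceStats {
    // Return current Metaspace usage in bytes, or -1 if the pool is
    // not found (its name is "Metaspace" on HotSpot JVMs).
    public static long metaspaceUsed() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used: "
                + metaspaceUsed() / (1024 * 1024) + " MB");
    }
}
```

A steadily climbing value here, as seen in this incident, points at ongoing class loading rather than a one-time warm-up.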
Solutions applied:
Disabling adaptive sizing with -XX:-UseAdaptiveSizePolicy and adding JVM flags to trace class loading/unloading and GC details:

-XX:+TraceClassUnloading -XX:+TraceClassLoading -XX:+PrintGCDetails

After disabling AdaptiveSizePolicy and fixing a pattern that repeatedly loaded the com.googlecode.aviator.Expression rule‑engine class, Full GC frequency dropped from hourly to negligible.
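The class‑loading fix follows a standard pattern: compile each rule expression once and cache the result, instead of recompiling (and thereby generating and loading a fresh class) on every call. A library‑agnostic sketch of the idea, using a hypothetical compile step in place of the real rule engine:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ExpressionCache {
    // Counts how many times "compile" actually runs, to demonstrate
    // that each distinct expression is compiled exactly once.
    static final AtomicInteger compileCount = new AtomicInteger();

    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for an expensive compile step that would
    // otherwise generate and load a new class per invocation,
    // inflating Metaspace until a Full GC unloads the classes.
    private static String compile(String expr) {
        compileCount.incrementAndGet();
        return "compiled:" + expr;
    }

    // computeIfAbsent guarantees at most one compilation per key.
    public static String getCompiled(String expr) {
        return CACHE.computeIfAbsent(expr, ExpressionCache::compile);
    }
}
```

For Aviator specifically, the library's own compile call accepts a caching flag that serves the same purpose, so the cache need not be hand-rolled.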
Final recommendations emphasize narrowing down direct causes, asking “why” at each clue, and using the presented Full GC troubleshooting flowchart as a reference.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.