How to Diagnose Frequent Full GC in Java Interviews
This article explains the root‑cause analysis and step‑by‑step troubleshooting process for frequent Full GC events in Java applications, covering trigger mechanisms, impact assessment, common causes, monitoring tools, heap‑dump analysis, and both short‑term fixes and long‑term architectural improvements.
Full GC Overview
Full GC is triggered when the Old Generation or Metaspace runs out of space. The garbage collector (e.g., G1, CMS, Serial Old) performs a full-heap collection, stops all application threads (stop-the-world, STW), consumes significant CPU and memory, and can cause latency spikes or service crashes.
Typical Trigger Scenarios
Old Generation Exhaustion: Large objects allocated directly into Old Gen, or rapid promotion of survivor objects, fill Old Gen beyond ~90% (a minimal reproduction appears after this list).
Metaspace Overflow: Excessive dynamic class generation (e.g., Groovy scripts, CGLIB proxies) exceeds the Metaspace limit.
Explicit System.gc(): Full GC calls initiated by application code or frameworks.
GC Algorithm Failures: CMS Concurrent Mode Failure, or a G1 evacuation failure (to-space exhausted) that forces a fallback Full GC.
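The Old Generation exhaustion scenario is easy to reproduce locally. Below is a minimal sketch (the class name and heap sizes are illustrative, not from this article): run it with a deliberately small heap, e.g. java -Xms64m -Xmx64m -Xmn16m -XX:+UseParallelGC -verbose:gc FullGcDemo, and the log fills with Full GC entries as the retained allocations saturate Old Gen.

import java.util.ArrayList;
import java.util.List;

public class FullGcDemo {
    // Strong references keep every allocation reachable, so no Full GC can reclaim them.
    private static final List<byte[]> RETAINED = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // 1 MB arrays overflow the small young generation and are promoted into
            // Old Gen almost immediately; the run eventually ends in
            // java.lang.OutOfMemoryError: Java heap space.
            RETAINED.add(new byte[1024 * 1024]);
            Thread.sleep(10); // slow the loop so the GC log stays readable
        }
    }
}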
Impact of Frequent Full GC
Service availability drops as STW pauses accumulate.
CPU consumption creates a feedback loop: during long pauses requests back up, the backlog produces a burst of new objects once threads resume, and the extra allocation pressure drives GC frequency even higher.
In distributed systems a single node’s Full GC can cascade into a cluster‑wide outage (death spiral).
Root‑Cause Classification
Local cache over-allocation: Old Gen >80% occupied by an unbounded ConcurrentHashMap or @Cacheable entries. Remediation: Replace with Caffeine/Guava (TTL, size bound) or externalize to Redis; shard large keys.
Message bloat: Kafka messages >512 KB create large temporary objects. Remediation: Send only IDs, enable Snappy/LZ4 compression, split large payloads.
Database query explosion: Unpaginated SELECT * returns multi-megabyte result sets. Remediation: Enforce pagination, use cursor streaming, select only required columns.
ThreadLocal leakage: ThreadLocal values persist because thread-pool threads are reused. Remediation: Always call remove() in a finally block (see the sketch after this list), or use TransmittableThreadLocal where context must cross pooled threads.
Reflection/ASM abuse: Massive dynamic class generation fills Metaspace. Remediation: Cache reflective Method/Constructor objects, limit class-loader creation, close GroovyClassLoader instances after use.
Improper JVM parameters: Undersized young generation, overly aggressive G1 pause targets, etc. Remediation: Tune -XX:NewRatio, -XX:SurvivorRatio, -XX:MaxGCPauseMillis, -XX:InitiatingHeapOccupancyPercent.
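To make the ThreadLocal case concrete, here is a minimal sketch assuming a pooled executor (the names CONTEXT and handle are illustrative): because the worker thread outlives the task, a value that is never removed stays reachable for the whole lifetime of the reused thread.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanup {
    private static final ThreadLocal<byte[]> CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        pool.submit(() -> {
            CONTEXT.set(new byte[10 * 1024 * 1024]); // 10 MB of per-request state
            try {
                handle();
            } finally {
                CONTEXT.remove(); // without this line, the 10 MB stays attached to the pooled thread
            }
        });
        pool.shutdown();
    }

    private static void handle() {
        // ... business logic that reads CONTEXT.get() ...
    }
}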
Four‑Step Investigation Process
Data Collection: Enable detailed GC logging (JDK 8: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/jvm/gc.log; JDK 9+: -Xlog:gc*:file=/var/log/jvm/gc.log) and configure automatic heap dumps with -XX:+HeapDumpOnOutOfMemoryError, or capture one on demand with jmap -dump:live,format=b,file=heap.hprof <pid>.
GC Log Analysis: Identify the trigger type via keywords such as [Full GC (Ergonomics)] or [Full GC (Metadata GC Threshold)]; measure STW duration; evaluate how much memory each Full GC actually reclaims (an annotated example follows this list).
Heap Dump Inspection : Use MAT or Arthas to examine the dominator tree, locate large retained objects, trace GC roots, and analyze class‑loader statistics for Metaspace issues.
Root‑Cause Validation : Correlate findings with code (e.g., static cache size, missing ThreadLocal cleanup), reproduce the scenario, and verify that the remedial action reduces Full GC frequency.
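To illustrate step 2, here is the shape of a JDK 8 ParallelGC Full GC log line (the numbers are invented for the example):

[Full GC (Ergonomics) [PSYoungGen: 10240K->0K(76288K)] [ParOldGen: 169472K->169100K(175104K)] 179712K->169100K(251392K), [Metaspace: 20800K->20800K(1067008K)], 0.5210041 secs]

The reading that matters: Old Gen drops from 169472K to only 169100K against a 175104K capacity, meaning the half-second pause reclaimed almost nothing. That points at live retained data (a cache or a leak) rather than garbage, and sends the investigation on to step 3.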
Tool Recommendations
GC log analysis: GCeasy, jstat, Prometheus + Grafana.
Heap dump analysis: MAT (Dominator Tree, Leak Suspects) or Arthas (heapdump command).
Online diagnostics in containers: jstat -gc <pid> 1s, jmap -dump:live,file=/tmp/heap.hprof <pid>, sidecar containers for log collection (a sample jstat reading follows this list).
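For quick triage with jstat -gc, the columns that matter most are OC/OU (Old Gen capacity and usage, in KB) and FGC/FGCT (Full GC count and cumulative seconds). An illustrative reading (values invented for the example):

jstat -gc 12345 1s
 ...  OC        OU        ...  FGC   FGCT
 ...  175104.0  169100.5  ...  48    25.31

OU pinned near OC while FGC climbs on every sample means each Full GC is reclaiming almost nothing, which is exactly the pattern that warrants taking a heap dump.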
Emergency Fixes (Minutes‑to‑Hours)
Adjust JVM flags: increase the young generation, raise Metaspace limits, or set -XX:PretenureSizeThreshold so large objects go straight to Old Gen without churning the survivor spaces (note: this flag is honored only by the Serial and ParNew young collectors).
Temporarily clear static caches via admin endpoints or restart the service.
Apply rate limiting or circuit breaking to reduce object-creation bursts (a minimal throttling sketch follows this list).
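For the rate-limiting stopgap, a minimal sketch assuming Guava (com.google.guava:guava) is on the classpath; the endpoint and the permit budget are illustrative:

import com.google.common.util.concurrent.RateLimiter;

public class ExportThrottle {
    // Allow at most 50 heavy export requests per second on this instance.
    private static final RateLimiter LIMITER = RateLimiter.create(50.0);

    public byte[] exportReport(String reportId) {
        if (!LIMITER.tryAcquire()) {
            // Shed load instead of building multi-megabyte result sets under GC pressure.
            throw new IllegalStateException("export temporarily throttled, retry later");
        }
        return doExport(reportId);
    }

    private byte[] doExport(String reportId) {
        return new byte[0]; // placeholder for the real report generation
    }
}

Shedding requests outright is crude, but it caps allocation pressure within minutes while the underlying cause is fixed.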
Short‑Term Optimizations (1‑3 Days)
Replace raw ConcurrentHashMap caches with Caffeine/Guava (TTL, maximumSize); see the sketch after this list.
Externalize large caches to Redis or Memcached.
Introduce pagination or cursor‑based streaming for bulk DB queries.
Compress large messages (Snappy/LZ4) and trim payloads.
Ensure proper ThreadLocal cleanup (remove() in try-finally, or TransmittableThreadLocal).
Cache reflective Method or MethodHandle objects instead of performing the lookup on every call.
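For the first item, a minimal sketch assuming Caffeine (com.github.ben-manes.caffeine:caffeine) is on the classpath; the key type, TTL, and size bound are illustrative:

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

public class UserCache {
    private final Cache<Long, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)                      // hard cap on entry count
            .expireAfterWrite(Duration.ofMinutes(10)) // TTL so stale entries are evicted
            .build();

    public String getUser(long id) {
        // Load on miss; evicted entries are simply reloaded on the next access.
        return cache.get(id, key -> loadFromDb(key));
    }

    private String loadFromDb(long id) {
        return "user-" + id; // placeholder for the real lookup
    }
}

Unlike a raw ConcurrentHashMap, the bounded, expiring cache gives Old Gen a predictable ceiling: evicted entries become garbage promptly instead of accumulating until a Full GC.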
Mid‑Term Optimizations (Weeks‑Months)
Capacity planning and load testing to size heap, young generation, and Metaspace.
Adopt multi‑level caching (local + distributed) with sharding for large keys.
Migrate batch jobs to streaming frameworks (Flink, Spark Streaming) to avoid full‑dataset loading.
Standardize monitoring dashboards (Full GC frequency, Old Gen usage, request latency) and alert thresholds.
Automate heap‑dump collection on alert via sidecar or APM integration (SkyWalking, Pinpoint).
Kubernetes‑Specific Heap Dump Procedure
Enter the container: kubectl exec -it <pod> -- /bin/bash.
Run the heap dump against the main Java process (usually PID 1): jmap -dump:live,format=b,file=/tmp/heap.hprof 1 (on JDK 9+ images that ship without jmap, jcmd 1 GC.heap_dump /tmp/heap.hprof works as well; note that the live option itself triggers a Full GC).
Copy the dump to the host: kubectl cp <namespace>/<pod>:/tmp/heap.hprof ./heap.hprof.
Analyze the dump locally with MAT to avoid consuming container resources.
Conclusion
Frequent Full GC is a symptom of mismatched resource usage and application design rather than a JVM bug. By following a systematic observation‑analysis‑verification loop—collecting high‑quality GC logs and heap dumps, analyzing them with the right tools, and validating hypotheses against the code—engineers can pinpoint the exact cause (e.g., oversized static cache, uncontrolled message size, unpaginated DB access, ThreadLocal leakage, Metaspace bloat) and apply targeted fixes that restore performance and prevent future incidents.