How I Cut Full GC Frequency by 80%: A JVM Tuning Case Study
Over a month of systematic JVM tuning, Full GC dropped from about 40 times per day to roughly once every ten days, and Young GC duration was halved. The work involved adjusting heap sizes, survivor ratios, and Metaspace settings, and tracking down a memory leak caused by an anonymous inner-class listener.
Background
A four‑node production cluster (2 CPU / 4 GB RAM per node) experienced >40 Full GC events per day and occasional automatic restarts, indicating severe JVM memory pressure.
Initial JVM Configuration
-Xms1000M -Xmx1800M -Xmn350M -Xss300K
-XX:+DisableExplicitGC
-XX:SurvivorRatio=4
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:+CMSParallelRemarkEnabled
-XX:LargePageSizeInBytes=128M
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
Observations: the young generation (-Xmn350M) was too small, -Xms differed from -Xmx, and the survivor ratio limited object promotion.
First Optimization Attempt
-Xmn350M → -Xmn800M
-XX:SurvivorRatio=4 → -XX:SurvivorRatio=8
-Xms1000M → -Xms1800M
The change was deployed to two nodes (prod, prod2). Young GC frequency dropped by more than 50% and total Young GC time fell by about 400 s, but the Full GC count increased by 41, so the change was not a net improvement.
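The effect of these flag changes on the heap layout can be worked out by hand. The following is a rough sketch of that arithmetic, not the JVM's exact sizing logic: it assumes the heap has grown to -Xmx and ignores the alignment the JVM applies. The class and method names are illustrative only.

```java
// Sketch: approximate generation sizes from -Xmx, -Xmn, and -XX:SurvivorRatio.
// Names are hypothetical; real JVM sizes differ slightly due to alignment.
class HeapLayoutSketch {

    /** Eden size in MB: the young generation is split into (ratio + 2) parts and eden gets `ratio` of them. */
    static double edenMb(double xmnMb, int survivorRatio) {
        return xmnMb * survivorRatio / (survivorRatio + 2);
    }

    /** Size in MB of each of the two survivor spaces. */
    static double survivorMb(double xmnMb, int survivorRatio) {
        return xmnMb / (survivorRatio + 2);
    }

    /** Old generation size in MB once the heap has grown to -Xmx. */
    static double oldGenMb(double xmxMb, double xmnMb) {
        return xmxMb - xmnMb;
    }

    /** Old-generation occupancy in MB at which CMS starts a cycle (CMSInitiatingOccupancyFraction). */
    static double cmsTriggerMb(double xmxMb, double xmnMb, int occupancyFraction) {
        return oldGenMb(xmxMb, xmnMb) * occupancyFraction / 100.0;
    }

    public static void main(String[] args) {
        double xmx = 1800;
        // {Xmn in MB, SurvivorRatio}: the initial settings, then the first optimization attempt.
        double[][] configs = { {350, 4}, {800, 8} };
        for (double[] c : configs) {
            double xmn = c[0];
            int ratio = (int) c[1];
            System.out.printf(
                "Xmn=%.0fM ratio=%d: eden=%.1fM survivor=%.1fM old=%.0fM cmsTrigger=%.0fM%n",
                xmn, ratio, edenMb(xmn, ratio), survivorMb(xmn, ratio),
                oldGenMb(xmx, xmn), cmsTriggerMb(xmx, xmn, 70));
        }
    }
}
```

Under these assumptions, eden grows from about 233 MB to 640 MB and each survivor space from about 58 MB to 80 MB, while the old generation shrinks from 1450 MB to 1000 MB and the 70% CMS trigger point falls from 1015 MB to 700 MB. A smaller old generation reaches that trigger sooner at the same promotion rate, which is worth keeping in mind when reading the Full GC numbers above.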
Memory Leak Discovery
A bean T had >10 000 instances (~20 MB) retained by an anonymous inner‑class listener:
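public void doSmthing(T t){
    // The anonymous Listener captures t, so t stays reachable for as long as the listener is registered.
    redis.addListener(new Listener(){
        public void onTimeout(){
            if(t.success()){
                // execute operation
            }
        }
    });
}
The listener held a strong reference to T until a timeout (≈1 min) fired, preventing those instances from being garbage collected in the meantime.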
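The retention pattern can be reproduced in isolation. The sketch below is not the article's actual code or the real redis client API: `Registry`, `Bean`, and `Listener` are hypothetical stand-ins, and deregistering the listener once the work completes is just one possible way to release the captured bean early.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the leak shape with hypothetical types; not the real redis client API.
class ListenerLeakSketch {

    /** Hypothetical callback type standing in for the article's Listener. */
    interface Listener {
        void onTimeout();
    }

    /** Hypothetical bean standing in for T. */
    static class Bean {
        boolean success() { return true; }
    }

    /** Hypothetical registry standing in for the redis client; it holds listeners strongly. */
    static class Registry {
        private final List<Listener> pending = new ArrayList<>();
        void addListener(Listener l) { pending.add(l); }
        void removeListener(Listener l) { pending.remove(l); }
        int pendingCount() { return pending.size(); }
    }

    /** Builds a listener that captures the bean, as the anonymous class in the article does. */
    static Listener listenerFor(Bean bean) {
        return new Listener() {
            @Override
            public void onTimeout() {
                if (bean.success()) {
                    // execute operation
                }
            }
        };
    }

    /** Leaky shape: the listener stays registered, so the captured bean stays reachable. */
    static void registerOnly(Registry registry, Bean bean) {
        registry.addListener(listenerFor(bean));
    }

    /** One possible alternative: deregister as soon as the guarded work completes. */
    static void registerAndRelease(Registry registry, Bean bean) {
        Listener listener = listenerFor(bean);
        registry.addListener(listener);
        try {
            // ... perform the request that the listener guards ...
        } finally {
            registry.removeListener(listener);
        }
    }

    /** Number of listeners (and therefore beans) still held after `calls` invocations. */
    static int pendingAfter(int calls, boolean release) {
        Registry registry = new Registry();
        for (int i = 0; i < calls; i++) {
            if (release) {
                registerAndRelease(registry, new Bean());
            } else {
                registerOnly(registry, new Bean());
            }
        }
        return registry.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("still held without release: " + pendingAfter(10_000, false));
        System.out.println("still held with release:    " + pendingAfter(10_000, true));
    }
}
```

In this sketch every call without a release leaves one listener, and therefore one bean, reachable from the registry. That mirrors how thousands of T instances can accumulate when requests arrive faster than the timeouts fire.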
Leak Investigation and Fix
After removing the listener and cleaning error‑log events, the leak was partially mitigated but Full GC remained high. Heap dumps revealed ~40 000 ByteArrayRow objects produced by database queries that lacked a required module filter; the oversized result sets also caused an unexpected inbound traffic spike of 83 MB/s.
Adding the missing filter reduced the result set from >400 000 rows to a normal size. Full GC frequency subsequently dropped to five occurrences over three days.
Second Tuning – Metaspace Adjustment
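GC logs showed Full GC even when old‑gen usage was <30 %. Metaspace had grown from the default ~21 MB to ~200 MB. With -XX:MetaspaceSize left at its default, Metaspace growth past the current threshold can itself trigger a Full GC, regardless of old‑gen occupancy. The following parameters were applied: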
-Xmn800M -Xms1800M -XX:MetaspaceSize=200M -XX:CMSInitiatingOccupancyFraction=75
Two nodes (prod 1 & prod 2) used the above settings; two control nodes (prod 3 & prod 4) kept -Xmn600M while sharing the other parameters. After ten days, the tuned nodes exhibited dramatically lower Full GC counts and ~50 % fewer Young GC events compared with the controls, and throughput (measured by thread start‑up rate) improved noticeably.
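Metaspace growth of the kind described in this section can also be observed from inside the running JVM. The following is a minimal sketch using the standard java.lang.management API; it assumes a HotSpot JVM (Java 8 or later) that exposes a memory pool named "Metaspace", and the class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: read current Metaspace usage through the platform MXBeans.
class MetaspaceUsageSketch {

    /** Returns used Metaspace in MB, or -1 if this JVM exposes no usable pool named "Metaspace". */
    static long usedMetaspaceMb() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                if (usage == null) {
                    return -1; // pool is no longer valid
                }
                return usage.getUsed() / (1024L * 1024L);
            }
        }
        return -1; // assumption: pool name differs or is absent on non-HotSpot JVMs
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used (MB): " + usedMetaspaceMb());
    }
}
```

Sampling this value periodically on a warmed-up node gives a reasonable starting point for choosing -XX:MetaspaceSize, which is presumably how a figure like the 200M above would be arrived at.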
Final Results
Full GC reduced from >40 times/day to <5 times over three days.
Young GC pause time cut by more than half.
Overall throughput increased on the best‑tuned server.
Key Takeaways
Full GC occurring more than once per day signals a serious memory issue.
When Full GC spikes, prioritize investigation of memory leaks.
After leaks are resolved, JVM tuning gains diminish; focus on code quality.
Unexpected inbound traffic often originates from unfiltered database queries.
Regular GC monitoring helps detect problems early.
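For the monitoring takeaway, GC counts can be sampled without parsing logs. This is a small sketch using the standard GarbageCollectorMXBean API; the class name and the snapshot/delta approach are illustrative assumptions, and the collector names reported depend on which collectors the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: sample cumulative GC counts and compute how many collections ran between two samples.
class GcCounterSketch {

    /** Snapshot of cumulative collection counts, keyed by collector name. */
    static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined for a collector.
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    /** Per-collector difference between two snapshots; collectors missing from `before` count from 0. */
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long previous = before.getOrDefault(e.getKey(), 0L);
            diff.put(e.getKey(), e.getValue() - previous);
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, Long> before = collectionCounts();
        // Allocate some short-lived garbage so the young collector has work to do.
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
        }
        Map<String, Long> after = collectionCounts();
        System.out.println("collections during sample window: " + delta(before, after));
    }
}
```

Taking a snapshot on a fixed schedule and alerting when the old-generation collector's delta exceeds an agreed limit is one simple way to catch a Full GC spike early.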
Java Architect Essentials
