Cutting Full GC from 40/day to Once Every 10 Days: Our JVM Tuning Story
Over a little more than a month, we reduced Full GC from about 40 times per day to roughly once every ten days and cut Young GC time by more than half. We got there by systematically adjusting JVM parameters, fixing memory leaks, and tuning the metaspace and young generation settings, which lowered GC pause times and raised server throughput.
Initially the production servers (2 CPU, 4 GB RAM, four‑node cluster) suffered frequent Full GC and occasional automatic restarts. The original JVM startup options were:
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-Xmx1800M sets the maximum heap size to 1800 MB.
-Xms1000M sets the initial heap size to 1000 MB; matching it to -Xmx avoids re‑allocation after each GC.
-Xmn350M defines the young generation size (350 MB). Increasing this reduces the old generation size; the recommended proportion is roughly 3/8 of the total heap.
-Xss300K sets the per-thread stack size to 300 KB. A smaller stack lets more threads fit in the same physical memory.
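To confirm that options like these actually took effect, the running JVM can report its own startup flags and heap limit. The following is a minimal sketch using the standard java.lang.management API; the class name HeapSettingsReport and the helper suggestedYoungMb are illustrative names, not part of the original setup. Note that the maximum heap reported by Runtime can be slightly lower than the -Xmx value.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper (hypothetical name): reports effective JVM settings.
final class HeapSettingsReport {

    /** Young generation size suggested by the 3/8-of-heap rule of thumb, in MB. */
    static double suggestedYoungMb(double heapMb) {
        return heapMb * 3.0 / 8.0;
    }

    /** Startup flags the JVM was launched with, e.g. -Xmx1800M. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM flags: " + jvmFlags());
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap as seen by the JVM: " + maxHeapMb + " MB");
        // For the 1800 MB heap above, the 3/8 guideline gives 675 MB,
        // well above the configured 350 MB.
        System.out.println("3/8 guideline for 1800 MB: " + suggestedYoungMb(1800) + " MB");
    }
}
```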
First Optimization
We increased the young generation to 800 MB, set -Xms equal to -Xmx, and changed -XX:SurvivorRatio from 4 to 8:
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
After five days the Young GC count dropped by more than half and its duration decreased by ~400 s, but Full GC count rose by 41, indicating the first attempt was only partially successful.
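The effect of -Xmn and -XX:SurvivorRatio on the young generation layout can be worked out by hand: the young generation is split into one Eden space and two survivor spaces, and SurvivorRatio is the ratio of Eden to a single survivor space. The sketch below does that arithmetic for the old and new settings; the class and method names are illustrative, and the real JVM may round the sizes for alignment.

```java
// Illustrative calculation (hypothetical names): young generation layout
// implied by -Xmn and -XX:SurvivorRatio.
final class YoungGenLayout {

    /** Size of one survivor space: young / (ratio + 2), since there are two survivors. */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Size of Eden: survivorRatio times one survivor space. */
    static double edenMb(double youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        System.out.println("before: eden=" + edenMb(350, 4)
                + " MB, survivor=" + survivorMb(350, 4) + " MB each");
        System.out.println("after:  eden=" + edenMb(800, 8)
                + " MB, survivor=" + survivorMb(800, 8) + " MB each");
    }
}
```

With the original settings Eden is about 233 MB and each survivor space about 58 MB; with the new settings Eden is 640 MB and each survivor space 80 MB. A larger Eden fills more slowly, which is consistent with the lower Young GC count reported above.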
Second Optimization – Memory Leak Investigation
We discovered an object type T with over 10 000 instances consuming ~20 MB, caused by an anonymous inner‑class listener that retained references after timeout callbacks. The problematic code:
public void doSmthing(T t) {
    // The anonymous Listener captures t, so t stays reachable for as long as
    // redis keeps the listener registered.
    redis.addListener(new Listener() {
        public void onTimeout() {
            if (t.success()) {
                // execute operation
            }
        }
    });
}
Removing the listener and fixing related error logs reduced the leak but did not fully resolve the GC issue.
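One possible way to stop this kind of leak is to unregister the listener once its callback has run, so the client no longer holds the listener or the object it captured. The sketch below is self-contained and uses stand-ins throughout: FakeRedis, its removeListener method, Task, and Service are hypothetical and do not correspond to a real Redis client API or to the code the team actually shipped.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener type, standing in for the article's Listener. */
interface Listener {
    void onTimeout();
}

/** Hypothetical stand-in for the redis client; not a real library API. */
final class FakeRedis {
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener l) { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }

    /** Simulates the client firing every registered timeout callback. */
    void fireTimeouts() {
        for (Listener l : listeners) {
            l.onTimeout();
        }
    }
}

/** Hypothetical stand-in for the article's type T. */
final class Task {
    boolean success() { return true; }
}

final class Service {
    private final FakeRedis redis;

    Service(FakeRedis redis) { this.redis = redis; }

    void doSmthing(Task t) {
        redis.addListener(new Listener() {
            @Override
            public void onTimeout() {
                try {
                    if (t.success()) {
                        // execute operation
                    }
                } finally {
                    // Unregister so the client drops its reference to this
                    // listener, and with it the captured Task.
                    redis.removeListener(this);
                }
            }
        });
    }
}
```

After the callback fires, the listener list is empty again and the captured object becomes eligible for collection. CopyOnWriteArrayList is used here so that a listener can remove itself while the list is being iterated.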
Further Leak Findings
Heap dumps later revealed tens of thousands of ByteArrayRow objects. We traced them to a database query that was missing its module condition and therefore fetched over 400 000 rows unintentionally. After correcting the query, the servers returned to normal operation, with only five Full GC events in three days.
Third Optimization – Metaspace and Young Generation Tuning
GC logs showed Full GC occurring even when old‑generation usage was below 30 %. Research indicated that metaspace growth can trigger Full GC: when metaspace usage reaches the -XX:MetaspaceSize threshold, the JVM runs a collection to unload classes. We set -XX:MetaspaceSize to 200 MB, raised the CMS occupancy threshold from 70 % to 75 %, and also enlarged the young generation:
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75
Two servers (prod1, prod2) received the larger young generation, while prod3 and prod4 kept the original size. After ten days the results were:
Full GC frequency on prod1 and prod2 was far lower than on prod3 and prod4.
Young GC counts on prod1/2 were about half of those on prod3/4.
Throughput on prod1 improved noticeably, as indicated by higher thread‑start rates.
Overall, the tuning reduced GC pauses by more than 50 % and significantly increased server throughput.
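When Full GC fires with plenty of old-generation headroom, as it did here, it can help to look at metaspace usage directly. The sketch below reads it through the standard MemoryPoolMXBean API. It assumes a HotSpot JVM (Java 8 or later) that exposes a memory pool named "Metaspace"; on other JVMs the pool may be missing, in which case the helper returns -1. The class name MetaspaceReport is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Illustrative helper (hypothetical name): reports current metaspace usage.
final class MetaspaceReport {

    /** Used metaspace in bytes, or -1 if no pool named "Metaspace" is exposed. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: HotSpot names this pool "Metaspace".
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long used = metaspaceUsedBytes();
        if (used < 0) {
            System.out.println("No Metaspace pool found on this JVM");
        } else {
            System.out.println("Metaspace used: " + used / (1024 * 1024) + " MB");
        }
    }
}
```

Comparing this figure with the configured -XX:MetaspaceSize gives a rough idea of how close the application is to a metaspace-triggered collection.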
Conclusion
Key takeaways from the month‑long JVM tuning effort:
Full GC occurring more than once per day signals a serious problem.
When Full GC spikes, investigate memory leaks first.
After fixing leaks, JVM tuning opportunities become limited; avoid over‑investing.
If CPU stays high after code checks, consult operations or cloud provider support.
Unexpectedly high inbound traffic may stem from inefficient database queries.
Regularly monitor GC metrics to detect issues early.
The process demonstrated that systematic parameter adjustments combined with thorough leak detection can dramatically improve Java application performance.
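For the monitoring takeaway, GC counts and cumulative GC time are available from inside the JVM without parsing log files. The following is a minimal sketch using the standard GarbageCollectorMXBean API; the class name GcStats is illustrative, and the collector names it prints depend on the JVM and the collectors selected at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): summarizes GC activity so far.
final class GcStats {

    /** One line per collector with its collection count and total time in ms. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

Sampling these counters periodically and recording the deltas gives Young GC and Full GC rates per day, the same figures this article tracks.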
