JVM Garbage Collection Tuning Experience: Reducing FullGC Frequency and Solving Memory Leaks
Over a month of systematic JVM tuning, the author reduced FullGC frequency from 40 times per day to once every ten days, halved YoungGC time, identified and fixed a memory leak caused by anonymous inner‑class listeners, and documented the step‑by‑step optimization process with configuration changes and performance results.
In this article the author, a senior architect, shares a month‑long experience of optimizing JVM garbage collection on a four‑node production cluster (2 CPU / 4 GB each) that suffered from frequent FullGC (≈40 times per day) and occasional server restarts.
Problem: Excessive FullGC and YoungGC caused high latency and instability. Initial GC logs showed very frequent FullGC and long YoungGC pauses.
Initial JVM parameters (per node):
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

Key explanations of the flags were listed in the article, e.g., -Xmx1800M sets the maximum heap and -Xmn350M the young-generation size.
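Whether the -Xms/-Xmx settings actually took effect on a node can be confirmed from inside the process via the standard java.lang.management API; a minimal sketch (the class name is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Illustrative check: prints the heap limits the running JVM actually
// applied, useful for verifying the deployed -Xms/-Xmx on each node.
public class HeapCheck {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("heap init = " + heap.getInit() / (1024 * 1024) + " MB");
        System.out.println("heap max  = " + heap.getMax() / (1024 * 1024) + " MB");
    }
}
```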
First Optimization
The author increased the young generation to 800 MB, set -Xms equal to -Xmx, and changed -XX:SurvivorRatio from 4 to 8:
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M

After deploying these changes to two servers (prod1, prod2) for five days, YoungGC frequency dropped by more than 50% and total YoungGC pause time fell by roughly 400 seconds, but the FullGC count unexpectedly rose by 41, so this first optimization was judged a failure.
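The effect of the SurvivorRatio change is simple arithmetic: in HotSpot, -XX:SurvivorRatio=N sets eden to N times the size of one survivor space, and the young generation holds eden plus two survivor spaces. A small sketch of that layout:

```java
// How -Xmn and -XX:SurvivorRatio divide the young generation in HotSpot:
// SurvivorRatio=N means eden : one-survivor = N : 1, and there are two
// survivor spaces, so eden = young * N / (N + 2), survivor = young / (N + 2).
public class YoungGenLayout {
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        // After the change: -Xmn800M -XX:SurvivorRatio=8
        System.out.println("eden     = " + edenMb(800, 8) + " MB");     // 640 MB
        System.out.println("survivor = " + survivorMb(800, 8) + " MB"); // 80 MB
    }
}
```

So moving from ratio 4 to ratio 8 enlarges eden (more room for short-lived objects before a YoungGC) at the cost of smaller survivor spaces.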
Second Optimization – Memory Leak Investigation
During analysis, a class T was found to have more than 10,000 live instances (~20 MB) because an anonymous inner-class listener captured a reference to each instance and was never deregistered, causing a memory leak:
public void doSmthing(T t) {
    redis.addListener(new Listener() {
        public void onTimeout() {
            if (t.success()) {
                // execute operation
            }
        }
    });
}

Fixing the leak reduced the overall memory pressure, but FullGC remained high. Further investigation revealed a query that unintentionally fetched more than 400,000 rows because a filter condition was missing in one module, leading to massive object creation (≈40,000 ByteArrayRow objects) and occasional spikes in inbound traffic.
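One way to avoid this kind of retention is to have the listener deregister itself once it fires, so the listener registry no longer pins the listener or the captured object. The sketch below is self-contained and illustrative: ListenerRegistry stands in for the Redis client in the snippet above and is not a real client API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hedged sketch of the leak fix: the anonymous listener removes itself
// after firing, releasing the reference chain to the captured task.
public class ListenerLeakFix {
    interface Listener { void onTimeout(); }

    // Illustrative stand-in for the Redis client's listener registry.
    static class ListenerRegistry {
        final List<Listener> listeners = new CopyOnWriteArrayList<>();
        void addListener(Listener l) { listeners.add(l); }
        void removeListener(Listener l) { listeners.remove(l); }
        void fireTimeout() { for (Listener l : listeners) l.onTimeout(); }
        int size() { return listeners.size(); }
    }

    static void register(ListenerRegistry registry, Runnable task) {
        registry.addListener(new Listener() {
            @Override public void onTimeout() {
                try {
                    task.run();                     // captured object used once...
                } finally {
                    registry.removeListener(this);  // ...then released
                }
            }
        });
    }

    public static void main(String[] args) {
        ListenerRegistry registry = new ListenerRegistry();
        register(registry, () -> System.out.println("operation executed"));
        registry.fireTimeout();
        System.out.println("listeners left: " + registry.size()); // 0
    }
}
```

Without the removeListener call in the finally block, every registered listener (and the object it captured) stays reachable from the registry indefinitely, which matches the instance buildup described above.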
Second Optimization – Metaspace and CMS Tuning
Observing that Metaspace grew to ~200 MB (far above the default 21 MB), the author added the following parameters to two servers (prod1, prod2) while keeping the other two unchanged:
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75

A second configuration, applied to prod3 and prod4, used -Xmn600M instead of 800M. After about ten days the results showed:
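With -XX:+UseCMSInitiatingOccupancyOnly already set, -XX:CMSInitiatingOccupancyFraction=75 means CMS starts a concurrent cycle when the old generation reaches 75% occupancy. For the prod1/prod2 settings the threshold works out as follows (a sketch of the arithmetic, not a measurement):

```java
// At what old-generation occupancy CMS begins a concurrent cycle under
// -XX:+UseCMSInitiatingOccupancyOnly. Figures match the prod1/prod2
// settings: 1800 MB heap, 800 MB young generation, fraction 75.
public class CmsTrigger {
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    static long triggerMb(long oldGenMb, int occupancyFraction) {
        return oldGenMb * occupancyFraction / 100;
    }

    public static void main(String[] args) {
        long oldGen = oldGenMb(1800, 800); // 1000 MB old generation
        System.out.println("CMS starts at ~" + triggerMb(oldGen, 75) + " MB"); // 750 MB
    }
}
```

Raising the fraction from 70 to 75 delays the start of each CMS cycle, trading a later trigger for fewer concurrent cycles.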
FullGC frequency on prod1 and prod2 was dramatically lower than on prod3 and prod4.
YoungGC frequency on prod1/2 was about half of that on prod3/4.
Throughput, measured by thread-start counts, was higher on prod1: it stayed roughly one day's worth of work ahead of the other nodes.
Overall, the optimization succeeded: under the original parameters FullGC had occurred five times in just three days, whereas the tuned configuration reduced it to roughly once every ten days, with significantly better throughput and shorter GC pauses.
Summary of Findings
FullGC more than once per day is abnormal.
When FullGC spikes, first investigate memory leaks.
After fixing leaks, JVM tuning opportunities become limited; focus on critical issues.
High CPU may stem from server‑level problems; consult cloud provider if needed.
Unexpected inbound traffic can indicate hidden database queries; verify query conditions.
Regularly monitor GC logs to detect issues early.
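The last point, regularly monitoring GC logs, can start as something very simple, such as counting Full GC events in -XX:+PrintGCDetails output. A minimal sketch (the sample log lines are illustrative, not taken from the article's cluster):

```java
import java.util.List;

// Hedged sketch of basic GC-log monitoring: count Full GC events in
// -XX:+PrintGCDetails output so a spike (e.g. more than one per day)
// can be flagged early.
public class GcLogScan {
    static long countFullGc(List<String> logLines) {
        return logLines.stream().filter(l -> l.contains("Full GC")).count();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "12.345: [GC (Allocation Failure) 12.345: [ParNew: ...]",
            "98.765: [Full GC (Allocation Failure) 98.765: [CMS: ...]"
        );
        System.out.println("Full GCs: " + countFullGc(sample)); // 1
    }
}
```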
The article also contains promotional material for a ChatGPT‑focused community and various unrelated advertisements, which are omitted from the technical summary.