JVM Garbage Collection Tuning: Reducing FullGC Frequency and Resolving Memory Leaks
This article documents a month‑long investigation and tuning of a Java backend server, detailing how FullGC frequency was cut from dozens per day to a handful by adjusting heap parameters, fixing a memory‑leak caused by anonymous inner‑class listeners, and optimizing metaspace settings.
After more than a month of effort, the FullGC frequency was reduced from about 40 times per day to less than once per day, and YoungGC time was cut by more than half, prompting a detailed record of the tuning process.
The initial problem was extremely frequent FullGC (average 40+ times daily) on a 2‑core, 4 GB server cluster, causing automatic restarts.
Key JVM startup parameters were:
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
The first optimization increased the young generation size and aligned the initial heap size with the maximum:
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
After five days, YoungGC frequency dropped by more than half, but FullGC count rose by 41, indicating a failed first attempt.
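One way to see why the first attempt may have backfired is to work out the generation sizes each flag set implies. The sketch below is illustrative only: it assumes the usual HotSpot convention that SurvivorRatio is the ratio of eden to one survivor space, ignores alignment rounding, sizes the old generation against -Xmx, and uses a hypothetical HeapLayout class that is not part of the original article. Under those assumptions, growing the young generation from 350 MB to 800 MB shrinks the old generation from 1450 MB to 1000 MB, so a 70% CMS threshold drops from about 1015 MB to 700 MB of old-generation occupancy. That is a plausible contributor to the extra FullGC events, though the source does not confirm it.

```java
import java.util.Locale;

// Hypothetical helper: derives generation sizes (in MB) from the tuning flags.
final class HeapLayout {
    final double edenMb;
    final double survivorMb;   // size of ONE survivor space
    final double oldMb;        // old generation at maximum heap size
    final double cmsTriggerMb; // old-gen occupancy at which a CMS cycle starts

    HeapLayout(double maxHeapMb, double youngMb, int survivorRatio, int cmsFractionPercent) {
        // young = eden + 2 survivors, and SurvivorRatio = eden / one survivor
        this.survivorMb = youngMb / (survivorRatio + 2);
        this.edenMb = survivorMb * survivorRatio;
        this.oldMb = maxHeapMb - youngMb;
        this.cmsTriggerMb = oldMb * cmsFractionPercent / 100.0;
    }

    String describe() {
        return String.format(Locale.ROOT,
                "eden=%.1f survivor=%.1f old=%.1f cmsTrigger=%.1f",
                edenMb, survivorMb, oldMb, cmsTriggerMb);
    }

    public static void main(String[] args) {
        // -Xmx1800M -Xmn350M -XX:SurvivorRatio=4, CMS fraction 70
        System.out.println("before: " + new HeapLayout(1800, 350, 4, 70).describe());
        // -Xmx1800M -Xmn800M -XX:SurvivorRatio=8, CMS fraction 70
        System.out.println("after:  " + new HeapLayout(1800, 800, 8, 70).describe());
    }
}
```

Comparing the two layouts side by side makes the trade-off visible: fewer young collections are bought with a smaller old generation.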
The second optimization uncovered a memory leak: an object T had over 10,000 instances (~20 MB) due to an anonymous inner‑class listener that never released references after a timeout.
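public void doSmthing(T t) {
    // The anonymous listener captures t (and the enclosing instance).
    // While redis keeps the listener registered, neither can be collected.
    redis.addListener(new Listener() {
        public void onTimeout() {
            if (t.success()) {
                // execute operation
            }
        }
    });
}
Fixing the leak reduced error logs but did not fully stop frequent restarts.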
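The source does not show how the leak was fixed. A minimal sketch of one common approach is for the listener to unregister itself once the timeout has been handled. Everything here is hypothetical: Registry stands in for the article's redis object, removeListener is an assumed method, and Task stands in for T.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ListenerLeakSketch {
    interface Listener { void onTimeout(); }

    // Hypothetical stand-in for the `redis` object in the article.
    static final class Registry {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        void addListener(Listener l) { listeners.add(l); }
        void removeListener(Listener l) { listeners.remove(l); }
        int size() { return listeners.size(); }
        // Iterates over a snapshot, so listeners may remove themselves safely.
        void fireTimeout() { for (Listener l : listeners) l.onTimeout(); }
    }

    // Hypothetical stand-in for the article's T.
    static final class Task { boolean success() { return true; } }

    // Leaky: the registry keeps every listener, and each listener keeps its Task.
    static void registerLeaky(Registry registry, Task t) {
        registry.addListener(new Listener() {
            @Override public void onTimeout() {
                if (t.success()) { /* execute operation */ }
            }
        });
    }

    // Fixed: the listener removes itself after the timeout has been handled.
    static void registerSelfRemoving(Registry registry, Task t) {
        registry.addListener(new Listener() {
            @Override public void onTimeout() {
                try {
                    if (t.success()) { /* execute operation */ }
                } finally {
                    registry.removeListener(this);
                }
            }
        });
    }

    public static void main(String[] args) {
        Registry leaky = new Registry();
        Registry fixed = new Registry();
        for (int i = 0; i < 10_000; i++) {
            registerLeaky(leaky, new Task());
            registerSelfRemoving(fixed, new Task());
        }
        leaky.fireTimeout();
        fixed.fireTimeout();
        System.out.println("leaky listeners retained: " + leaky.size());
        System.out.println("fixed listeners retained: " + fixed.size());
    }
}
```

The try/finally keeps the listener from staying registered when the operation throws. Whether the real client offers a removal method is an assumption to check against its API.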
Further investigation revealed a massive data query caused by a missing module condition: it pulled over 400,000 rows and created ~40,000 ByteArrayRow objects, which produced high inbound traffic and server restarts.
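A defensive guard can stop this class of bug before it reaches the database. The sketch below is an assumption-laden illustration, not the fix the source describes: the table name records, the column name module, and the 1,000-row cap are all hypothetical.

```java
// Hypothetical guard: refuses to build a query without its module filter
// or with an unbounded result size.
final class BoundedQuery {
    static final int MAX_ROWS = 1000; // hypothetical cap, tune per use case

    // Returns parameterized SQL; the caller binds `module` to the placeholder.
    static String build(String module, int limit) {
        if (module == null || module.trim().isEmpty()) {
            throw new IllegalArgumentException("module filter is required");
        }
        if (limit <= 0 || limit > MAX_ROWS) {
            throw new IllegalArgumentException("limit must be between 1 and " + MAX_ROWS);
        }
        return "SELECT * FROM records WHERE module = ? LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(build("billing", 500));
        try {
            build(null, 500); // the missing condition is rejected up front
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on a missing filter turns a silent 400,000-row fetch into an immediate, visible error.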
After correcting the query, the servers returned to normal with only 5 FullGC events over three days using the original parameters.
The second round of tuning focused on metaspace, which had grown to ~200 MB, and also adjusted heap and CMS settings:
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75
Two servers (prod1, prod2) received the larger young generation, while prod3 and prod4 kept the original settings. After ten days, FullGC and YoungGC frequencies on prod1 and prod2 were significantly lower, and throughput increased.
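Before choosing a value for -XX:MetaspaceSize, it helps to measure how much metaspace the application actually uses at steady state. The flag sets the initial threshold at which metaspace growth triggers a collection, so a value near real usage should avoid collections caused purely by metaspace resizing. A minimal sketch using the standard java.lang.management API follows. The class name MetaspaceUsage is hypothetical, and the pool name "Metaspace" is what HotSpot on Java 8 and later typically reports; other JVMs may differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceUsage {
    // Returns used metaspace in bytes, or -1 if no pool named "Metaspace" exists.
    static long usedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage(); // may be null for an invalid pool
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long used = usedBytes();
        if (used < 0) {
            System.out.println("No Metaspace pool reported by this JVM");
        } else {
            System.out.println("Metaspace used: " + (used / (1024 * 1024)) + " MB");
        }
    }
}
```

Sampling this after the application has warmed up gives a grounded starting point for the flag.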
The final analysis concluded that frequent FullGC should first trigger a memory‑leak investigation, and after leaks are resolved, JVM tuning opportunities become limited. Monitoring GC regularly helps detect issues early, and checking inbound traffic and database queries can uncover hidden load problems.
Key takeaways:
FullGC more than once a day is abnormal.
Prioritize memory‑leak detection when FullGC spikes.
After fixing leaks, JVM tuning yields diminishing returns.
High CPU may stem from server issues; consult cloud provider if needed.
Investigate database query volume when inbound traffic is unexpectedly high.
Regular GC monitoring is essential for early problem detection.
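The monitoring advice above can be sketched with the standard GarbageCollectorMXBean API, which exposes cumulative collection counts and times per collector. The class name GcSnapshot is hypothetical. With the ParNew and CMS flags used here, the collectors are typically reported as "ParNew" and "ConcurrentMarkSweep", but that naming is an assumption to verify on the target JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class GcSnapshot {
    // Maps collector name -> {cumulative collection count, cumulative time in ms}.
    // Either value is -1 when the JVM does not define it for that collector.
    static Map<String, long[]> collect() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : collect().entrySet()) {
            System.out.println(e.getKey() + ": count=" + e.getValue()[0]
                    + " timeMs=" + e.getValue()[1]);
        }
    }
}
```

Taking a snapshot on a fixed schedule and comparing successive counts gives a per-day FullGC figure that can be alerted on.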
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
