Cutting Full GC from 40× Daily to Once Every 10 Days: JVM Tuning Insights
Over a month, we reduced Full GC occurrences on a 2‑core, 4 GB JVM cluster from roughly 40 times per day to once every ten days, while halving Young GC duration, by adjusting heap parameters, fixing memory leaks, and tuning metaspace, ultimately improving server throughput and stability.
During more than a month of effort, we optimized a JVM cluster (2 CPU, 4 GB RAM, four servers) to lower Full GC frequency from about 40 times per day to roughly once every ten days, and cut Young GC time by more than half.
Initial Situation
The servers experienced frequent Full GC (average >40 times per day) causing automatic restarts, indicating severe instability.
Key JVM startup parameters were:
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-Xmx1800M: maximum heap size.
-Xms1000M: initial heap size (best set equal to -Xmx so the heap is not resized at runtime).
-Xmn350M: young generation size (recommended 3/8 of total heap).
-Xss300K: thread stack size.
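Before changing any of these values, it can help to confirm what a running JVM actually started with. The following is a minimal sketch using the standard java.lang.management API; the class name JvmSettings and its methods are illustrative and not part of the original setup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper: prints the flags and heap bounds of the current JVM.
class JvmSettings {
    // Arguments passed to the JVM at startup, e.g. -Xmx1800M, -Xmn350M.
    static List<String> startupFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Maximum heap the JVM will attempt to use, in MB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Heap currently reserved by the JVM, in MB.
    static long currentHeapMb() {
        return Runtime.getRuntime().totalMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("flags: " + startupFlags());
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("current heap (MB): " + currentHeapMb());
    }
}
```

Depending on the collector, the reported maximum may be slightly lower than the -Xmx value, so treat the numbers as a sanity check rather than an exact readback.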
First Optimization
We increased the young generation and aligned initial and maximum heap sizes:
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
After deploying to two servers for five days, Young GC frequency dropped by more than half and its duration decreased by 400 s, but Full GC frequency rose dramatically, indicating the first attempt failed.
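To see what this change does to the young generation layout, the arithmetic can be sketched as follows. It assumes HotSpot's definition of SurvivorRatio (Eden size divided by the size of one survivor space) and ignores the alignment the JVM applies, so real sizes may differ slightly. The class and method names are illustrative.

```java
// Sketch: how -Xmn and -XX:SurvivorRatio divide the young generation.
// young = Eden + 2 * survivor, and Eden = ratio * survivor,
// so survivor = young / (ratio + 2).
class YoungGenLayout {
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static double edenMb(double youngMb, int survivorRatio) {
        return survivorMb(youngMb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        // Original settings: -Xmn350M -XX:SurvivorRatio=4
        System.out.printf("before: eden=%.1f MB, survivor=%.1f MB each%n",
                edenMb(350, 4), survivorMb(350, 4));
        // First optimization: -Xmn800M -XX:SurvivorRatio=8
        System.out.printf("after:  eden=%.1f MB, survivor=%.1f MB each%n",
                edenMb(800, 8), survivorMb(800, 8));
        // Old generation at the 1800 MB maximum heap is whatever is left.
        System.out.printf("old gen at max heap: before=%.0f MB, after=%.0f MB%n",
                1800 - 350.0, 1800 - 800.0);
    }
}
```

Under these assumptions Eden grows from about 233 MB to 640 MB, which fits the drop in Young GC frequency, while the old generation at maximum heap shrinks from 1450 MB to 1000 MB. That smaller old generation may be one reason Full GC became more frequent, though the measurements above do not confirm the cause.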
Second Optimization – Memory Leak Investigation
A large number of instances of object T (≈20 MB) stayed reachable: each was captured by an anonymous inner class listener that was never released after its timeout, which caused a memory leak.
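    public void doSmthing(final T t) {
        // The anonymous listener captures t; because the listener is never
        // removed, t stays reachable after the timeout has fired.
        redis.addListener(new Listener() {
            public void onTimeout() {
                if (t.success()) {
                    // execute operation
                }
            }
        });
    }
Removing the leak reduced the number of objects but did not fully resolve the issue; the servers still restarted.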
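One way to release such a listener is to have it deregister itself once the timeout callback has run. The sketch below is self-contained and entirely hypothetical: Registry stands in for the Redis client's listener list, Task stands in for T, and removeListener is assumed to exist. The original article does not show how the leak was actually fixed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ListenerLeakSketch {
    interface Listener {
        void onTimeout();
    }

    // Hypothetical stand-in for the client's listener registry.
    static class Registry {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        void addListener(Listener l) { listeners.add(l); }
        void removeListener(Listener l) { listeners.remove(l); }
        int size() { return listeners.size(); }

        // Simulates the timeout firing for every registered listener.
        void fireTimeout() {
            for (Listener l : listeners) {
                l.onTimeout();
            }
        }
    }

    // Hypothetical stand-in for the retained object T.
    static class Task {
        final byte[] payload = new byte[1024];
        boolean success() { return true; }
    }

    static void doSomething(final Registry registry, final Task t) {
        registry.addListener(new Listener() {
            @Override
            public void onTimeout() {
                try {
                    if (t.success()) {
                        // execute operation
                    }
                } finally {
                    // Deregister so the registry no longer references this
                    // listener and, through it, the captured Task.
                    registry.removeListener(this);
                }
            }
        });
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        for (int i = 0; i < 1000; i++) {
            doSomething(registry, new Task());
        }
        System.out.println("registered: " + registry.size());
        registry.fireTimeout();
        System.out.println("after timeout: " + registry.size());
    }
}
```

The finally block matters here: if the callback throws, the listener is still removed, so a failing operation cannot reintroduce the leak.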
Further Leak Detection
Heap dumps revealed tens of thousands of ByteArrayRow objects. We traced them to a database query that was missing its module condition and therefore fetched more than 400,000 rows, generating massive inbound traffic (≈83 MB/s) even though the servers were under no real load.
Fixing the query eliminated the abnormal traffic and, after three days with the original JVM parameters, Full GC occurred only five times.
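A defensive pattern against this kind of bug is to fail fast when a mandatory filter is missing, instead of silently running the query without it. The sketch below only illustrates that idea: the table name, column names, and the buildQuery helper are hypothetical and do not come from the original code.

```java
import java.util.ArrayList;
import java.util.List;

class QueryGuardSketch {
    // Builds a parameterized query and refuses to run without a module filter.
    // "records", "id", "payload" and "module" are hypothetical names.
    static String buildQuery(String module, List<Object> params) {
        if (module == null || module.trim().isEmpty()) {
            // Without this check the WHERE clause could be dropped and
            // the query would return every row in the table.
            throw new IllegalArgumentException("module condition is required");
        }
        params.add(module);
        return "SELECT id, payload FROM records WHERE module = ?";
    }

    public static void main(String[] args) {
        List<Object> params = new ArrayList<>();
        System.out.println(buildQuery("billing", params) + " " + params);
        try {
            buildQuery(null, new ArrayList<>());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```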
Second Tuning Phase
With the leaks resolved, we turned to metaspace. Its usage had grown to about 200 MB, far above the default MetaspaceSize threshold of about 21 MB, and crossing that threshold triggered Full GC. The adjusted parameters for two servers (prod1, prod2) were:
-Xmn800M
-Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75
For the other two servers (prod3, prod4) we kept the original settings. After ten days, prod1 and prod2 showed significantly lower Full GC counts and Young GC frequency compared to prod3 and prod4, and overall throughput improved.
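To choose a value for -XX:MetaspaceSize, it helps to know how much metaspace the application uses once it has warmed up. The following is a minimal sketch that reads this from inside the JVM. It assumes a HotSpot JVM (Java 8 or later), where the memory pool is named "Metaspace"; other JVMs may use a different name, in which case the method returns -1. The class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

class MetaspaceCheck {
    // Returns bytes currently used by the Metaspace pool,
    // or -1 if no pool with that name exists on this JVM.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long used = metaspaceUsedBytes();
        if (used < 0) {
            System.out.println("no Metaspace pool found on this JVM");
        } else {
            System.out.printf("metaspace used: %.1f MB%n", used / (1024.0 * 1024.0));
        }
    }
}
```

If steady-state usage sits near 200 MB, as it did here, setting MetaspaceSize to about that value keeps the JVM from collecting repeatedly while metaspace grows toward it.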
Summary
Full GC more than once per day indicates a serious problem.
When Full GC spikes, prioritize investigating memory leaks.
After fixing leaks, JVM tuning opportunities are limited; invest time wisely.
Persistent high CPU should be checked with the cloud provider after ruling out code issues.
Unexpected high inbound traffic may stem from database queries; verify query conditions.
Regularly monitor GC to detect issues early.
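For the monitoring point above, GC counts and accumulated pause time can be read from inside the process with the standard GarbageCollectorMXBean API. This is a minimal sketch with illustrative class and method names. Collector names depend on the JVM and flags; with the ParNew and CMS settings used here they would typically appear as "ParNew" and "ConcurrentMarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // One line per collector: name, number of collections, accumulated time.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    // Total collections across all collectors; -1 (undefined) values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
        System.out.println("total collections: " + totalCollections());
    }
}
```

Logging this snapshot on a schedule and comparing consecutive values gives the per-day Full GC count that the thresholds in this summary refer to.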
