How I Cut FullGC Frequency from 40×/day to Once Every 10 Days: A JVM Tuning Journey
This article details a month‑long investigation and step‑by‑step tuning of a Java server's JVM parameters, memory‑leak fixes, and metaspace adjustments that reduced FullGC from dozens of times daily to a single occurrence every ten days while improving overall throughput.
Problem
The production servers (2 CPU, 4 GB RAM, 4‑node cluster) were experiencing excessive FullGC—over 40 times per day—and frequent automatic restarts, indicating severe JVM memory pressure.
Initial JVM Parameters
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-Xmx1800M sets the maximum heap size.
-Xms1000M sets the initial heap size; matching it to Xmx avoids re‑allocation after GC.
-Xmn350M defines the young generation size (≈3/8 of total heap is recommended).
-Xss300K sets each thread's stack size.
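To confirm that flags like these were actually picked up by a running process, the standard management API can report them. The following is a minimal sketch; the class name JvmSettingsReport is made up for illustration, and the numbers it prints depend on the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Prints the flags and heap limits the running JVM actually picked up. */
final class JvmSettingsReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a short text report from the standard runtime and management APIs. */
    static String report() {
        Runtime rt = Runtime.getRuntime();
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        StringBuilder sb = new StringBuilder();
        sb.append("JVM arguments: ").append(args).append('\n');
        // maxMemory() tracks -Xmx, though it may come out slightly lower on some collectors.
        sb.append("Max heap (MB): ").append(toMb(rt.maxMemory())).append('\n');
        // totalMemory() is the heap currently committed; with -Xms equal to -Xmx it starts near the maximum.
        sb.append("Committed heap (MB): ").append(toMb(rt.totalMemory())).append('\n');
        return sb.toString();
    }

    public static void main(String[] unused) {
        System.out.print(report());
    }
}
```

Running it with the same options as the service shows whether the initial and maximum heap really differ, which is the mismatch addressed in the first optimization.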
First Optimization
Observations showed the young generation was too small, causing frequent YoungGC with a long cumulative collection time (≈830 s). The initial heap size also differed from the maximum.
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
After deploying the new settings to two nodes (prod, prod2) for five days, YoungGC frequency dropped by more than half and its duration decreased by 400 s, but FullGC count unexpectedly rose by 41.
The first attempt was deemed a failure because FullGC increased.
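To see what the first round of changes meant in practice, it helps to work out how -Xmn and -XX:SurvivorRatio divide the young generation. In HotSpot the ratio is Eden to one survivor space, and there are two survivor spaces, so each survivor gets young / (ratio + 2). The sketch below only does that arithmetic; real sizes are rounded by the JVM, and the class name is illustrative.

```java
/** Worked arithmetic for how -Xmn and -XX:SurvivorRatio split the young generation. */
final class YoungGenSizing {

    /** Size of one survivor space in MB: young / (ratio + 2). */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Size of Eden in MB: ratio times one survivor space. */
    static double edenMb(double youngMb, int survivorRatio) {
        return survivorRatio * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] unused) {
        // Original settings: -Xmn350M -XX:SurvivorRatio=4
        System.out.printf("before: eden=%.1f MB, survivor=%.1f MB each%n",
                edenMb(350, 4), survivorMb(350, 4));
        // First optimization: -Xmn800M -XX:SurvivorRatio=8
        System.out.printf("after:  eden=%.1f MB, survivor=%.1f MB each%n",
                edenMb(800, 8), survivorMb(800, 8));
    }
}
```

Under these formulas Eden grows from roughly 233 MB to 640 MB, which is consistent with YoungGC running far less often after the change.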
Second Optimization – Memory Leak Investigation
During analysis, a bean (type T) was found to have over 10 000 instances (~20 MB) retained by an anonymous inner‑class listener that never released references after a timeout.
public void doSmthing(T t) {
    redis.addListener(new Listener() {
        public void onTimeout() {
            if (t.success()) { /* do work */ }
        }
    });
}
Fixing the listener leak reduced some memory pressure but did not stop server restarts.
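The article does not show the actual fix, so the following is only one possible shape for it: a listener that unregisters itself once it has run, so the object it captured becomes collectable. ListenerRegistry, addOneShot and removeListener are hypothetical stand-ins written for this sketch, not the API of any real Redis client.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for the client that holds listeners; not a real Redis API. */
final class ListenerRegistry {

    /** Callback invoked when a request times out. */
    interface Listener {
        void onTimeout();
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener l) { listeners.add(l); }

    void removeListener(Listener l) { listeners.remove(l); }

    int size() { return listeners.size(); }

    /** Fires every currently registered listener once. */
    void fireTimeouts() {
        for (Listener l : listeners) {
            l.onTimeout();
        }
    }

    /** Registers a one-shot listener that unregisters itself after it runs. */
    void addOneShot(Runnable work) {
        Listener oneShot = new Listener() {
            @Override
            public void onTimeout() {
                try {
                    work.run();
                } finally {
                    // Dropping the registration releases everything the callback captured.
                    removeListener(this);
                }
            }
        };
        addListener(oneShot);
    }

    public static void main(String[] unused) {
        ListenerRegistry registry = new ListenerRegistry();
        registry.addOneShot(() -> System.out.println("timeout handled"));
        registry.fireTimeouts();
        System.out.println("listeners still registered: " + registry.size());
    }
}
```

The point of the design is that the registry never holds a callback longer than the event it was waiting for, so beans referenced from the callback do not pile up.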
Further Leak Detection
Heap dumps later revealed about 40 k ByteArrayRow objects originating from massive database queries. An unexpected inbound traffic spike (≈83 MB/s) was also observed, but the cloud provider confirmed it was normal traffic.
The root cause turned out to be a missing module condition in a query, causing a full table scan of over 400 k rows, which saturated memory and triggered restarts.
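A cheap defence against this kind of bug is to refuse to build the query at all when the mandatory filter is absent, and to cap the number of rows one call may load. The sketch below illustrates the idea; the table, the columns, the MAX_ROWS value and the MySQL-style LIMIT clause are all assumptions, since the article does not show the real query.

```java
/** Sketch of a guard that refuses to build an unfiltered query; all names are hypothetical. */
final class GuardedQuery {

    /** Upper bound on rows a single call may load into memory (assumed value). */
    static final int MAX_ROWS = 1000;

    /** Returns parameterized SQL, or throws if the mandatory filter is missing. */
    static String buildSql(String module) {
        if (module == null || module.trim().isEmpty()) {
            // Without this filter the statement would read the whole table.
            throw new IllegalArgumentException("module filter is required");
        }
        // The module value itself is bound later through a PreparedStatement parameter.
        return "SELECT id, payload FROM records WHERE module = ? LIMIT " + MAX_ROWS;
    }

    public static void main(String[] unused) {
        System.out.println(buildSql("billing"));
    }
}
```

Failing fast on a missing condition turns a silent memory blow-up into an exception that shows up in testing.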
Second Optimization – Metaspace & GC Tuning
GC logs showed FullGC occurring even when old‑gen usage was below 30 %. Research indicated that metaspace growth can trigger FullGC: the default initial threshold (-XX:MetaspaceSize, about 21 MB) was far below the ~200 MB that metaspace had actually grown to.
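Metaspace usage can be checked from inside the process before choosing a value for -XX:MetaspaceSize. The sketch below reads it through the standard memory pool beans; it assumes a HotSpot JVM on Java 8 or later, where the pool is named "Metaspace", and reports nothing on JVMs that name it differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

/** Reads current metaspace usage through the standard memory pool beans. */
final class MetaspaceProbe {

    /** Returns usage of the pool named "Metaspace", if this JVM exposes one. */
    static Optional<MemoryUsage> metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals("Metaspace")) {
                // getUsage() may return null if the pool is no longer valid.
                return Optional.ofNullable(pool.getUsage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] unused) {
        Optional<MemoryUsage> usage = metaspaceUsage();
        if (usage.isPresent()) {
            long mb = 1024L * 1024L;
            System.out.println("Metaspace used (MB): " + usage.get().getUsed() / mb);
            System.out.println("Metaspace committed (MB): " + usage.get().getCommitted() / mb);
        } else {
            System.out.println("No pool named Metaspace on this JVM");
        }
    }
}
```

If the steady-state value sits well above the configured threshold, as the ~200 MB here did, raising -XX:MetaspaceSize to about that level avoids the collections triggered while metaspace grows toward it.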
Two sets of changes were tried, differing only in -Xmn:
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75
and
-Xmn350M -> -Xmn600M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75
Four servers were compared (prod1‑prod4). The two servers with larger young generation (prod1, prod2) showed dramatically lower FullGC and YoungGC counts, higher thread start counts, and overall better throughput.
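The interaction between -Xmn and -XX:CMSInitiatingOccupancyFraction is easy to overlook: a larger young generation leaves a smaller old generation, so the CMS trigger point moves as well. The following worked arithmetic assumes the old generation is simply the 1800 MB heap minus the young generation, and ignores the JVM's internal rounding.

```java
/** Worked arithmetic for where a CMS cycle starts under the two -Xmn settings. */
final class CmsThreshold {

    /** Old generation size in MB, taken as heap minus young generation. */
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    /** Old-gen occupancy in MB at which CMS starts a concurrent cycle. */
    static long cmsTriggerMb(long heapMb, long youngMb, int occupancyPercent) {
        return oldGenMb(heapMb, youngMb) * occupancyPercent / 100;
    }

    public static void main(String[] unused) {
        // -Xmx1800M -Xmn800M -XX:CMSInitiatingOccupancyFraction=75
        System.out.println("Xmn800M: old=" + oldGenMb(1800, 800)
                + " MB, CMS starts at " + cmsTriggerMb(1800, 800, 75) + " MB");
        // -Xmx1800M -Xmn600M -XX:CMSInitiatingOccupancyFraction=75
        System.out.println("Xmn600M: old=" + oldGenMb(1800, 600)
                + " MB, CMS starts at " + cmsTriggerMb(1800, 600, 75) + " MB");
    }
}
```

With -Xmn800M the old generation is 1000 MB and CMS starts at 750 MB; with -Xmn600M it is 1200 MB and CMS starts at 900 MB.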
Final Results
After the second round of tuning, FullGC frequency dropped to less than one per day, YoungGC frequency halved, and overall throughput increased noticeably on prod1. The single FullGC observed on prod1 was explained by a brief metaspace spike.
Conclusion
FullGC occurring more than once per day is abnormal.
When FullGC spikes, investigate memory leaks first.
After fixing leaks, JVM tuning opportunities are limited; avoid over‑optimizing.
If CPU stays high after code checks, consult the cloud provider—hardware issues can cause 100 % CPU.
High inbound traffic may stem from inefficient database queries; verify query conditions.
Regularly monitor GC metrics to catch problems early.
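For the last point, GC counts and times are available without parsing logs. A minimal sketch using the standard GarbageCollectorMXBean API follows; the bean names depend on the collectors in use, and the class name GcStats is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Takes a snapshot of cumulative GC counts and times for every collector in this JVM. */
final class GcStats {

    /** Returns one line per collector with its collection count and total time in ms. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means the collector does not report them.
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] unused) {
        System.out.print(snapshot());
    }
}
```

Sampling this periodically and alerting when the old-generation collector's count rises faster than expected catches a regression long before the restarts described earlier.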
Java Backend Technology
