How I Cut FullGC Frequency from 40×/day to Once Every 10 Days: A JVM Tuning Journey

This article details a month‑long investigation and step‑by‑step tuning of a Java server's JVM parameters, memory‑leak fixes, and metaspace adjustments that reduced FullGC from dozens of times daily to a single occurrence every ten days while improving overall throughput.

Java Backend Technology

Jul 16, 2024

Problem

The production servers (2 CPU, 4 GB RAM, 4‑node cluster) were running FullGC more than 40 times per day and restarting automatically on a regular basis, both signs of severe JVM memory pressure.

Initial JVM Parameters

-Xms1000M
-Xmx1800M
-Xmn350M
-Xss300K
-XX:+DisableExplicitGC
-XX:SurvivorRatio=4
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:+CMSParallelRemarkEnabled
-XX:LargePageSizeInBytes=128M
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC

-Xmx1800M sets the maximum heap size.

-Xms1000M sets the initial heap size; setting it equal to -Xmx keeps the JVM from resizing the heap after each GC.

-Xmn350M sets the young generation size (a common recommendation is about 3/8 of the total heap).

-Xss300K sets each thread's stack size.
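
Before changing any of these flags, it helps to confirm what a running JVM actually received. The sketch below uses only the standard java.lang.management API; the class name JvmSettings is made up for illustration and is not part of the original setup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Illustrative helper (hypothetical name) that reports the flags and heap limits of the current JVM. */
final class JvmSettings {

    /** Flags passed on the command line, e.g. -Xmx1800M, -XX:+UseConcMarkSweepGC. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will try to use, in MB. May be slightly below -Xmx on some collectors. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Heap currently reserved by the JVM, in MB. Starts near -Xms and can grow toward -Xmx. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM flags:      " + inputArguments());
        System.out.println("Max heap (MB):  " + maxHeapMb());
        System.out.println("Committed (MB): " + committedHeapMb());
    }
}
```

Running this on each node is a quick way to spot a node that was started with stale parameters.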

First Optimization

GC statistics showed that the young generation was too small, which caused frequent YoungGC and a high cumulative collection time (≈830 s). The initial heap size (-Xms) also differed from the maximum (-Xmx).

-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
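
To see what these changes mean in megabytes, the arithmetic can be worked through in a few lines. This is a rough sketch at the maximum heap size (-Xmx1800M); HotSpot aligns region sizes, so real values differ slightly, and the class name HeapLayout is only illustrative. The last figure is the old-generation occupancy at which CMS starts a cycle, given -XX:CMSInitiatingOccupancyFraction=70 together with -XX:+UseCMSInitiatingOccupancyOnly.

```java
import java.util.Locale;

/** Illustrative arithmetic (hypothetical class) for the generation sizes implied by -Xmx, -Xmn and SurvivorRatio. */
final class HeapLayout {

    /** Young gen = Eden + 2 survivor spaces, with Eden : one survivor = SurvivorRatio : 1. */
    static double edenMb(double xmnMb, int survivorRatio) {
        return xmnMb * survivorRatio / (survivorRatio + 2);
    }

    /** Size of a single survivor space in MB. */
    static double survivorMb(double xmnMb, int survivorRatio) {
        return xmnMb / (survivorRatio + 2);
    }

    /** Old generation is whatever the heap has left after the young generation. */
    static double oldGenMb(double xmxMb, double xmnMb) {
        return xmxMb - xmnMb;
    }

    /** Old-gen occupancy (MB) at which a CMS cycle starts for the given occupancy fraction. */
    static double cmsTriggerMb(double oldGenMb, int occupancyFraction) {
        return oldGenMb * occupancyFraction / 100.0;
    }

    static String describe(String label, double xmxMb, double xmnMb, int ratio, int fraction) {
        double old = oldGenMb(xmxMb, xmnMb);
        return String.format(Locale.ROOT,
                "%s: eden=%.0fM survivor=2x%.0fM old=%.0fM cmsTrigger=%.0fM",
                label, edenMb(xmnMb, ratio), survivorMb(xmnMb, ratio), old, cmsTriggerMb(old, fraction));
    }

    public static void main(String[] args) {
        System.out.println(describe("before", 1800, 350, 4, 70));
        System.out.println(describe("after", 1800, 800, 8, 70));
    }
}
```

With a fixed 1800M heap, growing the young generation from 350M to 800M shrinks the old generation from about 1450M to about 1000M, so the 70 % CMS threshold is reached after roughly 700M of old-gen usage instead of roughly 1015M.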

After the new settings had run on two nodes (prod, prod2) for five days, YoungGC frequency had dropped by more than half and its cumulative time had fallen by about 400 s, but the FullGC count had unexpectedly risen by 41.

The first attempt was deemed a failure because FullGC increased.

Second Optimization – Memory Leak Investigation

During analysis, a bean (type T) was found to have over 10 000 instances (~20 MB) retained by an anonymous inner‑class listener that never released references after a timeout.

public void doSmthing(T t) {
    // The anonymous listener captures t, so t stays reachable
    // for as long as redis keeps the listener registered.
    redis.addListener(new Listener() {
        public void onTimeout() {
            if (t.success()) { /* do work */ }
        }
    });
}

Fixing the listener leak reduced some memory pressure but did not stop server restarts.
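
The article does not show the actual fix, so the following is only one possible shape for it: the listener unregisters itself once the timeout callback has run, which lets the captured object be collected. ListenerRegistry, Task, and Worker are hypothetical stand-ins for the article's redis client, bean type T, and enclosing class; the real client API may look different.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Callback interface modeled on the article's Listener. */
interface Listener {
    void onTimeout();
}

/** Hypothetical stand-in for the object that holds listeners (the article's redis client). */
final class ListenerRegistry {
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener l) { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int size() { return listeners.size(); }

    /** Simulates the timeout event. Iterating a snapshot makes removal during the callback safe. */
    void fireTimeout() {
        for (Listener l : listeners) {
            l.onTimeout();
        }
    }
}

/** Hypothetical stand-in for the leaked bean type T. */
final class Task {
    boolean success() { return true; }
}

final class Worker {
    private final ListenerRegistry registry;

    Worker(ListenerRegistry registry) { this.registry = registry; }

    void doSomething(final Task t) {
        registry.addListener(new Listener() {
            @Override
            public void onTimeout() {
                try {
                    if (t.success()) { /* do work */ }
                } finally {
                    // Drop the registration so the listener and the captured Task can be collected.
                    registry.removeListener(this);
                }
            }
        });
    }
}
```

The try/finally ensures the listener is removed even if the work inside the callback throws.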

Further Leak Detection

Later heap dumps revealed tens of thousands of ByteArrayRow objects (≈40 k) produced by very large database result sets. An unexpected traffic spike (≈83 MB/s) was also observed, but the cloud provider confirmed it was normal traffic.

The root cause turned out to be a missing module condition in a query, causing a full table scan of over 400 k rows, which saturated memory and triggered restarts.
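
A cheap guard against this class of bug is to refuse to build the query when the filter is missing and to cap the number of rows a statement may return. The sketch below is an assumption about how that could look, not the article's code: the table name records, the column module, and the limit of 1000 are invented for illustration. PreparedStatement.setMaxRows is the standard JDBC call.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Illustrative guard (hypothetical names) that makes an unfiltered full-table read impossible to issue by accident. */
final class ModuleQuery {
    /** Assumed upper bound on rows per query; tune to the real workload. */
    static final int MAX_ROWS = 1000;

    /** Returns parameterized SQL, rejecting calls that omit the module filter. */
    static String build(String module) {
        if (module == null || module.trim().isEmpty()) {
            throw new IllegalArgumentException("module condition is required");
        }
        // The caller binds the module value to the placeholder.
        return "SELECT id, payload FROM records WHERE module = ? LIMIT " + MAX_ROWS;
    }

    /** Second line of defense: the driver drops rows beyond the limit. */
    static void applyLimits(PreparedStatement statement) throws SQLException {
        statement.setMaxRows(MAX_ROWS);
    }
}
```

Failing fast here turns a silent memory blow-up into an exception that shows up in the logs.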

Second Optimization – Metaspace & GC Tuning

GC logs showed FullGC occurring even when old‑gen usage was below 30 %. Research indicated that metaspace growth can trigger FullGC. Metaspace starts with a default threshold of about 21 MB, and on these servers it had grown to ~200 MB.
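
Metaspace usage can be checked from inside the JVM before picking a value for -XX:MetaspaceSize. This is a minimal sketch using the standard MemoryPoolMXBean API; it assumes a HotSpot JVM, where the pool is named "Metaspace", and the class name MetaspaceProbe is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Illustrative probe (hypothetical class) for current metaspace usage. */
final class MetaspaceProbe {

    /** Returns the usage of the pool named "Metaspace", or null if this JVM has no such pool. */
    static MemoryUsage usage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    /** Used metaspace in bytes, or -1 if it cannot be determined. */
    static long usedBytes() {
        MemoryUsage u = usage();
        return u == null ? -1 : u.getUsed();
    }

    /** Committed metaspace in bytes, or -1 if it cannot be determined. */
    static long committedBytes() {
        MemoryUsage u = usage();
        return u == null ? -1 : u.getCommitted();
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used (MB):      " + usedBytes() / (1024 * 1024));
        System.out.println("Metaspace committed (MB): " + committedBytes() / (1024 * 1024));
    }
}
```

If the steady-state value sits well above the default threshold, raising -XX:MetaspaceSize to roughly that level avoids the collections triggered while metaspace grows toward it.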

Larger young generation (prod1, prod2):
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75

Smaller young generation (prod3, prod4):
-Xmn350M -> -Xmn600M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75

Four servers were compared (prod1‑prod4). The two servers with larger young generation (prod1, prod2) showed dramatically lower FullGC and YoungGC counts, higher thread start counts, and overall better throughput.

Final Results

After the second round of tuning, FullGC frequency dropped to less than one per day, YoungGC frequency halved, and overall throughput increased noticeably on prod1. The single FullGC observed on prod1 was explained by a brief metaspace spike.

Conclusion

FullGC occurring more than once per day is abnormal.

When FullGC spikes, investigate memory leaks first.

After fixing leaks, JVM tuning opportunities are limited; avoid over‑optimizing.

If CPU stays high after the code has been checked, consult the cloud provider, because hardware issues can also cause 100 % CPU.

High inbound traffic may stem from inefficient database queries; verify query conditions.

Regularly monitor GC metrics to catch problems early.
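
For the monitoring point, the JVM already exposes collection counts and cumulative times through the standard GarbageCollectorMXBean API. The sketch below takes a snapshot that a scheduled job could log or export; the class name GcStats is illustrative. With the collectors used in this article, the beans are typically named ParNew and ConcurrentMarkSweep.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative snapshot (hypothetical class) of per-collector GC counts and times. */
final class GcStats {

    /** Maps collector name to {collection count, cumulative collection time in ms}; -1 means undefined. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + ": count=" + e.getValue()[0] + " timeMs=" + e.getValue()[1]);
        }
    }
}
```

Comparing two snapshots taken a day apart gives the per-day FullGC count directly, which makes a regression like the one after the first optimization visible early.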
