JVM GC Optimization: Reducing FullGC Frequency and Resolving Memory Leaks

This article documents a month‑long JVM garbage‑collection tuning effort that lowered FullGC from around 40 daily occurrences to one every ten days, halved YoungGC time, identified and fixed a memory‑leak caused by anonymous inner‑class listeners, and refined heap and metaspace settings to improve overall server throughput.

Java Captain

After more than a month of effort, FullGC frequency dropped from about 40 times per day to roughly once every ten days, and YoungGC time was cut by more than half. The tuning process is recorded in detail below.

Initially the production servers (2 CPU, 4 GB RAM, four‑node cluster) suffered frequent FullGC and occasional automatic restarts. The original JVM startup parameters were:

-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

-Xmx1800M sets the maximum heap size to 1800 MB.

-Xms1000M sets the initial heap size to 1000 MB; matching it to -Xmx avoids re‑allocation after each GC.

-Xmn350M defines a 350 MB young generation; increasing it reduces the old generation size. Sun recommends roughly 3/8 of the total heap.

-Xss300K sets each thread stack to 300 KB (the HotSpot default on 64-bit Linux is 1 MB).
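To confirm that flags like these actually took effect, the running JVM can report them back. The following is a minimal sketch using only standard java.lang.management and Runtime calls; the class name HeapSettings and the helper toMiB are illustrative and not part of the original setup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapSettings {
    /** Returns the JVM flags this process was started with (for example -Xmx1800M). */
    static List<String> startupFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Converts bytes to whole mebibytes for readable output. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("flags     = " + startupFlags());
        // maxMemory roughly tracks -Xmx; some collectors report slightly less.
        System.out.println("max heap  = " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory is the currently committed heap, which starts near -Xms.
        System.out.println("committed = " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free      = " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

When -Xms is lower than -Xmx, the committed value can change between runs of this check as the JVM grows or shrinks the heap, which is the resizing that matching the two flags avoids.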

First Optimization

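The young generation was clearly too small, so YoungGC ran very frequently and consumed up to 830 s of GC time (presumably a cumulative total rather than a single pause, since the later results are reported the same way). The initial heap size also did not match the maximum. The first online tuning increased the young generation and aligned the initial heap with the maximum: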

-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
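The effect of these three changes on the generation layout can be worked out by hand. The sketch below does the arithmetic, assuming the usual HotSpot meaning of SurvivorRatio (eden size divided by the size of one survivor space, with two survivor spaces). The class and method names are illustrative, and the real JVM aligns region sizes, so actual values may differ slightly.

```java
import java.util.Locale;

class YoungGenMath {
    /** One survivor space: young = eden + 2 * survivor and eden = ratio * survivor. */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden is survivorRatio times the size of one survivor space. */
    static double edenMb(double youngMb, int survivorRatio) {
        return survivorRatio * survivorMb(youngMb, survivorRatio);
    }

    /** Whatever the young generation does not take is left for the old generation. */
    static double oldMb(double heapMb, double youngMb) {
        return heapMb - youngMb;
    }

    public static void main(String[] args) {
        // Before: -Xmn350M -XX:SurvivorRatio=4, heap growing up to -Xmx1800M
        System.out.printf(Locale.ROOT, "before: eden=%.1f survivor=%.1f old(max)=%.1f%n",
                edenMb(350, 4), survivorMb(350, 4), oldMb(1800, 350));
        // After: -Xmn800M -XX:SurvivorRatio=8, heap fixed at 1800M
        System.out.printf(Locale.ROOT, "after:  eden=%.1f survivor=%.1f old=%.1f%n",
                edenMb(800, 8), survivorMb(800, 8), oldMb(1800, 800));
    }
}
```

Under these assumptions eden grows from about 233 MB to 640 MB, while the old generation's ceiling shrinks from 1450 MB to 1000 MB. A smaller old generation fills up sooner, which is one plausible reason a larger young generation can coincide with more FullGCs.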

After five days, YoungGC count dropped by more than half and its duration fell by 400 s, but FullGC count unexpectedly rose by 41, indicating a failed first attempt.
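The article does not say how these counts and durations were collected. One way to obtain comparable numbers from inside the process is the standard GarbageCollectorMXBean API, sketched below; the class name GcStats is illustrative. Sampling the values at two points in time and subtracting gives the count and cumulative GC time for that window.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Sums collection counts over all collectors; -1 (undefined) values are treated as 0. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Sums the approximate accumulated collection time in milliseconds over all collectors. */
    static long totalTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // With ParNew + CMS the beans are typically named "ParNew" and "ConcurrentMarkSweep".
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        System.out.println("all collectors: count=" + totalCollections()
                + ", totalTimeMs=" + totalTimeMs());
    }
}
```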

Second Optimization – Memory‑Leak Investigation

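During the investigation, a bean of type T was found with over ten thousand live instances occupying about 20 MB. The cause was an anonymous inner-class listener: it captures the T argument along with a reference to the enclosing object, and because the listener stayed registered after its timeout callback, those objects could never be collected: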

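public void doSmthing(T t){
  // The anonymous Listener captures t and the enclosing instance.
  redis.addListener(new Listener(){
    @Override
    public void onTimeout(){
      if(t.success()){
        // execute operation
      }
    }
  });
  // The listener is never removed, so redis keeps it (and t) reachable.
}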

Because the listener was never released, the captured objects accumulated, contributing to the memory pressure and server restarts. After fixing this and the related errors reported in the logs, the service was redeployed, but GC behavior remained largely unchanged, showing that the root cause was still unresolved.
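The article does not show how the listener was eventually released. One common pattern is a one-shot listener that deregisters itself once its callback has run. The sketch below illustrates that pattern under stated assumptions: ListenerRegistry is a hypothetical stand-in for the article's redis object, Task stands in for T, and removeListener is an assumed method that the real client may or may not offer.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ListenerLeakSketch {
    interface TimeoutListener { void onTimeout(); }

    /** Hypothetical stand-in for the article's redis object: holds strong references to listeners. */
    static class ListenerRegistry {
        private final List<TimeoutListener> listeners = new CopyOnWriteArrayList<>();
        void addListener(TimeoutListener l) { listeners.add(l); }
        void removeListener(TimeoutListener l) { listeners.remove(l); }
        int size() { return listeners.size(); }
        // Iteration works on a snapshot, so listeners may remove themselves while firing.
        void fireTimeout() { for (TimeoutListener l : listeners) l.onTimeout(); }
    }

    /** Hypothetical payload playing the role of the article's T. */
    static class Task { boolean success() { return true; } }

    /** Registers a one-shot listener that deregisters itself after it has run. */
    static void doSomething(ListenerRegistry registry, Task t) {
        registry.addListener(new TimeoutListener() {
            @Override
            public void onTimeout() {
                try {
                    if (t.success()) {
                        // execute operation
                    }
                } finally {
                    // Drop the registry's reference to this listener, and through it to t.
                    registry.removeListener(this);
                }
            }
        });
    }

    public static void main(String[] args) {
        ListenerRegistry registry = new ListenerRegistry();
        for (int i = 0; i < 1_000; i++) {
            doSomething(registry, new Task());
        }
        System.out.println("registered: " + registry.size());    // prints "registered: 1000"
        registry.fireTimeout();
        System.out.println("after timeout: " + registry.size()); // prints "after timeout: 0"
    }
}
```

Removing the listener in a finally block means the reference is dropped even if the callback throws. If the real client has no removal API, holding the payload through a WeakReference is an alternative, at the cost of the callback having to handle a payload that has already been collected.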

Memory‑Leak Investigation Continued

Further heap dumps revealed a massive number of ByteArrayRow objects (over 40 k), the row class used by the MySQL JDBC driver. They were most likely produced by a database query that unintentionally fetched all unprocessed rows because a module filter was missing. This matched a sudden inbound traffic spike of 83 MB/s seen on the server, even though the production workload itself never generated that much traffic.

Fixing the query condition eliminated the leak; the servers returned to normal operation with the original JVM parameters, experiencing only five FullGC events over three days.
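The corrected query is not shown in the article. As a hedged illustration of the idea, the sketch below refuses to build the statement when the module filter is missing and caps the number of rows per fetch. The table name task, the columns id, payload, status and module, and the MAX_ROWS cap are all hypothetical.

```java
class UnprocessedQuery {
    /** Assumed upper bound on rows fetched per call. */
    static final int MAX_ROWS = 500;

    /**
     * Builds the parameterized SQL for one batch of unprocessed rows.
     * Fails fast instead of silently dropping the module condition,
     * which would pull every unprocessed row into memory.
     */
    static String buildSql(String module, int limit) {
        if (module == null || module.trim().isEmpty()) {
            throw new IllegalArgumentException("module filter is required");
        }
        if (limit <= 0 || limit > MAX_ROWS) {
            throw new IllegalArgumentException("limit must be between 1 and " + MAX_ROWS);
        }
        // status and module are bound later as PreparedStatement parameters.
        return "SELECT id, payload FROM task WHERE status = ? AND module = ? LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(buildSql("billing", 100));
        try {
            buildSql(null, 100);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The row cap matters as much as the filter: even a correct condition can match far more rows than expected, and a bounded batch keeps the driver from materializing them all at once.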

Second Tuning After Leak Fix

With the leak resolved, further tuning focused on reducing unnecessary FullGCs. GC logs showed FullGC occurring even when the old generation occupied less than 30 % of the heap. Research indicated that metaspace growth can also trigger FullGC, and the metaspace had in fact grown to about 200 MB against a default initial threshold of roughly 21 MB. The following adjustments were applied, with a different young-generation size on prod1/prod2 than on prod3/prod4. For prod1/prod2:

-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M (added)
-XX:CMSInitiatingOccupancyFraction=70 -> -XX:CMSInitiatingOccupancyFraction=75

and for prod3/prod4:

-Xmn350M -> -Xmn600M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M (added)
-XX:CMSInitiatingOccupancyFraction=70 -> -XX:CMSInitiatingOccupancyFraction=75

After ten days of observation, prod1 and prod2 (with larger young generations) showed dramatically lower FullGC counts and about half the YoungGC frequency compared to prod3 and prod4. Throughput on prod1 increased noticeably, confirming the success of the optimization.

Summary

FullGC occurring more than once per day is abnormal.

When FullGC spikes, prioritize checking for memory leaks.

After fixing leaks, JVM tuning opportunities become limited; further investment may not be worthwhile.

If CPU stays high after code review, consult operations – in this case a server issue caused 100 % CPU.

Unexpected high inbound traffic often originates from database queries; verify query conditions.

Regularly monitor GC to detect problems early.
