JVM Garbage Collection Tuning Experience: Reducing FullGC Frequency and Solving Memory Leaks
Over a month of systematic JVM tuning, the author reduced FullGC frequency from 40 times per day to once every ten days, halved YoungGC time, identified and fixed a memory leak caused by anonymous inner‑class listeners, and documented the step‑by‑step optimization process with configuration changes and performance results.
In this article the author, a senior architect, shares a month‑long experience of optimizing JVM garbage collection on a four‑node production cluster (2 CPU / 4 GB each) that suffered from frequent FullGC (≈40 times per day) and occasional server restarts.
Problem: Excessive FullGC and YoungGC caused high latency and instability. Initial GC logs showed very frequent FullGC and long YoungGC pauses.
Initial JVM parameters (per node):
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

Key explanations of the flags were listed in the article, e.g., -Xmx1800M sets the maximum heap and -Xmn350M the young-generation size.
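Whether the -Xms/-Xmx settings actually took effect on a node can be confirmed from inside the process via the standard java.lang.management API; a minimal sketch (the class name is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Illustrative check: prints the heap limits the running JVM actually
// applied, useful for verifying the deployed -Xms/-Xmx on each node.
public class HeapCheck {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("heap init = " + heap.getInit() / (1024 * 1024) + " MB");
        System.out.println("heap max  = " + heap.getMax() / (1024 * 1024) + " MB");
    }
}
```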
First Optimization
The author increased the young generation to 800 MB, set -Xms equal to -Xmx, and changed -XX:SurvivorRatio from 4 to 8:
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M

After deploying these changes to two servers (prod1, prod2) for five days, YoungGC frequency dropped by more than 50% and total YoungGC pause time fell by roughly 400 seconds, but the FullGC count unexpectedly rose by 41, so this first optimization was judged a failure.
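The effect of the SurvivorRatio change is simple arithmetic: in HotSpot, -XX:SurvivorRatio=N sets eden to N times the size of one survivor space, and the young generation holds eden plus two survivor spaces. A small sketch of that layout:

```java
// How -Xmn and -XX:SurvivorRatio divide the young generation in HotSpot:
// SurvivorRatio=N means eden : one-survivor = N : 1, and there are two
// survivor spaces, so eden = young * N / (N + 2), survivor = young / (N + 2).
public class YoungGenLayout {
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        // After the change: -Xmn800M -XX:SurvivorRatio=8
        System.out.println("eden     = " + edenMb(800, 8) + " MB");     // 640 MB
        System.out.println("survivor = " + survivorMb(800, 8) + " MB"); // 80 MB
    }
}
```

So moving from ratio 4 to ratio 8 enlarges eden (more room for short-lived objects before a YoungGC) at the cost of smaller survivor spaces.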
Second Optimization – Memory Leak Investigation
During analysis, a class T was found to have more than 10,000 live instances (~20 MB) because an anonymous inner-class listener captured a reference to each instance and was never deregistered, causing a memory leak:
public void doSmthing(T t) {
    redis.addListener(new Listener() {
        public void onTimeout() {
            if (t.success()) {
                // execute operation
            }
        }
    });
}

Fixing the leak reduced the overall memory pressure, but FullGC remained high. Further investigation revealed a query that unintentionally fetched more than 400,000 rows because a filter condition was missing in one module, leading to massive object creation (≈40,000 ByteArrayRow objects) and occasional spikes in inbound traffic.
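One way to avoid this kind of retention is to have the listener deregister itself once it fires, so the listener registry no longer pins the listener or the captured object. The sketch below is self-contained and illustrative: ListenerRegistry stands in for the Redis client in the snippet above and is not a real client API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hedged sketch of the leak fix: the anonymous listener removes itself
// after firing, releasing the reference chain to the captured task.
public class ListenerLeakFix {
    interface Listener { void onTimeout(); }

    // Illustrative stand-in for the Redis client's listener registry.
    static class ListenerRegistry {
        final List<Listener> listeners = new CopyOnWriteArrayList<>();
        void addListener(Listener l) { listeners.add(l); }
        void removeListener(Listener l) { listeners.remove(l); }
        void fireTimeout() { for (Listener l : listeners) l.onTimeout(); }
        int size() { return listeners.size(); }
    }

    static void register(ListenerRegistry registry, Runnable task) {
        registry.addListener(new Listener() {
            @Override public void onTimeout() {
                try {
                    task.run();                     // captured object used once...
                } finally {
                    registry.removeListener(this);  // ...then released
                }
            }
        });
    }

    public static void main(String[] args) {
        ListenerRegistry registry = new ListenerRegistry();
        register(registry, () -> System.out.println("operation executed"));
        registry.fireTimeout();
        System.out.println("listeners left: " + registry.size()); // 0
    }
}
```

Without the removeListener call in the finally block, every registered listener (and the object it captured) stays reachable from the registry indefinitely, which matches the instance buildup described above.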
Second Optimization – Metaspace and CMS Tuning
Observing that Metaspace grew to ~200 MB (far above the default 21 MB), the author added the following parameters to two servers (prod1, prod2) while keeping the other two unchanged:
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75

A second configuration, applied to prod3 and prod4, used -Xmn600M instead of 800M. After about ten days the results showed:
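With -XX:+UseCMSInitiatingOccupancyOnly already set, -XX:CMSInitiatingOccupancyFraction=75 means CMS starts a concurrent cycle when the old generation reaches 75% occupancy. For the prod1/prod2 settings the threshold works out as follows (a sketch of the arithmetic, not a measurement):

```java
// At what old-generation occupancy CMS begins a concurrent cycle under
// -XX:+UseCMSInitiatingOccupancyOnly. Figures match the prod1/prod2
// settings: 1800 MB heap, 800 MB young generation, fraction 75.
public class CmsTrigger {
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    static long triggerMb(long oldGenMb, int occupancyFraction) {
        return oldGenMb * occupancyFraction / 100;
    }

    public static void main(String[] args) {
        long oldGen = oldGenMb(1800, 800); // 1000 MB old generation
        System.out.println("CMS starts at ~" + triggerMb(oldGen, 75) + " MB"); // 750 MB
    }
}
```

Raising the fraction from 70 to 75 delays the start of each CMS cycle, trading a later trigger for fewer concurrent cycles.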
FullGC frequency on prod1 and prod2 was dramatically lower than on prod3 and prod4.
YoungGC frequency on prod1/2 was about half of that on prod3/4.
Throughput, measured by thread-start counts, was higher on prod1: it stayed roughly one day's worth of work ahead of the other nodes.
Overall, the optimization succeeded: under the original parameters FullGC had occurred five times in just three days, whereas the tuned configuration reduced it to roughly once every ten days, with significantly better throughput and shorter GC pauses.
Summary of Findings
FullGC more than once per day is abnormal.
When FullGC spikes, first investigate memory leaks.
After fixing leaks, JVM tuning opportunities become limited; focus on critical issues.
High CPU may stem from server‑level problems; consult cloud provider if needed.
Unexpected inbound traffic can indicate hidden database queries; verify query conditions.
Regularly monitor GC logs to detect issues early.
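The last point, regularly monitoring GC logs, can start as something very simple, such as counting Full GC events in -XX:+PrintGCDetails output. A minimal sketch (the sample log lines are illustrative, not taken from the article's cluster):

```java
import java.util.List;

// Hedged sketch of basic GC-log monitoring: count Full GC events in
// -XX:+PrintGCDetails output so a spike (e.g. more than one per day)
// can be flagged early.
public class GcLogScan {
    static long countFullGc(List<String> logLines) {
        return logLines.stream().filter(l -> l.contains("Full GC")).count();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "12.345: [GC (Allocation Failure) 12.345: [ParNew: ...]",
            "98.765: [Full GC (Allocation Failure) 98.765: [CMS: ...]"
        );
        System.out.println("Full GCs: " + countFullGc(sample)); // 1
    }
}
```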
The article also contains promotional material for a ChatGPT‑focused community and various unrelated advertisements, which are omitted from the technical summary.