Why Did Our Java Service Trigger Full GC? Uncovering Log4j2’s Hidden Memory Leak

After a recent deployment, a Java service began experiencing over five Full GC events per minute, traced to Log4j2’s thread‑local buffer misconfiguration and a -XX:PretenureSizeThreshold setting that forced large StringBuilder objects directly into the old generation, leading to memory pressure and frequent Full GCs.

Wukong Talks Architecture

Summary

This article records a production incident where a Java service started generating frequent Full GC events. The root cause was a combination of Log4j2 configuration, JVM parameters, and a newly introduced Servlet dependency that disabled Log4j2’s thread‑local cache, causing large StringBuilder objects to be allocated directly in the old generation.

1. Incident Origin

Shortly after a nightly deployment, operations reported that Full GC alerts appeared, with more than five Full GC occurrences per minute. The affected system uses Log4j2 version 2.31.8.

1.1 JVM Monitoring Dashboard

The Full GC frequency spiked immediately after the release, indicating a direct correlation with the new version. A rollback was considered while a memory dump was collected for further analysis.

2. GC Log Analysis

GC logs reveal: 82 Minor GC events and 105 Full GC events.

High Minor GC frequency shows frequent short‑lived object creation.

Full GC frequency exceeds Minor GC, indicating severe memory allocation imbalance:

AI‑generated table summarising memory usage before and after GC:

| Memory Region | Before GC | After GC | Trend |
| --- | --- | --- | --- |
| Eden | 100% (3.13 GB full) | 30% (≈0.94 GB) | ↓70% |
| Survivor | 0% (idle) | 0% (idle) | inactive |
| Old (CMS) | 96.2% (1.84 GB/1.87 GB) | 33.3% (0.65 GB/1.87 GB) | ↓63% (still high) |

Key observation: the old generation was 96.2% full before collection and still holds 0.65 GB afterwards, a typical sign of a memory leak or improper promotion.
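The young and old collection counts discussed above can also be read from a running JVM instead of from the GC log. The following is a minimal sketch using the standard java.lang.management API; the class name GcCountSnapshot is hypothetical, and the collector names it prints depend on the garbage collector in use (with CMS they are typically ParNew and ConcurrentMarkSweep).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports cumulative GC counts per collector for this JVM.
final class GcCountSnapshot {

    /** Returns the cumulative collection count per collector, keyed by collector name. */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the collector does not report a count.
            counts.put(bean.getName(), bean.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, count) ->
                System.out.println(name + ": " + count + " collections"));
    }
}
```

Sampling these counts once a minute and comparing consecutive values gives the per‑minute rate that the alert in this incident was based on.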

3. Investigation

AI analysis of the GC logs suggested that large objects were being created and allocated directly in the old generation, prompting a deeper root‑cause investigation.

3.1 Review Deployment Changes

The release removed the old Dubbo dependency and introduced an internal xxx‑rpc framework. No business code changes were made. Initial suspicion fell on the new RPC framework, but it is widely used elsewhere without issues, and the code diff did not reveal the problem.

3.2 Memory Dump Analysis

Memory snapshots were examined. Objects were filtered by package name; the count of objects seemed normal. Sorting by class count highlighted char[] arrays and String objects as the most numerous and memory‑heavy, pointing to extensive string handling (e.g., logging, JSON serialization).

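Further inspection of incoming references showed that many of these arrays belonged to StringBuilder instances, each with a fixed buffer of just over 2 MB.

When the thread‑local cache is disabled, the logging framework creates a new StringBuilder for each log call, with its capacity defined by MAX_REUSABLE_MESSAGE_SIZE (configured as 1 MB here), so most of each buffer is empty padding. The two figures are consistent if the setting counts characters: a char[] uses 2 bytes per character, so a buffer of about one million characters occupies just over 2 MB of heap. This explains the large, repeated patterns seen in the heap dump.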

4. JVM Configuration Review

The JVM was configured with -XX:PretenureSizeThreshold=2097152, meaning any object larger than 2 MB is allocated directly in the old generation. Combined with the Log4j2 buffer size, this forced many log‑related objects into the old generation, triggering frequent Full GCs.

AI recommendation: reconsider the necessity of this aggressive threshold; removing it allows the JVM to allocate large objects in the young generation first.
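To see why a 1 MB buffer setting crosses a 2 MB pretenure threshold, the arithmetic can be sketched as below. This is an estimate under stated assumptions, not a measurement: it assumes the setting means 1,048,576 characters, a char[]‑backed StringBuilder at 2 bytes per character (as the char[] arrays in the heap dump suggest), a 16‑byte array header, and 8‑byte object alignment. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates char[] heap size and compares it with a pretenure threshold.
final class PretenureCheck {

    // Assumption: 64-bit HotSpot with compressed class pointers, 16-byte array header.
    static final long ARRAY_HEADER_BYTES = 16;
    // Assumption: StringBuilder backed by char[], 2 bytes per character.
    static final long BYTES_PER_CHAR = 2;

    /** Estimated heap size of a char[] with the given length, rounded up to 8 bytes. */
    static long charArrayBytes(long capacityInChars) {
        long raw = ARRAY_HEADER_BYTES + BYTES_PER_CHAR * capacityInChars;
        return (raw + 7) & ~7L;
    }

    /** True if the estimated array size is larger than the threshold in bytes. */
    static boolean exceedsThreshold(long capacityInChars, long thresholdBytes) {
        return charArrayBytes(capacityInChars) > thresholdBytes;
    }

    public static void main(String[] args) {
        long threshold = 2_097_152L; // value of -XX:PretenureSizeThreshold in this incident
        System.out.println(charArrayBytes(1_048_576L));              // 2097168
        System.out.println(exceedsThreshold(1_048_576L, threshold)); // true
        System.out.println(exceedsThreshold(518L, threshold));       // false
    }
}
```

Under these assumptions the buffer exceeds the threshold by only the 16‑byte header, which is enough for every such allocation to bypass the young generation.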

5. Code Trace

Using MAT, the investigation narrowed to Log4j2 usage. A code search turned up the @Log4j2 annotation (from Lombok), and the underlying Log4j2 source shows that a ThreadLocal<FormatBufferHolder> is created only when thread locals are enabled, which depends on the log4j2.enable.threadlocals flag and the isWebApp detection:

final ThreadLocal<FormatBufferHolder> FORMAT_BUFFER_HOLDER_REF =
    Constants.ENABLE_THREADLOCALS ? ThreadLocal.withInitial(FormatBufferHolder::new) : null;

public static final boolean ENABLE_THREADLOCALS =
    !IS_WEB_APP && PropertiesUtil.getProperties().getBooleanProperty("log4j2.enable.threadlocals", true);

public static final boolean IS_WEB_APP = PropertiesUtil.getProperties().getBooleanProperty(
    "log4j2.is.webapp", isClassAvailable("javax.servlet.Servlet") || isClassAvailable("jakarta.servlet.Servlet"));

If log4j2.enable.threadlocals is false or the application is identified as a web app, the thread‑local buffer is disabled, causing each log call to allocate a new large StringBuilder.
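The decision described above can be modelled as a small pure function. This is a simplified sketch of the logic in the quoted excerpts, not Log4j2's own code: the class and method names are hypothetical, properties are passed in as plain strings (null meaning "not set"), and Boolean.parseBoolean stands in for Log4j2's property parsing.

```java
// Hypothetical model of the thread-local decision shown in the Log4j2 excerpts.
final class ThreadLocalDecisionSketch {

    /** True if the named class can be loaded, mirroring the idea of isClassAvailable. */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, ThreadLocalDecisionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** An explicit log4j2.is.webapp value wins; otherwise fall back to Servlet detection. */
    static boolean isWebApp(String isWebAppProperty, boolean servletOnClasspath) {
        return isWebAppProperty != null
                ? Boolean.parseBoolean(isWebAppProperty)
                : servletOnClasspath;
    }

    /** Thread locals are used only when this is not a web app and the flag is not false. */
    static boolean threadLocalsEnabled(String isWebAppProperty,
                                       String enableThreadLocalsProperty,
                                       boolean servletOnClasspath) {
        boolean requested = enableThreadLocalsProperty == null
                || Boolean.parseBoolean(enableThreadLocalsProperty);
        return !isWebApp(isWebAppProperty, servletOnClasspath) && requested;
    }

    public static void main(String[] args) {
        boolean servlet = isClassAvailable("javax.servlet.Servlet")
                || isClassAvailable("jakarta.servlet.Servlet");
        System.out.println("Servlet on classpath: " + servlet);
        System.out.println("Thread locals enabled: " + threadLocalsEnabled(
                System.getProperty("log4j2.is.webapp"),
                System.getProperty("log4j2.enable.threadlocals"),
                servlet));
    }
}
```

The case that matches this incident is threadLocalsEnabled(null, null, true): no property is set, a Servlet class is on the classpath, and the result is false.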

6. Principle Explanation

When log4j2.enable.threadlocals=true, each thread reuses a pre‑allocated StringBuilder buffer, reducing short‑lived object creation and GC pressure. The buffer size is controlled by log4j2.maxReusableMsgSize. In a web‑app context, Log4j2 disables this cache to avoid class‑loader memory leaks caused by lingering ThreadLocal references after redeployment.
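The difference between the two modes can be illustrated with a small sketch. It is not Log4j2's implementation; it only shows the general technique of reusing one StringBuilder per thread and trimming it back to a maximum size, next to the alternative of allocating a fresh builder per call. All names are hypothetical, and the 518‑character limit is an illustrative value.

```java
// Hypothetical illustration of per-thread buffer reuse versus per-call allocation.
final class ReusableBufferSketch {

    // Illustrative limit for the reusable buffer, in characters.
    static final int MAX_REUSABLE_SIZE = 518;

    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(MAX_REUSABLE_SIZE));

    /** Formats into the current thread's buffer; no new builder is allocated per call. */
    static String formatReusing(String prefix, Object value) {
        StringBuilder sb = BUFFER.get();
        try {
            sb.append(prefix).append(value);
            return sb.toString();
        } finally {
            // Shrink a buffer that grew past the limit so one large message
            // does not pin a large array to the thread forever.
            if (sb.capacity() > MAX_REUSABLE_SIZE) {
                sb.setLength(MAX_REUSABLE_SIZE);
                sb.trimToSize();
            }
            sb.setLength(0); // reset for the next call on this thread
        }
    }

    /** Formats with a fresh builder; every call allocates a full-capacity buffer. */
    static String formatAllocating(String prefix, Object value) {
        return new StringBuilder(MAX_REUSABLE_SIZE).append(prefix).append(value).toString();
    }

    /** Exposes the current thread's buffer so reuse can be observed. */
    static StringBuilder currentBuffer() {
        return BUFFER.get();
    }

    public static void main(String[] args) {
        StringBuilder before = currentBuffer();
        System.out.println(formatReusing("orderId=", 42));
        System.out.println(currentBuffer() == before); // same instance is reused
        System.out.println(formatAllocating("orderId=", 42));
    }
}
```

With a small limit the per‑call allocation is cheap and dies in the young generation; the cost only becomes severe when the per‑call capacity is raised to the megabyte range, as in this incident.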

7. Solution

The fix is straightforward: adjust the Log4j2 configuration so that isWebApp resolves to false (log4j2.is.webapp=false) and reduce log4j2.maxReusableMsgSize (e.g., back to the default of about 0.5 KB). With thread‑local reuse restored and a small buffer size, 2 MB buffers are no longer created on every log call.
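One way to apply these settings is to set them as system properties before any Log4j2 class is loaded, since the quoted constants are static and read once. The sketch below assumes a plain main entry point; the class name LoggingBootstrap is hypothetical, and 518 is used as the small buffer size in characters.

```java
// Hypothetical bootstrap: sets Log4j2 overrides before the logging classes initialize.
final class LoggingBootstrap {

    /** Must run before the first Logger is obtained, or the values are ignored. */
    static void applyLog4j2Overrides() {
        // Treat the process as a non-web application so thread locals stay enabled.
        System.setProperty("log4j2.is.webapp", "false");
        // Keep the reusable message buffer small (value in characters).
        System.setProperty("log4j2.maxReusableMsgSize", "518");
    }

    public static void main(String[] args) {
        applyLog4j2Overrides();
        // Start the application here; loggers created after this point see the overrides.
        System.out.println("log4j2.is.webapp=" + System.getProperty("log4j2.is.webapp"));
    }
}
```

Passing the same values as JVM flags (for example -Dlog4j2.is.webapp=false) is likely more robust, because a static logger field in the main class could otherwise initialize Log4j2 before the overrides run.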

8. Post‑mortem

Comparison of the faulty system with a normal system shows:

| Config Item | Normal System | Faulty System | Effect |
| --- | --- | --- | --- |
| log4j2.maxReusableMsgSize | default (0.5 KB) | explicitly set to 1 MB | different object size |
| -XX:PretenureSizeThreshold | default (0) | explicitly set to 2 MB | large‑object allocation strategy |
| javax.servlet.Servlet dependency | present (via internal RPC framework) | present (via internal RPC framework) | caused isWebApp mis‑detection |
| Final Effect | small objects reclaimed in young gen | large objects promoted to old gen | frequent Full GC |

Additional tables compare Log4j2 versions (~2.17.0 vs 2.23.1) and Logback vs Log4j2, highlighting allocation strategies, performance, garbage‑collection friendliness, configuration flexibility, and security considerations.

9. Q&A

Is the isWebApp check a bug? No; it is a deliberate safety measure to prevent memory leaks in traditional web containers where threads are reused across deployments. In Spring Boot’s embedded container, the risk is minimal, so disabling the check with -Dlog4j2.is.webapp=false is safe.

Overall, the incident was not a Log4j2 bug but a side effect of high‑performance design interacting with specific configuration and an unintended web‑app detection.
