Why ThreadLocal Leaks Cause Full GC in Thread Pools and How to Fix It

An online service experienced frequent Full GC due to a ThreadLocal memory leak in a thread‑pool scenario; the article walks through GC log analysis, heap dump inspection, root‑cause discovery in ThreadFactory logic, and presents a robust fix using Spring’s TaskDecorator to ensure proper cleanup.

WeiLi Technology Team

1. Background

During a routine production‑environment inspection, the GC metrics of the bill service showed a sharp increase in Full GC frequency that did not converge, indicating a potential memory issue.

GC Log Analysis

Using GCeasy, the Old Generation memory usage was observed to climb steadily from ~300 MB to nearly 500 MB, with the before‑GC and after‑GC curves almost overlapping, meaning the GC was reclaiming almost nothing. This pattern is a classic sign of a memory leak: after a Full GC the Old Generation should shrink noticeably, but here it barely dropped, implying many objects were still strongly referenced.

The upward trend in the Old Generation confirms that leaked objects accumulate there because they survive repeated Minor GCs and are eventually promoted.

2. Heap Dump Analysis with MAT

A heap dump (.hprof) was taken from the production environment and opened in Eclipse Memory Analyzer (MAT). The Dominator Tree revealed that several thread objects each held a large number of PriceVO$SkuPriceVO instances, all attached to a ThreadLocal.

Expanded reference chain for one thread:

java.lang.Thread
  └── threadLocals: java.lang.ThreadLocal$ThreadLocalMap
        └── table: java.lang.ThreadLocal$ThreadLocalMap$Entry[]
              └── [n]: Entry
                    └── value: java.util.concurrent.ConcurrentHashMap
                          └── key: String (cacheKey)
                          └── value: List<PriceVO.SkuPriceVO>   // massive leak objects

The problem became clear: objects cached in ThreadLocal were never cleared, and as requests kept coming, the cache grew until it exhausted the Old Generation, triggering repeated Full GCs.

Why ThreadLocal Leaks Occur

Each Thread holds a ThreadLocalMap, a hash table whose keys are ThreadLocal instances (held through weak references) and whose values are the stored objects (held through strong references). In a simple request‑per‑thread model, the thread dies and the entire map is reclaimed. In a thread pool, however, threads are reused; the map lives as long as the thread does, so any value that is not explicitly removed stays strongly reachable from the thread and cannot be collected.

If code writes data into ThreadLocal for each task but never clears it, the values accumulate in the thread’s map, eventually exhausting the Old Generation and causing a Full GC loop.
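The effect can be reproduced with plain JDK classes. The following is a minimal sketch, not the service's code: CACHE and runTasks are hypothetical names, and a one-thread pool stands in for the real executor. Each task appends one entry to a per-thread list; without cleanup the list keeps growing across tasks because the same thread, and therefore the same ThreadLocalMap, is reused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalReuseDemo {
    // Hypothetical per-thread cache standing in for the service's real one.
    private static final ThreadLocal<List<String>> CACHE =
            ThreadLocal.withInitial(ArrayList::new);

    /** Runs the given number of tasks on one pooled thread; returns the cache size seen by the last task. */
    public static int runTasks(int tasks, boolean cleanUp) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            int size = 0;
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                size = pool.submit(() -> {
                    try {
                        CACHE.get().add("result-" + id);
                        return CACHE.get().size();
                    } finally {
                        if (cleanUp) {
                            CACHE.remove(); // drop this thread's entry before the next task
                        }
                    }
                }).get();
            }
            return size;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + runTasks(5, false)); // 5
        System.out.println("with cleanup:    " + runTasks(5, true));  // 1
    }
}
```

With cleanup disabled the fifth task sees all five entries; with cleanup enabled every task starts from an empty list.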

3. Code Review: Finding the Write Point and the Ineffective Cleanup

The offending write occurs in a price‑filter method:

public List<PriceVO.SkuPriceVO> listFilterNoCost(Long customerId, Long warehouseId, Long corpId, List<...> skuUnitList) {
    // Try to get from ThreadLocal cache first
    String cacheKey = cacheHashCode(dto);
    Collection<PriceVO.SkuPriceVO> avgPriceList = ScopeCacheUtil.get(cacheKey);
    if (CollectionUtils.isEmpty(avgPriceList)) {
        avgPriceList = listSkuAvgPrice(dto);
        ScopeCacheUtil.put(cacheKey, avgPriceList);  // write to ThreadLocal
    }
    // ... filtering logic
}

The intention is to cache price results per request, but the cache must be cleared after the task finishes.
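The source of ScopeCacheUtil is not shown. The sketch below is only a guess at its shape, inferred from the three calls used in this article (get, put, clearContext) and from the ConcurrentHashMap seen in the heap dump; the class name ScopeCacheSketch and every detail of its body are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ScopeCacheUtil: a per-thread map keyed by cache key. */
public final class ScopeCacheSketch {
    private static final ThreadLocal<Map<String, Object>> CONTEXT =
            ThreadLocal.withInitial(ConcurrentHashMap::new);

    private ScopeCacheSketch() {
    }

    /** Returns the cached value for the current thread, or null if absent. */
    @SuppressWarnings("unchecked")
    public static <T> T get(String key) {
        return (T) CONTEXT.get().get(key);
    }

    /** Caches a value for the current thread; ConcurrentHashMap rejects null keys and values. */
    public static void put(String key, Object value) {
        CONTEXT.get().put(key, value);
    }

    /** Removes the whole per-thread map so a reused thread starts empty. */
    public static void clearContext() {
        CONTEXT.remove();
    }

    public static void main(String[] args) {
        put("k", "v");
        String hit = get("k");
        clearContext();
        String miss = get("k");
        System.out.println(hit + " / " + miss); // v / null
    }
}
```

In a design like this, every put without a matching clearContext() adds one more entry to a map that lives as long as the pooled thread.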

The thread‑pool configuration includes a cleanup call in a custom ThreadFactory:

public ThreadPoolTaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setThreadFactory(r -> {
        Thread thread = new Thread(new ContextRelatedRunnable() {
            @Override
            public void doRun() {
                r.run();
                ScopeCacheUtil.clearContext();  // clear ThreadLocal
            }
        });
        // ...
        return thread;
    });
    return executor;
}

Debugging revealed that clearContext() never executed while the pool was serving requests, because the r passed to the factory is not the business Runnable but a ThreadPoolExecutor$Worker that runs an internal loop. Consequently, the cleanup line is effectively dead code.

4. Root Cause: Misunderstanding ThreadFactory vs. Task Execution

The ThreadFactory creates the thread itself; the Worker object it receives is the thread’s own run loop, not the submitted task. The worker’s run() method calls runWorker(this), which keeps pulling tasks from the queue and returns only when the worker exits. Code placed after r.run() in the factory’s doRun() is therefore reached at most once, when the thread is about to terminate, and never between tasks.

Thus the cleanup logic was attached to the wrong lifecycle hook.
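This behavior can be observed with a plain java.util.concurrent.ThreadPoolExecutor. The following is a small sketch, assuming a one-thread pool and a hypothetical event log; it records what the factory receives and when the line after r.run() is actually reached.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FactoryHookDemo {

    /** Runs three tasks on a one-thread pool and returns the ordered event log. */
    public static List<String> run() {
        List<String> events = new CopyOnWriteArrayList<>();
        Thread[] created = new Thread[1];

        ThreadFactory factory = r -> {
            Thread t = new Thread(() -> {
                events.add("factory received " + r.getClass().getName());
                r.run();                      // enters the worker loop and stays there
                events.add("after r.run()");  // reached only when the worker exits
            });
            created[0] = t;
            return t;
        };

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(), factory);
        try {
            for (int i = 1; i <= 3; i++) {
                final int id = i;
                pool.submit(() -> { events.add("task " + id); }).get();
            }
            events.add("all tasks done");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();                  // lets the worker loop end
        }
        try {
            created[0].join(5000);            // wait for the factory's wrapper to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log starts with the Worker class name, lists the three tasks and "all tasks done", and only then shows "after r.run()", after the pool has been shut down.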

5. Solution: Use Spring’s TaskDecorator

Instead of trying to clean up in the thread‑creation phase, decorate each submitted task so that cleanup runs after the task finishes, regardless of success or exception.

public ThreadPoolTaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    // ... basic pool config
    executor.setTaskDecorator(runnable -> {
        return () -> {
            try {
                runnable.run();
            } finally {
                ScopeCacheUtil.clearContext(); // always clean ThreadLocal
            }
        };
    });
    return executor;
}

The decorator wraps the original Runnable in a new lambda that guarantees clearContext() runs in a finally block. After applying this change, debugging shows the runtime type of the wrapped task is CompletableFuture$AsyncRun, the actual business task, and the cleanup executes correctly, stopping the memory leak.
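The same per-task wrapping can be tried without Spring. The sketch below is not the TaskDecorator API; it is a plain JDK analogue in which a hypothetical withCleanup helper wraps an Executor, and CONTEXT is a stand-in for the ThreadLocal behind ScopeCacheUtil. The first task writes to the context and then fails; the second task runs on the same thread and should see a clean context.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CleanupExecutorDemo {
    // Hypothetical per-thread context standing in for ScopeCacheUtil.
    private static final ThreadLocal<StringBuilder> CONTEXT =
            ThreadLocal.withInitial(StringBuilder::new);

    /** Wraps every task so the context is cleared after it, even on failure. */
    static Executor withCleanup(Executor delegate) {
        return task -> delegate.execute(() -> {
            try {
                task.run();
            } finally {
                CONTEXT.remove();
            }
        });
    }

    /** Runs a failing task, then a second one on the same thread; returns what the second one saw. */
    public static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Executor executor = withCleanup(pool);
        try {
            CompletableFuture.runAsync(() -> {
                CONTEXT.get().append("first");
                throw new IllegalStateException("simulated failure");
            }, executor).exceptionally(t -> null).join();

            return CompletableFuture
                    .supplyAsync(() -> CONTEXT.get().append("second").toString(), executor)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // second
    }
}
```

If the wrapper is removed and the pool is passed to CompletableFuture directly, the second task returns "firstsecond", which is the leak in miniature.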

6. Extended Thoughts

6.1 ThreadFactory vs. TaskDecorator: Responsibility Boundary

ThreadFactory: invoked when a thread is created; scope is thread‑level; suitable for setting thread name, priority, UncaughtExceptionHandler, etc.

TaskDecorator: invoked when a task is submitted (decorate() runs on the submitting thread, while the Runnable it returns runs on the pool thread); scope is task‑level; suitable for MDC propagation, ThreadLocal cleanup, tracing context, etc.

In short, use ThreadFactory for thread‑wide settings and TaskDecorator for per‑task hooks.
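A per-task hook is also the natural place for context propagation, because Spring calls decorate() on the thread that submits the task, while the returned Runnable runs on the pool thread. The sketch below imitates that split with a java.util.function.UnaryOperator in place of TaskDecorator; TRACE_ID and the value "trace-42" are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

public class ContextPropagationDemo {
    // Hypothetical request-scoped value, e.g. a trace id.
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Stand-in for TaskDecorator.decorate(Runnable).
    private static final UnaryOperator<Runnable> DECORATOR = task -> {
        String captured = TRACE_ID.get();   // evaluated on the submitting thread
        return () -> {
            TRACE_ID.set(captured);         // evaluated on the pool thread
            try {
                task.run();
            } finally {
                TRACE_ID.remove();          // leave nothing behind for the next task
            }
        };
    };

    /** Returns what two consecutive tasks on the same pool thread observed. */
    public static String[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        String[] seen = new String[2];
        try {
            TRACE_ID.set("trace-42");
            pool.submit(DECORATOR.apply(() -> seen[0] = TRACE_ID.get())).get();

            TRACE_ID.remove();              // the next submitter has no trace id
            pool.submit(DECORATOR.apply(() -> seen[1] = TRACE_ID.get())).get();
            return seen;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println(seen[0] + " / " + seen[1]); // trace-42 / null
    }
}
```

The first task sees the submitter's value; the second sees nothing, which shows that the value was neither lost on the way in nor left behind afterwards.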

6.2 Best Practices for ThreadLocal in Thread Pools

Always clean up in a finally block by calling remove() on the ThreadLocal instance.

Prefer a TaskDecorator as a safety net even if the business code attempts cleanup.

Consider alternatives such as passing data via method parameters or using TransmittableThreadLocal (Alibaba open‑source) for cross‑thread‑pool propagation.

Monitor GC logs and heap trends regularly to catch leaks early.

6.3 General Steps for Diagnosing Memory Leaks

1. Detect anomaly
   └── Monitoring alert / inspection shows frequent Full GC
2. Confirm leak
   └── Analyze GC logs (GCeasy / GCViewer)
   └── Observe Old Generation trend: does memory drop after GC?
3. Locate leaking objects
   └── jmap -dump:format=b,file=<file> <pid> to export heap snapshot
   └── Analyze with MAT: Leak Suspects / Dominator Tree / Histogram
   └── Identify largest objects and their GC‑root reference chains
4. Code review
   └── Trace reference chain back to write point
   └── Verify cleanup logic exists and actually runs
5. Fix & verify
   └── Reproduce locally, debug
   └── After fix, monitor memory trend for recovery
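For steps 1 and 2, the same signals that GC logs provide can also be sampled from inside the JVM through the standard java.lang.management beans. The following is a rough sketch under the assumption that periodic in-process sampling is acceptable; the class name GcSnapshot and the output format are made up, and collector and pool names vary with the garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class GcSnapshot {

    /** Returns one line per collector and per heap pool with current counters. */
    public static String describe() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("collector=%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage now = pool.getUsage();
            // Usage measured right after the last collection; null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            out.append(String.format("pool=%s used=%s usedAfterLastGc=%s%n",
                    pool.getName(),
                    now == null ? "n/a" : String.valueOf(now.getUsed()),
                    afterGc == null ? "n/a" : String.valueOf(afterGc.getUsed())));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

A usedAfterLastGc value for the old-generation pool that keeps rising between samples corresponds to the overlapping before-GC and after-GC curves described in the GC log analysis.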

7. Summary

The root cause was a ThreadLocal that was never cleared because the cleanup code was placed in a ThreadFactory where it never executed. By moving the cleanup to a Spring TaskDecorator, each task now reliably clears its ThreadLocal, eliminating the memory leak and the associated Full GC storms.

If you also combine thread pools with ThreadLocal, check where your cleanup lives: in a ThreadFactory it wraps the worker loop and does not run between tasks, whereas in a TaskDecorator it runs after every task.
