Java Performance Tuning: Practical Guide to Detecting and Fixing Memory Leaks
This article explains how to differentiate memory leaks from out‑of‑memory errors, identifies classic GC‑based leak signals, introduces a toolchain (jstat, jmap, MAT, Arthas, JProfiler), walks through a step‑by‑step investigation workflow, lists common leak patterns, presents a real‑world ThreadLocal leak case, and offers preventive measures such as monitoring, regular heap dumps, code review, and stress testing.
1. Memory Leak vs Out‑Of‑Memory
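The two terms are related but not the same:
Memory Leak : Objects the application no longer needs are still referenced (reachable from a GC root such as a static field or a live thread), so the garbage collector cannot reclaim them. Heap usage creeps upward until an OOM occurs.
Out‑Of‑Memory (OOM) : The JVM cannot satisfy an allocation even after garbage collection and throws java.lang.OutOfMemoryError. An OOM can also happen without any leak, for example when the heap is simply too small for the workload.
The relationship is: leak → gradual memory growth → eventual OOM. The reverse does not hold: an OOM on its own does not prove a leak.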
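Here is a minimal sketch of a leak in the sense defined above. The class names (`LeakyOrderService`, `Order`) are hypothetical. Every processed order is cached in a static map and never removed. Each `Order` therefore stays reachable from a GC root (the static field) and can never be collected, even though the application is finished with it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service illustrating a leak: the static map is a GC root,
// so every Order put into it stays strongly reachable forever.
class LeakyOrderService {
    private static final Map<Long, Order> CACHE = new HashMap<>();

    static void process(long orderId) {
        Order order = new Order(orderId);
        CACHE.put(orderId, order); // added on every request...
        // ...business logic...
        // BUG: nothing ever calls CACHE.remove(orderId)
    }

    static int retainedOrders() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (long i = 0; i < 10_000; i++) {
            process(i);
        }
        // All 10,000 orders are still reachable, so no GC can reclaim them.
        System.out.println("Retained orders: " + retainedOrders());
    }
}

// Hypothetical domain object standing in for any request-scoped data.
class Order {
    final long id;
    final byte[] payload = new byte[1024]; // 1 KB per order makes the growth visible

    Order(long id) {
        this.id = id;
    }
}
```

Each distinct order adds at least 1 KB that no collection can free. In a heap dump, this shape shows up as a HashMap$Node[] retaining many Order objects.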
2. Typical Leak Indicators
Key GC log patterns that suggest a leak:
After a Full GC, heap usage hardly drops (old‑gen usage > 90%).
Full GC frequency keeps rising.
Each Full GC leaves a higher memory footprint than the previous one.
Eventually the process crashes with OOM.
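The third signal (each Full GC leaving a higher floor) can be checked mechanically. The sketch below assumes you have already collected old‑gen occupancy right after each Full GC, for example from GC logs or jstat, as percentages. The class name, method name, sample values and the 90% threshold are illustrative assumptions, not a standard rule.

```java
import java.util.Arrays;

class LeakSignal {

    /**
     * Returns true when old-gen occupancy measured right after consecutive
     * Full GCs never decreases and the latest value exceeds the threshold,
     * i.e. the "floor keeps rising" pattern.
     */
    static boolean looksLikeLeak(double[] oldGenAfterFullGc, double thresholdPercent) {
        if (oldGenAfterFullGc.length < 2) {
            return false; // not enough Full GCs to see a trend
        }
        for (int i = 1; i < oldGenAfterFullGc.length; i++) {
            if (oldGenAfterFullGc[i] < oldGenAfterFullGc[i - 1]) {
                return false; // a Full GC reclaimed memory, so the floor is not rising
            }
        }
        return oldGenAfterFullGc[oldGenAfterFullGc.length - 1] > thresholdPercent;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not real measurements.
        double[] healthy = {42.0, 38.5, 44.1, 40.2};
        double[] leaking = {71.3, 80.6, 88.9, 94.2};
        System.out.println(Arrays.toString(healthy) + " -> " + looksLikeLeak(healthy, 90.0)); // false
        System.out.println(Arrays.toString(leaking) + " -> " + looksLikeLeak(leaking, 90.0)); // true
    }
}
```

Requiring a strictly non‑decreasing series is deliberately strict. A real check would probably tolerate small dips and look at the overall slope instead.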
3. Toolchain for Leak Diagnosis
jstat – Real‑time GC monitoring to confirm a leak.
jmap – Generates a heap dump for offline analysis.
MAT (Memory Analyzer Tool) – Analyzes the dump and pinpoints leaking objects.
Arthas – Online diagnostics; can view object distribution in production.
JProfiler – Visual tool for both online and offline profiling.
4. Investigation Steps
Step 1 – Observe GC in Real Time
# Check GC every second
jstat -gcutil <PID> 1000
In a leaking JVM, the old‑generation column (O) stays above 95% even right after Full GCs, while the Full GC count (FGC) and cumulative Full GC time (FGCT) keep climbing: memory is not being reclaimed.
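If jstat is not available (for example in a container image without a full JDK), similar numbers can be read from inside the application through the standard java.lang.management API. The following is a minimal sketch, and the class name and pool‑name heuristic are assumptions. HotSpot names the old‑generation pool differently per collector (e.g. "G1 Old Gen", "PS Old Gen", "Tenured Gen"), and collectors such as ZGC or Shenandoah may expose no matching pool, in which case the method returns -1. Like jstat's O column, it divides used by committed (current) capacity.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class OldGenProbe {

    /** Heuristic: HotSpot old-generation pool names differ per collector. */
    static boolean isOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    /** Old-gen used / committed as a percentage, or -1 if no old-gen pool is found. */
    static double oldGenUsagePercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isOldGen(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            long committed = usage.getCommitted();
            return committed > 0 ? 100.0 * usage.getUsed() / committed : -1;
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) { // sample once per second, like "jstat ... 1000"
            System.out.printf("O=%.1f%%%n", oldGenUsagePercent());
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // Cumulative count and time since JVM start, per collector.
                System.out.printf("  %s: count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(1000);
        }
    }
}
```

The per‑collector counts only approximate jstat's FGC. For example, under G1 the "G1 Old Generation" collector corresponds to full collections, but other collectors split the work differently.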
Step 2 – Export a Heap Dump
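# Live dump (only reachable objects; forces a Full GC first, use with caution)
jmap -dump:live,format=b,file=heap.hprof <PID>
In production, prefer a non‑live dump, which skips the forced Full GC:
# Safer dump
jmap -dump:format=b,file=heap.hprof <PID>
Either way, the JVM pauses while the dump is written, and the file is roughly as large as the used heap, so check that the target disk has room.
Step 3 – Analyze Dump with MAT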
Open MAT.
Load heap.hprof.
Open the Leak Suspects report.
MAT automatically lists the most probable leak sources. In the example, one instance of com.example.service.OrderService retains 78.5% of the heap through more than a million Order instances.
Step 4 – Locate the Root Cause
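Use the Dominator Tree view to find the objects with the largest retained size. In the example, a HashMap$Node[] array holds 1,234,567 Order objects. Right‑clicking it and choosing Path To GC Roots shows which field keeps the array alive, which points to the leaking code.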
5. Common Leak Patterns and Solutions
Uncleared Collections : Adding data to List/Map without removal – use a cache with expiration (e.g., Caffeine).
Listeners/Callbacks Not Unregistered : Forgetting to deregister – clean up in a destroy method.
ThreadLocal Not Cleared : Pooled threads outlive individual requests and keep their ThreadLocal values – call ThreadLocal.remove() in a finally block.
Unclosed Streams/DB Connections : Use try‑with‑resources to ensure closure.
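Static Collections Holding Objects : Avoid them or replace with WeakHashMap (it references keys weakly but values strongly, so a value that refers back to its own key keeps the entry alive).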
Third‑Party Library Leaks : Upgrade the library or apply a known workaround.
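For uncleared collections, the recommendation above is an expiring cache such as Caffeine. Caffeine is a third‑party library, so the sketch below uses a dependency‑free stand‑in instead: a size‑bounded LRU map built on java.util.LinkedHashMap. Unlike Caffeine, it offers no time‑based expiry and is not thread‑safe. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the tail,
        // so the head is always the least recently used entry.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicting here keeps size() <= maxEntries.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<Long, String> cache = new BoundedCache<>(3);
        for (long id = 1; id <= 5; id++) {
            cache.put(id, "order-" + id);
        }
        System.out.println(cache.keySet()); // [3, 4, 5]
    }
}
```

Wrap it with Collections.synchronizedMap, or switch to a real caching library, if several threads share it.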
6. Real‑World Case: ThreadLocal Leak Causing OOM
Symptom: The application crashes with OOM after three days, recovers after restart, and repeats.
Investigation:
Run jstat -gcutil – old‑gen usage climbs continuously.
Export a heap dump with jmap -dump.
MAT shows many ThreadLocal entries holding user objects.
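Root Cause Code:
public class UserContext {
    private static final ThreadLocal<User> currentUser = new ThreadLocal<>();
    public static void setUser(User user) { currentUser.set(user); }
    public static User getUser() { return currentUser.get(); }
    // No remove(): the value stays attached to the worker thread after the request ends
}
Tomcat's worker threads are pooled and reused rather than terminated. Each thread's ThreadLocalMap therefore keeps a strong reference to the last User it stored, along with everything that User references.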
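The mechanism can be reproduced without Tomcat. The sketch below uses a single‑thread ExecutorService as a stand‑in for Tomcat's worker pool and a String in place of the User type; the class and method names are hypothetical. Two tasks run on the same reused thread. The second never calls set(), yet without remove() it still sees the first task's value.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StaleThreadLocalDemo {
    // Hypothetical stand-in for UserContext.currentUser (String instead of User).
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Runs two "requests" on one pooled thread and returns what the second one sees. */
    static String secondTaskSees(boolean removeAfterUse) {
        // One worker thread reused across tasks, like a Tomcat worker.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                CURRENT_USER.set("alice");
                try {
                    // business logic for request 1
                } finally {
                    if (removeAfterUse) {
                        CURRENT_USER.remove();
                    }
                }
            }).get();
            // Request 2 never calls set(), so it sees whatever the thread still holds.
            Future<String> seen = pool.submit(() -> CURRENT_USER.get());
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondTaskSees(false)); // alice
        System.out.println("with remove():    " + secondTaskSees(true));  // null
    }
}
```

Besides the memory cost, a stale value can leak one user's data into another user's request, which is another reason to always clear it.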
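Fix: Clear the ThreadLocal after use. Add a clear() method that calls ThreadLocal.remove(), and call it in a finally block so it runs even when the business logic throws.
public class UserContext {
    private static final ThreadLocal<User> currentUser = new ThreadLocal<>();
    public static void setUser(User user) { currentUser.set(user); }
    public static void clear() { currentUser.remove(); } // drop this thread's entry
}

public void processRequest(User user) {
    try {
        UserContext.setUser(user);
        // business logic
    } finally {
        UserContext.clear(); // ensure removal
    }
}
7. Preventive Measures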
Monitoring Alerts : Trigger when old‑gen usage exceeds 80%.
Regular Heap Dumps : Archive a dump weekly for trend analysis.
Code Review Focus : Pay special attention to collections, ThreadLocal, and I/O stream and connection handling.
Stress Testing : Run 24/7 load tests and observe memory curves.
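For regular heap dumps, a dump can also be taken from inside the JVM instead of running jmap by hand. The sketch below uses the HotSpot‑specific com.sun.management.HotSpotDiagnosticMXBean, which is available on HotSpot‑based JDKs such as OpenJDK but not guaranteed on other JVMs. The class name, file‑naming scheme and default directory are assumptions, and scheduling (cron, a ScheduledExecutorService, etc.) is left out.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class HeapDumper {

    /**
     * Writes a heap dump of the current JVM into dir and returns its path.
     * live=false corresponds to "jmap -dump:format=b,..." (no forced Full GC);
     * live=true corresponds to "jmap -dump:live,..." and collects garbage first.
     */
    static Path dump(Path dir, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        // Timestamped name so archived dumps do not collide; HotSpot will not overwrite a file.
        Path file = dir.resolve("heap-" + stamp + ".hprof");
        diag.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createDirectories(Path.of(args.length > 0 ? args[0] : "heapdumps"));
        System.out.println("Wrote " + dump(dir, false));
    }
}
```

Like jmap, this pauses the application while writing, and each file is roughly as large as the used heap. A weekly archive should therefore have a retention policy.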
8. Next Episode Preview
The upcoming article will cover Java thread‑pool tuning, including detailed parameter explanations, production‑grade configurations, monitoring, dynamic adjustments, and classic pitfalls.
Coder Trainee
Experienced in Java and Python, we share and learn together. For submissions or collaborations, DM us.