Master JVM Memory Troubleshooting: A Complete Step‑by‑Step Guide
This comprehensive guide presents a systematic, step‑by‑step process for diagnosing JVM memory problems—including heap, Metaspace, DirectMemory, JNI memory, and stack issues—using Alibaba Cloud ARMS, ATP, standard JDK tools, and best‑practice commands to quickly locate root causes and apply effective solutions.
Purpose
Explain the systematic process for diagnosing JVM memory problems, provide a reusable checklist, and invite community contributions.
Basic Principles
Separate conceptual knowledge (placed in the appendix) from practical troubleshooting steps; use Alibaba Cloud PaaS tools or out‑of‑the‑box utilities; present the process as a step‑by‑step flow; leverage GPT for complex analysis.
Scope
Applicable to JDK 8‑11; the core troubleshooting flow is identical across versions.
Step 1 – Receive the Issue
1.1 Collect Basic Information
Identify the symptom (steady high memory, gradual increase, sudden dump), the affected node, recent changes, and monitoring traces.
1.2 Make an Initial Judgment
Business‑only growth → guide to Alibaba Cloud elastic scaling.
No recent changes, periodic growth → investigate scheduled tasks.
Sudden spikes → try to reproduce in a controlled environment.
Slow continuous growth → proceed to Step 2.
Recommend ARMS for monitoring if the customer lacks tools.
1.3 Preserve the Scene
Save heap dumps, JVM start‑up parameters, GC logs, thread stacks, Linux logs, etc.
Heap Dump
<code>jmap -dump:format=b,file=heap.bin <pid>
jmap -dump:live,format=b,file=heap.bin <pid>
jcmd <pid> GC.heap_dump filename=heap.bin
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap.bin</code>
JVM Parameters
<code>ps -ef | grep java</code>
The output shows the full command line, including every JVM flag.
GC Log
<code># Java 8 and earlier
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path>
# Java 9 and later
-Xlog:gc*:<path>:time</code>
Thread Stack
<code>jstack <pid> > jstack.log
jcmd <pid> Thread.print > jstack.log</code>
Linux OOM-Killer Log
<code>sudo dmesg | grep -i kill
grep /var/log/kern.log* -ie kill</code>
Step 2 – Identify the Source
2.1 Confirm the Process
Use top or ps aux to ensure the OOM originates from the Java process.
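Once the Java process is confirmed, its PID is what the jmap/jstack commands above attach to. On JDK 8, before ProcessHandle existed, the PID can also be read from inside the JVM; a minimal sketch (the class name PidFinder is illustrative):

```java
import java.lang.management.ManagementFactory;

public class PidFinder {
    // On HotSpot the RuntimeMXBean name is conventionally "<pid>@<hostname>".
    // JDK 9+ can use ProcessHandle.current().pid() instead.
    static long currentPid() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        return Long.parseLong(name.split("@")[0]);
    }

    public static void main(String[] args) {
        System.out.println("Attach jmap/jstack to pid " + currentPid());
    }
}
```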
2.2 Determine Whether It Is a Memory Leak
A gradual increase does not always indicate a leak; caches and pools that are populated lazily can grow for some time before plateauing.
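The distinction shows up clearly in code: a leak grows without bound, while a bounded cache grows and then stabilizes. A runnable sketch of both patterns (class and field names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GrowthPatterns {
    // Leak: a static collection that is only ever appended to.
    static final List<byte[]> LEAKY = new ArrayList<>();

    // Not a leak: an LRU cache that plateaus at MAX_ENTRIES.
    static final int MAX_ENTRIES = 100;
    static final Map<Integer, byte[]> CACHE =
        new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > MAX_ENTRIES; // evict oldest once full
            }
        };

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            LEAKY.add(new byte[1024]);    // grows without bound
            CACHE.put(i, new byte[1024]); // grows, then stabilizes at 100
        }
        System.out.println("leaky=" + LEAKY.size() + " cache=" + CACHE.size());
    }
}
```

In monitoring terms, the cache produces the "delayed allocation" curve that flattens out, while the leak keeps climbing until GC pressure or an OOM appears.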
2.3 Analyse Logs
Search application logs for “OutOfMemoryError”, system logs for OOM events, and correlate with ARMS alerts.
2.4 Preliminary Region Judgment
Based on ARMS metrics decide whether the problem lies in the heap, Metaspace, DirectMemory, JNI memory, or stack.
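Alongside ARMS metrics, the same per-region breakdown is observable from inside the JVM via the standard MXBean APIs: memory pools cover heap regions plus Metaspace and Code Cache, and the buffer pool beans cover direct buffers. A minimal sketch (the class name RegionSnapshot is illustrative):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class RegionSnapshot {
    public static void main(String[] args) {
        // HEAP pools (Eden, Survivor, Old Gen) and NON_HEAP pools (Metaspace, Code Cache)
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s %-8s used=%d%n",
                pool.getName(), pool.getType(), pool.getUsage().getUsed());
        }
        // Direct and mapped ByteBuffer usage (the DirectMemory region)
        for (BufferPoolMXBean buf :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("buffer pool %-10s used=%d%n",
                buf.getName(), buf.getMemoryUsed());
        }
    }
}
```

Whichever region's "used" value tracks the overall growth is the region to pursue in Steps 3 and 4.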
Step 3 – Heap Issues
Use commands such as jstat -gcutil, jmap -heap, jmap -histo, or the Arthas memory command, or tools such as ATP GC analysis, Eclipse MAT, and online ATP heap analysis.
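A key detail when reading these tools: jmap -histo counts all objects, while jmap -histo:live and jmap -dump:live force a full GC first, so only strongly reachable objects remain. The difference is what distinguishes retained memory from collectible garbage; a runnable sketch (class and field names are illustrative, and System.gc() is only a hint, though HotSpot normally honors it):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

public class RetainedVsGarbage {
    // Strongly reachable from a static root: survives any GC, ~100 MB retained.
    static final List<byte[]> RETAINED = new ArrayList<>();

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        for (int i = 0; i < 100; i++) {
            RETAINED.add(new byte[1 << 20]);   // retained: 1 MB per iteration
            byte[] garbage = new byte[1 << 20]; // unreachable after each iteration
        }
        long before = mem.getHeapMemoryUsage().getUsed();
        System.gc(); // the "live" variants of jmap do the equivalent internally
        long after = mem.getHeapMemoryUsage().getUsed();
        System.out.printf("used before gc=%dMB, after gc=%dMB%n",
            before >> 20, after >> 20);
    }
}
```

If usage stays high even after the live collection, the retained set (and a MAT dominator-tree analysis of it) is where the leak lives.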
Step 4 – Off‑Heap Issues
Metaspace
Growth that persists after GC points to class-loader leaks; common culprits include Fastjson, Orika, Groovy, and CGLIB dynamic class generation, as well as running without -XX:MaxMetaspaceSize set.
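The mechanics of a class-loader leak are simple: class metadata lives in Metaspace and can only be freed when its defining loader becomes unreachable, so any code that keeps creating loaders (or defining classes, as the libraries above do) while something pins them prevents Metaspace from ever shrinking. A minimal sketch of the pinning pattern (class and field names are illustrative; real leaks additionally define classes through each loader):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class LoaderLeak {
    // Holding every loader strongly keeps all classes it defined alive,
    // so their Metaspace can never be reclaimed even after full GC.
    static final List<ClassLoader> PINNED = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            URLClassLoader loader = new URLClassLoader(new URL[0]);
            // In a real leak each loader would define classes here, e.g.
            // a CGLIB proxy or a Groovy script compiled per request.
            PINNED.add(loader);
        }
        System.out.println(PINNED.size() + " class loaders pinned");
    }
}
```

In a heap dump, MAT's "duplicate classes" view or a histogram dominated by loader instances is the usual signature of this pattern.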
DirectMemory
Enable -XX:NativeMemoryTracking=detail at JVM startup, then inspect with jcmd <pid> VM.native_memory detail. Netty leak-detection levels can be tuned via -Dio.netty.leakDetectionLevel.
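The reason DirectMemory needs these separate tools is that direct buffers are allocated outside the Java heap, so jmap -heap never shows them; they are bounded by -XX:MaxDirectMemorySize (which defaults to the maximum heap size on HotSpot). A minimal sketch (the class name DirectAlloc is illustrative):

```java
import java.nio.ByteBuffer;

public class DirectAlloc {
    public static void main(String[] args) {
        // Allocated with native memory, not on the Java heap: counts against
        // -XX:MaxDirectMemorySize and is visible via NMT and BufferPoolMXBean,
        // but invisible to heap-only tools.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 << 20); // 16 MB
        System.out.println("direct=" + buf.isDirect()
            + " capacity=" + buf.capacity());
    }
}
```

An OutOfMemoryError: Direct buffer memory with a healthy-looking heap is the classic symptom of this region being exhausted.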
JNI Memory
Native allocations are diagnosed with gperftools , pmap , or core dumps; typical sources are unclosed streams or large native buffers.
Stack
Identify StackOverflowError or "OutOfMemoryError: unable to create new native thread" via thread dumps and core-file analysis.
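Each thread gets its own stack, sized by -Xss, so the two failure modes differ: one thread recursing too deeply throws StackOverflowError, while creating too many threads exhausts native memory for new stacks. A runnable sketch of the first mode (the class name StackProbe is illustrative):

```java
public class StackProbe {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse(); // unbounded recursion exhausts this thread's -Xss stack
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The reachable depth scales with -Xss and frame size.
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}
```

In a thread dump the same recursion appears as one thread with a very long, repetitive stack trace, which is how the offending call cycle is usually identified.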
Common Linux Commands
Covers top and pmap and how to interpret their CPU, memory, and process fields.
Tools
ATP – Alibaba Cloud memory analysis platform.
ARMS – Application Real‑time Monitoring Service.
MAT – Eclipse Memory Analyzer.
jcmd, jmap, jstack, jstat, jps, jinfo – standard JDK diagnostics.
Arthas – interactive Java troubleshooting.
gperftools – native memory profiling.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!