Why Does Your Flink Job OOM? Uncovering JVM Non‑Heap, Direct Memory and NMT Secrets
This article explains the meaning of JVM Non‑Heap metrics, clarifies why Direct/Mapped memory is not part of Non‑Heap, analyzes a Flink taskmanager OOM case, and demonstrates how Native Memory Tracking (NMT) can reveal hidden memory gaps caused by allocator strategies such as TCMalloc and PTMalloc.
Background
A user asked about the specific meaning of various MBean memory metrics collected from the JVM, especially the relationship among Non‑Heap, direct, and mapped memory.
Non‑Heap vs Direct/Mapped
Non‑Heap refers to memory the JVM manages outside the Java heap. The direct and mapped values come from the NIO classes DirectByteBuffer and MappedByteBuffer, which track off‑heap memory used by those buffers. Contrary to a common assumption, they are not part of the Non‑Heap figure: getNonHeapMemoryUsage only aggregates MemoryPool objects, and direct/mapped are exposed as BufferPool MXBeans rather than as MemoryPools.
<code>// HotSpot memoryPool.hpp (excerpt)
 public:
  MemoryPool(const char* name, PoolType type, size_t init_size, size_t max_size,
             bool support_usage_threshold, bool support_gc_threshold);

  const char* name()    { return _name; }
  bool is_heap()        { return _type == Heap; }
  bool is_non_heap()    { return _type == NonHeap; }
</code>The JVM MemoryPool implementation shows which pools are classified as NonHeap (e.g., CodeHeapPool, MetaspacePool, CompressedKlassSpacePool). The JMX getNonHeapMemoryUsage call aggregates the usage of all NonHeap pools.
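This aggregation is easy to observe from the application side: list every MemoryPoolMXBean, sum the NON_HEAP ones, and compare against getNonHeapMemoryUsage(). A minimal sketch (the class name is mine; pool names vary by JDK version and GC):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class NonHeapPools {
    public static void main(String[] args) {
        long sumUsed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                // Typical names: "Metaspace", "Compressed Class Space", "CodeHeap '...'"
                System.out.printf("%-30s used=%d%n", pool.getName(), pool.getUsage().getUsed());
                sumUsed += pool.getUsage().getUsed();
            }
        }
        long reported = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
        // The MXBean figure is (approximately) the sum over NON_HEAP pools; usage can
        // change between the two reads, so the values may differ slightly.
        System.out.println("sum=" + sumUsed + " getNonHeapMemoryUsage=" + reported);
    }
}
```

Note that no "direct" or "mapped" pool ever appears in this list, which is the point of the next section.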
Direct/Mapped Not a Real MemoryPool
Source code inspection reveals that direct and mapped are not actual MemoryPool instances; they are simple BufferPool implementations that report statistics from DirectByteBuffer / MappedByteBuffer, with direct allocations bounded by -XX:MaxDirectMemorySize. They do not hold or manage memory themselves.
<code>// java.nio.Bits (JDK excerpt): the "direct" BufferPool merely reports counters
static final BufferPool BUFFER_POOL = new BufferPool() {
    @Override public String getName()          { return "direct"; }
    @Override public long   getCount()         { return Bits.COUNT.get(); }
    @Override public long   getTotalCapacity() { return Bits.TOTAL_CAPACITY.get(); }
    @Override public long   getMemoryUsed()    { return Bits.RESERVED_MEMORY.get(); }
};
</code>Consequently, memory allocated via Unsafe (e.g., Flink’s off‑heap unsafe memory segments) is not counted in the direct pool.
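This blind spot can be demonstrated directly: allocate one buffer through ByteBuffer.allocateDirect and one through Unsafe.allocateMemory, and watch the "direct" BufferPoolMXBean move only for the first. A sketch under the assumption that sun.misc.Unsafe is reachable via reflection (it is on stock JDK 8–21; the class name is mine):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;

public class DirectPoolVisibility {
    public static void main(String[] args) throws Exception {
        BufferPoolMXBean direct = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)
                .stream().filter(b -> b.getName().equals("direct")).findFirst().get();

        long before = direct.getMemoryUsed();
        ByteBuffer bb = ByteBuffer.allocateDirect(1 << 20); // goes through Bits.reserveMemory
        long afterBuffer = direct.getMemoryUsed();

        // Unsafe.allocateMemory calls malloc directly and bypasses Bits.reserveMemory,
        // so the "direct" pool statistics do not change at all.
        Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
        long addr = unsafe.allocateMemory(1 << 20);
        long afterUnsafe = direct.getMemoryUsed();

        System.out.println("ByteBuffer moved pool by: " + (afterBuffer - before));
        System.out.println("Unsafe moved pool by:     " + (afterUnsafe - afterBuffer));
        unsafe.freeMemory(addr);
        System.out.println("buffer capacity: " + bb.capacity()); // keep bb reachable
    }
}
```

The same invisibility applies to native allocations made by JNI libraries, which is why monitoring built only on these MXBeans under-reports real process memory.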
Flink OOM Case Study
A Flink taskmanager running in Kubernetes experienced OOM kills. The JVM was configured with -Xmx10G and -XX:MaxDirectMemorySize=15G . Monitoring showed RSS growing far beyond the sum of reported heap, non‑heap, direct and mapped usage.
<code>-XX:+AlwaysPreTouch
-XX:CompressedClassSpaceSize=260046848
-XX:ErrorFile=/var/log/hs_err_%pid.log
-XX:GCLogFileSize=524288000
-XX:InitialHeapSize=10871635968
-XX:MaxDirectMemorySize=15166603264
-XX:MaxHeapSize=10871635968
-XX:MaxMetaspaceSize=268435456
-XX:MaxNewSize=3623878656
-XX:MinHeapDeltaBytes=524288
-XX:NewSize=3623878656
-XX:NumberOfGCLogFiles=5
-XX:OldSize=7247757312
-XX:+PrintGC
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseCompressedClassPointers
-XX:+UseCompressedOops
-XX:+UseFastUnorderedTimeStamps
-XX:+UseGCLogFileRotation
-XX:+UseParallelGC
</code>Memory trend graphs showed direct memory usage fluctuating within its limit while RSS rose continuously until the container was OOM‑killed.
Key Observations
Non‑Heap aggregates the internal NonHeap MemoryPool objects (MetaspacePool, CompressedKlassSpacePool, the code‑heap pools).
Direct/Mapped metrics only reflect DirectByteBuffer / MappedByteBuffer usage; direct allocations are bounded by MaxDirectMemorySize.
Unsafe‑allocated memory is not recorded in the direct pool.
Native Memory Tracking (NMT)
NMT is a built‑in JVM facility that categorises the JVM's own native memory allocations. Enable it at startup (it is off by default and documented to add roughly 5–10% overhead) and query a running process via jcmd:
<code># enable at JVM startup
-XX:NativeMemoryTracking=summary        # or =detail
# query / baseline / diff a running JVM
jcmd <pid> VM.native_memory [summary|detail|baseline|summary.diff|detail.diff|shutdown] [scale=KB|MB|GB]
</code>Sample output (truncated) shows reserved vs committed memory for categories such as Java Heap, Class, Thread, Code, GC, Internal, etc. Note that NMT only tracks memory the JVM itself allocates; it does not see allocations made by native libraries through raw malloc.
Reserved vs Committed vs RSS
Reserved memory is address space obtained via mmap with PROT_NONE; committing it changes the protection to PROT_READ|PROT_WRITE. Committed memory enters RSS only after the pages are actually touched and faulted in, which is why -XX:+AlwaysPreTouch (present in the flags above) makes the whole heap resident at startup. RSS counts only pages resident in physical memory: swapped‑out pages are excluded, while shared file‑backed mappings are included.
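The same reserved/committed distinction surfaces in the JMX MemoryUsage tuple: used ≤ committed ≤ max, where committed is memory the OS has backed and max corresponds roughly to the address space reserved up front. A small sketch (the class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class ReservedVsCommitted {
    public static void main(String[] args) {
        // Heap snapshot: used <= committed <= max (max may be -1 if undefined)
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used      = " + heap.getUsed());
        System.out.println("committed = " + heap.getCommitted());
        System.out.println("max       = " + heap.getMax());
        // Committed memory is OS-backed but not necessarily resident:
        // without AlwaysPreTouch, untouched committed pages are absent from RSS.
    }
}
```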
Why the Gap?
In the Flink case, the sizeable gap between NMT committed memory (~26 GB) and container RSS (~32 GB) was traced to allocator behaviour:
TCMalloc: frees pages with madvise(MADV_FREE) (lazy free). Freed pages remain resident, and counted in RSS, until the kernel reclaims them under memory pressure, producing a large apparent gap between allocator‑level usage and RSS.
PTMalloc (glibc): creates additional per‑thread arenas when the main arena is contended. Each arena reserves a large chunk of address space (~128 MB here), so a process with many threads can inflate memory usage considerably.
Setting MALLOC_ARENA_MAX=1 disables extra arenas, eliminating the gap, though at a performance cost. A compromise of MALLOC_ARENA_MAX=4 is often used.
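In a container deployment this tuning is typically applied as an environment variable on the taskmanager process; a hypothetical snippet of how it might look (the value 4 follows the compromise described above):

```shell
# Cap the number of glibc malloc arenas to limit per-thread arena growth.
# MALLOC_ARENA_MAX=1 removes the gap entirely but serializes malloc;
# 4 trades a little contention for a much smaller footprint.
export MALLOC_ARENA_MAX=4
```

glibc reads this variable at process startup, so it must be set before the JVM launches, not injected afterwards.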
Conclusions
JVM off‑heap memory management is complex and heavily influenced by the native memory allocator. Direct/Mapped buffers are independent of the NonHeap MemoryPool aggregation. NMT provides fine‑grained visibility, but differences between committed memory and RSS can arise from allocator strategies such as TCMalloc’s LazyFree or glibc’s thread arenas. Understanding these mechanisms helps operators diagnose OOM issues in big‑data frameworks like Flink and size memory allocations appropriately.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.