
How to Uncover Hidden Java Memory Leaks in Kubernetes Pods

This article explains why Java applications in cloud containers often encounter OOMKilled pods, details the hidden memory consumption from JNI, libc, and Transparent Huge Pages, and demonstrates step‑by‑step how to use Alibaba Cloud OS Console's memory panorama analysis to identify and mitigate the root causes.


Background

As the automotive industry accelerates its shift toward intelligent, cloud‑native deployments, many workloads migrate from traditional IDC clusters to Kubernetes clusters on the cloud. During this transition, developers frequently encounter pod memory anomalies and OOMKilled events, where the container’s memory usage far exceeds the JVM‑reported usage.

Java Memory Composition

Understanding the memory layout of a Java process is essential for diagnosing these discrepancies. The memory can be divided into two main categories:

JVM Memory

Heap memory – configurable via -Xms / -Xmx, observable through MemoryMXBean (see the sketch below this list).

Off‑heap memory – includes Metaspace, compressed class space, code cache, direct buffers, and thread stacks, each controllable via flags such as -XX:MaxMetaspaceSize, -XX:CompressedClassSpaceSize, -XX:ReservedCodeCacheSize, -XX:MaxDirectMemorySize, and -Xss.
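A quick way to see exactly what the JVM accounts for is the standard java.lang.management API; anything the process consumes beyond these figures lives outside the JVM's own bookkeeping. A minimal sketch (output formatting is illustrative):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public final class JvmMemoryReport {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.printf("heap     used=%d MiB committed=%d MiB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20);
        // Non-heap covers Metaspace, compressed class space, and the code cache.
        System.out.printf("non-heap used=%d MiB committed=%d MiB%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
        // Direct and mapped buffers are tracked separately via BufferPoolMXBean.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("buffers  %-6s used=%d MiB%n",
                    pool.getName(), pool.getMemoryUsed() >> 20);
        }
    }
}
```

JNI allocations made by native libraries never appear in any of these figures.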

Non‑JVM Memory

JNI native memory – allocated by native libraries (e.g., C/C++ code) through malloc or system calls like brk and mmap.
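To make the invisibility concrete, the deliberately simplified sketch below allocates 256 MiB of native memory through sun.misc.Unsafe, which bottoms out in malloc just as JNI code does, and none of the MemoryMXBean figures move. The reflective grab of Unsafe is a widely used but unsupported workaround (on JDK 9+ the class lives in the jdk.unsupported module):

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class NativeAllocDemo {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long size = 256L << 20;                   // 256 MiB of native memory
        long addr = unsafe.allocateMemory(size);  // ends up in malloc, like JNI code
        unsafe.setMemory(addr, size, (byte) 1);   // touch it so pages are actually mapped

        // Process RSS has grown by ~256 MiB, but neither figure below reflects it.
        System.out.println(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage());
        System.out.println(ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
        unsafe.freeMemory(addr);
    }
}
```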

Java memory composition diagram

Common Java Memory Black Holes

JNI Memory

JNI memory is often invisible to standard Java monitoring tools. Native libraries such as zlib can leak memory when native resources are not explicitly released, contributing a substantial, untracked share of the process's RSS.
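java.util.zip makes the risk easy to picture, because Deflater and Inflater are thin wrappers over zlib: each instance owns a native zlib stream that is released only by an explicit end() call (or, much later, by a GC cleaner). A minimal sketch of the leak pattern:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

public final class LeakyCompressor {
    // Anti-pattern: each call leaves the Deflater's native zlib state alive
    // until GC happens to clean it up. Under sustained load, thousands of
    // not-yet-collected instances can pin hundreds of MiB of native memory
    // that no JVM heap metric will ever show.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length * 2 + 64];
        int n = deflater.deflate(out);
        return Arrays.copyOf(out, n);
        // FIX: call deflater.end() in a finally block to free the native state.
    }
}
```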

Libc Memory

The JVM, written in C++, relies on the system's libc allocator (glibc's ptmalloc in most Linux images). To reduce lock contention, ptmalloc maintains multiple arenas (by default up to 8 × CPU cores on 64-bit systems, each reserving up to 64 MiB of address space) and caches freed chunks in bins. Under heavy thread counts or fragmented usage, libc retains memory the application has already freed, so the process RSS exceeds the JVM's view.

Each new thread can be assigned its own arena, a reservation of up to 64 MiB, so thread-heavy workloads multiply the waste (see the arena-spotting sketch after this list).

Top‑chunk fragmentation prevents timely return of memory to the OS.

Bins cache freed chunks, so memory freed by the JVM remains in libc’s cache.
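The arenas can often be spotted from inside the pod. The heuristic sketch below assumes Linux and 64-bit glibc, where each non-main arena reserves a 64 MiB heap that shows up in /proc/self/maps as a single anonymous mapping, or as an adjacent committed-plus-reserved pair totaling 64 MiB; other allocators and layouts will not match:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public final class ArenaHeuristic {
    public static void main(String[] args) throws Exception {
        long heapMax = 64L << 20; // glibc HEAP_MAX_SIZE on 64-bit
        List<long[]> anon = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("/proc/self/maps"))) {
            String[] fields = line.split("\\s+");
            if (fields.length >= 6) continue;        // skip file-backed and named mappings
            String[] range = fields[0].split("-");
            anon.add(new long[]{Long.parseUnsignedLong(range[0], 16),
                                Long.parseUnsignedLong(range[1], 16)});
        }
        int arenas = 0;
        for (int i = 0; i < anon.size(); i++) {
            long size = anon.get(i)[1] - anon.get(i)[0];
            if (size == heapMax) {                   // untouched arena reservation
                arenas++;
            } else if (i + 1 < anon.size()
                    && anon.get(i)[1] == anon.get(i + 1)[0]
                    && size + anon.get(i + 1)[1] - anon.get(i + 1)[0] == heapMax) {
                arenas++;                            // grown arena: rw + PROT_NONE pair
                i++;
            }
        }
        System.out.println("likely glibc arenas: " + arenas);
    }
}
```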

Libc memory allocation diagram

Transparent Huge Pages (THP)

Linux's THP mechanism backs memory with 2 MiB huge pages instead of 4 KiB base pages to reduce TLB misses. However, if an application touches only a few kilobytes inside an eligible 2 MiB region, the kernel may still back it with an entire huge page, inflating the RSS.
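Whether THP applies inside a pod is decided by the node's kernel, and a container can read the setting directly. A minimal sketch, assuming a Linux node exposing the standard sysfs path:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public final class ThpCheck {
    public static void main(String[] args) throws Exception {
        // The active mode is printed in brackets, e.g. "always [madvise] never".
        // "always" is the mode that tends to inflate RSS for sparse allocations.
        System.out.println(new String(Files.readAllBytes(
                Paths.get("/sys/kernel/mm/transparent_hugepage/enabled"))).trim());
    }
}
```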

Diagnostic Process Using Alibaba Cloud OS Console

When a pod approaches its memory limit, trigger the Memory Panorama Analysis from the console. The report shows container RSS, WorkingSet, JVM memory, process‑level memory, anonymous memory, and file‑backed memory.

Inspect the Java memory usage pie chart: the actual process memory exceeds the JVM‑reported usage by ~570 MiB, entirely attributable to JNI memory.
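The same comparison can be reproduced without the console by putting the kernel's view of the process next to the JVM's. A minimal sketch, assuming Linux; it counts only heap and non-heap committed memory on the JVM side, so thread stacks and direct buffers would still need to be added for a finer split:

```java
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class RssGap {
    public static void main(String[] args) throws Exception {
        // VmRSS is the kernel's resident-set figure for this process, in kB.
        long rssKb = Files.readAllLines(Paths.get("/proc/self/status")).stream()
                .filter(l -> l.startsWith("VmRSS:"))
                .mapToLong(l -> Long.parseLong(l.replaceAll("\\D+", "")))
                .findFirst().orElse(0);
        long jvmCommitted =
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getCommitted()
              + ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getCommitted();
        System.out.printf("RSS=%d MiB, JVM committed=%d MiB, unaccounted=%d MiB%n",
                rssKb >> 10, jvmCommitted >> 20, (rssKb >> 10) - (jvmCommitted >> 20));
    }
}
```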

Enable JNI memory profiling to generate a flame graph of native allocations. The flame graph reveals that the C2 compiler's JIT warm-up dominates JNI allocations. (Outside the console, the same attribution can be cross-checked with HotSpot's Native Memory Tracking: start the JVM with -XX:NativeMemoryTracking=summary and inspect it with jcmd <pid> VM.native_memory summary.)

Since no sudden memory spikes appear in the pod, use the console’s Java CPU hotspot tracking to capture hotspot stacks during normal and elevated memory periods.

Compare hotspot stacks: both periods show the C2 compiler, but the high‑memory period also includes increased business‑level traffic and heavy reflection usage, which triggers additional JIT compilation.
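The reflection link is mechanical: on JDK 8 through 17, once a reflective Method has been invoked past the inflation threshold (15 calls by default, tunable via -Dsun.reflect.inflationThreshold), HotSpot spins a bytecode accessor class for it, and hot accessors are then JIT-compiled like any other method, C2's native allocations included. A minimal sketch of the trigger:

```java
import java.lang.reflect.Method;

public final class ReflectionJitTrigger {
    public static void main(String[] args) throws Exception {
        Method length = String.class.getMethod("length");
        long sum = 0;
        // The first ~15 calls go through a native accessor; past the inflation
        // threshold, a GeneratedMethodAccessor class is spun and, once hot,
        // compiled by C1/C2. Every new hot accessor means more JIT work.
        for (int i = 0; i < 1_000_000; i++) {
            sum += (Integer) length.invoke("hello");
        }
        System.out.println(sum);
    }
}
```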

Memory panorama report

Findings

The root cause of the OOM events is native memory allocated by the C2 compiler during JIT compilation, which, combined with glibc's arena caching and THP inflation, creates a significant memory “black hole” that the JVM does not account for.

Mitigation Recommendations

Adjust C2 compiler flags to adopt a more conservative compilation strategy (for example, bounding compiler threads with -XX:CICompilerCount or capping -XX:ReservedCodeCacheSize) and monitor the impact on memory consumption.

Tune glibc environment variables such as MALLOC_TRIM_THRESHOLD_ and MALLOC_ARENA_MAX to encourage timely release of cached memory back to the OS, as in the launcher sketch below.
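Both knobs must be in place before the JVM starts: compiler flags on the java command line, glibc variables in the process environment (in Kubernetes, the container's env: section, optionally with JAVA_TOOL_OPTIONS carrying the flags). A hedged launcher sketch; the values are illustrative starting points rather than tuned recommendations, and app.jar is a stand-in for the real workload:

```java
import java.util.Map;

public final class TunedLauncher {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "java",
                "-XX:CICompilerCount=2",             // bound JIT compiler threads
                "-XX:ReservedCodeCacheSize=128m",    // cap code-cache growth
                // "-XX:TieredStopAtLevel=1",        // drastic option: C1 only, no C2
                "-jar", "app.jar");                  // hypothetical application jar
        Map<String, String> env = pb.environment(); // must be set before JVM start
        env.put("MALLOC_ARENA_MAX", "2");            // limit glibc arena count
        env.put("MALLOC_TRIM_THRESHOLD_", "131072"); // return freed memory sooner
        pb.inheritIO().start().waitFor();
    }
}
```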

These steps reduce the hidden native memory footprint and align the container’s RSS with the JVM’s expectations, preventing unexpected OOMKilled incidents.
