Why Netty’s Direct Memory Stalls on JDK 17: A Deep Dive into Low‑Latency Bottlenecks
An in‑depth analysis of the Tianwang risk‑control Lingji system reveals how JDK 17’s ZGC, Netty’s direct‑memory allocation, and cross‑data‑center channel limits combined to cause severe latency spikes, runaway memory growth, and sustained CPU usage, and outlines the debugging steps and configuration changes that finally resolved the issue.
Background
The Tianwang risk‑control Lingji system is an online computation service built on in‑memory computing, offering high‑throughput, low‑latency statistics (count, distinctCount, max, min, avg, sum, std, and range distribution) within sliding or tumbling windows. Client and server communicate via Netty over TCP, and the server replicates data to slave clusters.
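The source does not show Lingji’s window implementation; purely as an illustration of the kind of aggregation described above, a minimal tumbling‑window counter (hypothetical class and method names, not Lingji code) might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a tumbling-window counter keyed by window start time.
// Lingji's real sliding/tumbling window implementation is not shown in the source.
public class TumblingWindowCounter {
    private final long windowMillis;
    private final Map<Long, LongAdder> buckets = new ConcurrentHashMap<>();

    public TumblingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    public void record(long eventTimeMillis) {
        long windowStart = eventTimeMillis - (eventTimeMillis % windowMillis);
        buckets.computeIfAbsent(windowStart, k -> new LongAdder()).increment();
    }

    public long count(long windowStart) {
        LongAdder adder = buckets.get(windowStart);
        return adder == null ? 0 : adder.sum();
    }
}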
Low‑Latency Bottleneck
After extensive optimization, version 1 of Lingji achieved high throughput, but with a 10 ms client timeout and 10 k qps per core, availability dropped to ~98.9 % due to GC pauses. Using a CMS collector on an 8‑core, 16 GB machine, version 2 reached >200 k qps, yet a GC occurring roughly every 4 seconds with ~30 ms pauses kept minute‑level tail latency at around 30 ms, still failing the business latency requirement.
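For context, the collectors being compared are selected with standard JVM flags; the exact heap and GC tuning used by Lingji is not given in the source, so the following lines are illustrative only:

java -XX:+UseConcMarkSweepGC ... -jar app.jar    (JDK 8, CMS as used by version 2)
java -XX:+UseZGC ... -jar app.jar                (JDK 17, ZGC as adopted below)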
Problem
Switching to JDK 17+ZGC reduced GC pauses to microseconds, but a special cross‑data‑center test (Beijing ↔ Suqian) exposed odd behavior:
Server container memory surged and decreased very slowly after the test stopped.
CPU remained around 20 % despite no incoming traffic.
GC occurred roughly every 10 seconds.
Memory Leak Investigation
Initial suspicion of a memory leak led to heap dumps showing Netty‑related objects. Enabling Netty’s strict leak detection (-Dio.netty.leakDetection.level=PARANOID) produced no leak logs, suggesting the issue was not a classic Netty memory leak.
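For reference, paranoid leak detection can also be enabled programmatically instead of via the system property; a one‑line sketch using the standard Netty 4.x API:

import io.netty.util.ResourceLeakDetector;

// Equivalent to -Dio.netty.leakDetection.level=PARANOID: sample every buffer
// and report where leaked buffers were last accessed, at significant runtime cost.
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);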
JDK and Netty Version Bug Check
Testing with JDK 8 eliminated the problem, indicating a compatibility issue with JDK 17. Upgrading to JDK 17.0.8 (which contains several bug fixes) and trying newer Netty versions still did not resolve the issue, and a related Netty GitHub issue suggested a possible fix in later releases.
Root Cause Identification and Solution
Further investigation revealed:
Rollback to JDK 8 reduced the backup data volume received by the Suqian cluster.
High CPU was caused by frequent GC.
Netty’s MpscUnboundedArrayQueue held many WriteTask objects, inflating memory usage.
The issue only appeared when syncing across data centers.
Analysis indicated that larger inter‑data‑center latency exceeded the capacity of a single Netty channel, causing the event‑loop to fall behind and backlog write tasks.
Solution: increase the number of channel connections to backup nodes, use a connection pool, and randomly select a live channel for each batch sync. After this change the problem disappeared.
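The pooling code itself is not shown in the source; a minimal sketch of the idea, with hypothetical class and method names, might look like this:

import io.netty.channel.Channel;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

// Hypothetical sketch: keep several pre-connected channels per backup node and
// pick a random live one for each batch sync, instead of pushing everything
// through a single channel.
public class BackupChannelPool {
    private final List<Channel> channels;   // pre-connected channels to one backup node

    public BackupChannelPool(List<Channel> channels) {
        this.channels = channels;
    }

    public Channel pickLiveChannel() {
        List<Channel> live = channels.stream()
                .filter(Channel::isActive)
                .collect(Collectors.toList());
        if (live.isEmpty()) {
            throw new IllegalStateException("no live backup channel available");
        }
        return live.get(ThreadLocalRandom.current().nextInt(live.size()));
    }
}

Each batch sync then calls pool.pickLiveChannel().writeAndFlush(batch), spreading the cross‑data‑center traffic across several TCP connections instead of one.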
Root Cause Analysis
Even after the above fix, the fundamental cause remained unclear, prompting three questions:
Why does memory decrease only slowly after the test stops if the event‑loop consumption is insufficient?
Why does CPU stay at ~23 % when the sync operation should consume only ~5 %?
Why does JDK 8 not exhibit the problem?
Investigation uncovered a blocked direct‑buffer constructor. The debug log showed:
[2023-08-23 11:16:16.163] DEBUG [] - io.netty.util.internal.PlatformDependent0 - direct buffer constructor: unavailable: Reflective setAccessible(true) disabled

This indicated that Netty could not use the unsafe direct‑buffer constructor, forcing it to fall back to ByteBuffer.allocateDirect, which triggers synchronous GC when direct memory is exhausted.
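One way to confirm which allocation path is in use is to log Netty’s platform probes at startup; a small sketch using io.netty.util.internal.PlatformDependent (an internal but long-stable utility in Netty 4.1):

import io.netty.util.internal.PlatformDependent;

// Logs whether Netty can use the no-cleaner (Unsafe-based) direct buffer path.
// If useDirectBufferNoCleaner() is false, pooled chunks fall back to
// ByteBuffer.allocateDirect and become subject to synchronous GC on exhaustion.
public class NettyAllocationProbe {
    public static void main(String[] args) {
        System.out.println("hasUnsafe                = " + PlatformDependent.hasUnsafe());
        System.out.println("directBufferPreferred    = " + PlatformDependent.directBufferPreferred());
        System.out.println("useDirectBufferNoCleaner = " + PlatformDependent.useDirectBufferNoCleaner());
        System.out.println("maxDirectMemory          = " + PlatformDependent.maxDirectMemory());
    }
}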
Source Code Analysis
Netty’s default PooledByteBufAllocator allocates direct memory via PoolArena.DirectArena#newChunk:
protected PoolChunk<ByteBuffer> newChunk() {
    ByteBuffer memory = allocateDirect(chunkSize);
    // ...
}

The allocateDirect method chooses between unsafe allocation and ByteBuffer.allocateDirect based on PlatformDependent.useDirectBufferNoCleaner():
PlatformDependent.useDirectBufferNoCleaner() ?
        PlatformDependent.allocateDirectNoCleaner(capacity) :
        ByteBuffer.allocateDirect(capacity);

Enabling useDirectBufferNoCleaner on JDK 9+ requires two things:
-Dio.netty.tryReflectionSetAccessible=true, so that Netty attempts reflective access at all.
Access to the private DirectByteBuffer(long, int) constructor, which needs --add-opens=java.base/java.nio=ALL-UNNAMED on modular JDKs.
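Putting the two together, a JDK 17 launch command that keeps the no‑cleaner allocation path available looks roughly like this (the application jar name is a placeholder):

java --add-opens=java.base/java.nio=ALL-UNNAMED \
     -Dio.netty.tryReflectionSetAccessible=true \
     -jar lingji-server.jar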
Without these flags, Netty falls back to ByteBuffer.allocateDirect, which, when direct memory is scarce, invokes System.gc() and waits synchronously, causing pauses that can exceed one second on JDK 17.
static void reserveMemory(long size, long cap) {
    // ... may call System.gc() and wait for reference processing ...
}

In JDK 8 the private constructor is available, so allocateDirectNoCleaner is used, avoiding the GC‑induced stalls.
Reflection on Slow Diagnosis
The sync process lacked low/high water‑mark checks (e.g., socketChannel.isWritable()) to detect back‑pressure early.
Write‑and‑flush calls did not attach failure listeners (for example, to surface an OutOfMemoryError), missing early failure signals; see the sketch after this list.
Non‑heap memory metrics displayed by monitoring tools did not match actual direct‑memory usage, obscuring the memory‑limit condition.
Other middleware that shades Netty (e.g., UMP, Titan) may encounter similar issues if JVM flags are not set.
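As noted in the list above, a back‑pressure‑aware write path would check writability before queuing and attach a failure listener; a minimal sketch (hypothetical helper, not the original sync code):

import io.netty.channel.Channel;
import io.netty.channel.ChannelFutureListener;

// Hypothetical sketch: respect the channel's write-buffer water marks and
// surface failures (including an OutOfMemoryError carried as the cause) early.
public final class GuardedSyncWriter {
    public static boolean trySync(Channel channel, Object batch) {
        if (!channel.isWritable()) {
            // Outbound buffer is above the high water mark: back off instead of
            // piling more WriteTasks onto the event loop.
            return false;
        }
        channel.writeAndFlush(batch).addListener((ChannelFutureListener) future -> {
            if (!future.isSuccess()) {
                future.cause().printStackTrace();
            }
        });
        return true;
    }
}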
For reference, the Channel#writeAndFlush overloads whose returned future (or supplied promise) can carry such listeners:

ChannelFuture writeAndFlush(Object msg)
ChannelFuture writeAndFlush(Object msg, ChannelPromise promise)

Summary Diagram
Direct Cause
Cross‑data‑center synchronization over a single channel cannot meet throughput, causing TCP back‑pressure. Netty’s event‑loop WriteTask backlog grows, and unflushed entries accumulate in ChannelOutboundBuffer#unflushedEntry, leading to memory explosion.
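The point at which a channel stops reporting itself writable is governed by its write‑buffer water marks, which can be tightened on the client bootstrap; values here are illustrative only:

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

// Illustrative: mark the channel unwritable once 512 KiB is queued in its
// ChannelOutboundBuffer, and writable again only below 256 KiB.
Bootstrap bootstrap = new Bootstrap();
bootstrap.option(ChannelOption.WRITE_BUFFER_WATER_MARK,
        new WriteBufferWaterMark(256 * 1024, 512 * 1024));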
Fundamental Root Cause
On newer JDKs Netty requires the JVM flags --add-opens=java.base/java.nio=ALL-UNNAMED and -Dio.netty.tryReflectionSetAccessible=true to enable unsafe direct‑memory allocation. Without them, allocation falls back to ByteBuffer.allocateDirect, which blocks on GC when direct memory limits are reached, causing massive WriteTask stalls and memory growth.
Reflection on Slow Diagnosis
Missing water‑mark checks prevented early detection of channel writability limits.
Lack of error listeners on writeAndFlush delayed OutOfMemoryError visibility.
Inconsistent monitoring of non‑heap vs. actual direct memory masked the memory‑limit condition.
Other Netty‑based components (e.g., UMP, Titan) may face similar issues if JVM flags are omitted.
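To close the monitoring gap noted above, Netty’s pooled allocator exposes its own direct‑memory accounting, which can be compared against container and non‑heap metrics; a brief sketch using the standard Netty 4.1 metric API:

import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;
import io.netty.util.internal.PlatformDependent;

// Reports how much direct memory Netty itself believes it is using, which can
// be compared with the non-heap numbers shown by external monitoring.
public class DirectMemoryReport {
    public static void main(String[] args) {
        PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
        System.out.println("pooled direct memory used = " + metric.usedDirectMemory());
        System.out.println("pooled heap memory used   = " + metric.usedHeapMemory());
        System.out.println("netty max direct memory   = " + PlatformDependent.maxDirectMemory());
    }
}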