How to Diagnose and Fix Netty Direct Memory Leaks in an IoT MQTT Broker

This article walks through the root cause analysis of an OutOfDirectMemoryError in a Netty‑based MQTT IoT broker, detailing how to detect off‑heap memory leaks, monitor them with BTrace and custom metrics, and apply proper ByteBuf release and configuration fixes.

Seewo Tech Circle

Background

IoT (Internet of Things) is a key component of modern information technology, requiring massive device connections and bidirectional messaging. Implementing an IoT access platform with Netty + MQTT provides a high‑performance, reliable solution.

Netty is an asynchronous, event‑driven network framework built on JDK NIO, enabling fast development of high‑throughput servers and clients.

MQTT (Message Queuing Telemetry Transport) is a lightweight publish/subscribe protocol that runs over TCP/IP, offering real‑time, low‑bandwidth messaging for remote devices.

Our company adopted Netty + MQTT for the device access platform. It performed well initially, but as the device count grew, the platform began running out of off‑heap (direct) memory, which sits outside the garbage‑collected heap and so was not reclaimed by ordinary JVM garbage collection.

Problem Identification

An APM alert reported sustained high request latency together with repeated OutOfDirectMemoryError exceptions. The logs showed the same error over and over, while CPU and JVM heap usage looked normal.

We took a heap dump with jmap and examined it in MAT. The heap itself showed no anomalies, which suggested the problem lay in off‑heap memory. Netty keeps an internal counter field

io.netty.util.internal.PlatformDependent.DIRECT_MEMORY_COUNTER

to track its direct memory usage.

MAT revealed the counter value: DIRECT_MEMORY_COUNTER = 1996488991 B ≈ 1904 MB. Another metric confirmed the same usage. Netty allocates off‑heap memory in 16 MB chunks; with 119 PoolChunks the total reached 119 × 16 MB = 1904 MB, indicating a leak.
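The arithmetic above can be checked with a few lines of Java. This is only a sanity check of the figures quoted from the dump; the class and method names are made up for illustration, and the 16 MB chunk size is taken from the text rather than read from Netty.

```java
/** Sanity check of the figures read from the heap dump (illustrative names). */
final class DirectMemoryMath {
    static final long MIB = 1024L * 1024L;
    // Chunk size as stated in the article, not queried from Netty.
    static final long CHUNK_BYTES = 16 * MIB;

    /** Whole mebibytes contained in the given byte count (rounds down). */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    /** Bytes reserved by the given number of pool chunks. */
    static long chunkBytes(int chunks) {
        return chunks * CHUNK_BYTES;
    }

    public static void main(String[] args) {
        long counter = 1_996_488_991L;   // DIRECT_MEMORY_COUNTER from the dump
        long pooled = chunkBytes(119);   // 119 PoolChunks
        System.out.println("counter in MiB: " + toMiB(counter));
        System.out.println("119 chunks in MiB: " + toMiB(pooled));
        System.out.println("bytes not explained by chunks: " + (counter - pooled));
    }
}
```

The counter exceeds 119 full chunks by only 287 bytes, so the two readings agree to within rounding.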

Solution

To monitor the direct memory counter, we employed BTrace, a low‑overhead Java tracing tool, and wrote the following script:

import com.sun.btrace.BTraceUtils;
import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.Kind;
import com.sun.btrace.annotations.Location;
import com.sun.btrace.annotations.OnMethod;
import com.sun.btrace.annotations.OnTimer;
import com.sun.btrace.annotations.Self;
import java.util.concurrent.atomic.AtomicLong;
import static com.sun.btrace.BTraceUtils.Reflective.field;
import static com.sun.btrace.BTraceUtils.Reflective.get;
import static com.sun.btrace.BTraceUtils.Strings.str;
import static com.sun.btrace.BTraceUtils.Strings.strcat;

// Unsafe mode relaxes BTrace's default restrictions on what a script may do.
@BTrace(unsafe = true)
public class NettyDirectMemoryOOMScript {
    // Most recent reference to Netty's direct memory counter.
    private static AtomicLong totalCount = new AtomicLong(0);

    // Fires each time PlatformDependent.incrementMemoryCounter returns.
    @OnMethod(
        clazz = "io.netty.util.internal.PlatformDependent",
        method = "incrementMemoryCounter",
        location = @Location(Kind.RETURN)
    )
    public static void run(@Self Object self) {
        totalCount = (AtomicLong) get(
            field("io.netty.util.internal.PlatformDependent", "DIRECT_MEMORY_COUNTER"), self);
    }

    // Prints the counter value (in bytes) once per second.
    @OnTimer(1000)
    public static void print() {
        BTraceUtils.println(strcat("netty direct memory: ", str(totalCount.get())));
    }
}

Running the script while simulating device connections showed direct memory growing to 128 MB and then plateauing, because Netty's pooled allocator (PooledDirectByteBuf) reserves memory in chunks and reuses it. Setting -XX:MaxDirectMemorySize=1000K, a cap smaller than a single 16 MB chunk, effectively disables pooling, but the counter then stayed at zero, indicating that our simulation did not exercise the leaking operation.

Further code inspection revealed that the handler convertToRpcCommandResponse received a decoded MqttPublishMessage whose associated ByteBuf was never released, causing the leak. Adding byteBuf.release() after processing resolved the issue.
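The effect of the missing release can be illustrated without Netty on the classpath. The sketch below uses a hypothetical CountedBuffer class as a stand-in for a reference-counted ByteBuf, with a static counter playing the role of DIRECT_MEMORY_COUNTER; the handler names are also invented. In real Netty code the release call would be byteBuf.release() or ReferenceCountUtil.release(msg).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a reference-counted buffer; this is not Netty's ByteBuf. */
final class CountedBuffer {
    /** Bytes allocated and not yet freed, similar in spirit to DIRECT_MEMORY_COUNTER. */
    static final AtomicLong OUTSTANDING = new AtomicLong();

    private final int size;
    private int refCnt = 1;

    CountedBuffer(int size) {
        this.size = size;
        OUTSTANDING.addAndGet(size);
    }

    /** Drops one reference and frees the memory when the count reaches zero. */
    boolean release() {
        if (refCnt <= 0) {
            throw new IllegalStateException("already released");
        }
        if (--refCnt == 0) {
            OUTSTANDING.addAndGet(-size);
            return true;
        }
        return false;
    }
}

final class PublishHandlerSketch {
    /** Leaky variant: uses the payload and never releases it. */
    static void handleLeaky(CountedBuffer payload) {
        process(payload);
    }

    /** Fixed variant: releases the payload even if processing throws. */
    static void handleFixed(CountedBuffer payload) {
        try {
            process(payload);
        } finally {
            payload.release();
        }
    }

    private static void process(CountedBuffer payload) {
        // Placeholder for converting the message into an RPC response.
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleFixed(new CountedBuffer(1024));
        }
        System.out.println("outstanding after fixed handler: " + CountedBuffer.OUTSTANDING.get());
        for (int i = 0; i < 1000; i++) {
            handleLeaky(new CountedBuffer(1024));
        }
        System.out.println("outstanding after leaky handler: " + CountedBuffer.OUTSTANDING.get());
    }
}
```

Putting the release in a finally block is a design choice worth copying into the real handler: an exception during conversion would otherwise skip the release and reintroduce the leak on the error path.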

Monitoring Method

We also created a Spring component to periodically log the direct memory metrics via reflection:

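import io.netty.util.internal.PlatformDependent;
import java.lang.reflect.Field;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring Boot 3+
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import org.springframework.util.ReflectionUtils;

@Component
@Slf4j
public class DirectMemoryMonitor {
    private static final long BYTES_PER_KB = 1024;
    // Netty's internal byte counter; may be null if Netty is not tracking direct memory itself.
    private AtomicLong directMemory;

    @PostConstruct
    public void init() throws IllegalAccessException {
        // Both fields are private statics, so they are read reflectively.
        Field counterField = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_COUNTER");
        Field limitField = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_LIMIT");
        counterField.setAccessible(true);
        limitField.setAccessible(true);
        directMemory = (AtomicLong) counterField.get(null);
        long limitBytes = (Long) limitField.get(null);
        log.info("netty direct memory limit:{}kb", limitBytes / BYTES_PER_KB);
        Executors.newSingleThreadScheduledExecutor()
            .scheduleAtFixedRate(this::report, 0, 1000, TimeUnit.MILLISECONDS);
    }

    private void report() {
        if (directMemory == null) {
            return;
        }
        // The counter holds bytes; convert before logging in KB.
        log.info("netty direct memory:{}kb", directMemory.get() / BYTES_PER_KB);
    }
}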

These metrics can be pushed to Elasticsearch and visualized in Grafana.
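For comparison, the JDK exposes its own view of direct buffers through the standard BufferPoolMXBean. The sketch below reads the "direct" pool; the class name is invented for illustration. One caveat, stated as an assumption rather than a guarantee: when Netty allocates direct memory without a Cleaner, those allocations may not show up in this pool, which is why the Netty counter above is still needed. Treat this as a complement, not a replacement.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Reads the JDK's own accounting of direct buffers (illustrative class name). */
final class JdkDirectPoolProbe {
    /** Returns bytes used by the JDK "direct" buffer pool, or -1 if the pool is not exposed. */
    static long directPoolUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directPoolUsedBytes();
        // Allocate 1 MiB through the JDK allocator so the pool reading changes.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directPoolUsedBytes();
        System.out.println("direct pool bytes before: " + before);
        System.out.println("direct pool bytes after:  " + after);
        // Referencing the buffer here keeps it reachable until both readings are taken.
        System.out.println("buffer capacity: " + buffer.capacity());
    }
}
```

If both this value and the Netty counter are exported, a gap between them gives a rough idea of how much direct memory is held by Netty versus other users of ByteBuffer.allocateDirect.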

Summary

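In Netty 4, reference‑counted off‑heap buffers are released automatically in four scenarios:

Calling writeAndFlush, which releases the message once it has been written out;

Using a handler that extends SimpleChannelInboundHandler, which by default releases the message after channelRead0 returns;

Letting the message reach the tail of the pipeline (TailContext), which releases anything left unhandled;

Extending ByteToMessageDecoder, which releases the input buffer it accumulates. The messages it decodes still have to be released by a downstream handler.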

When using ByteBuf directly, developers must manually invoke release(). Additionally, JVM heap settings should reserve space for off‑heap memory; for a 2 GB container, allocate no more than 1–1.2 GB to the heap to leave room for thread stacks and direct memory.
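The sizing advice can be turned into a rough budget check. The sketch below is only an illustration: the class name, the 512 MB direct memory figure, the thread count, and the 1 MB stack size are all assumptions, and it ignores metaspace, the code cache, and other native allocations. The three inputs correspond to the standard JVM flags -Xmx, -XX:MaxDirectMemorySize, and -Xss.

```java
/** Rough container memory budget (illustrative names and numbers). */
final class ContainerMemoryBudget {
    static final long MIB = 1024L * 1024L;

    /**
     * Bytes left in the container after heap, direct memory and thread stacks.
     * A negative result means the settings exceed the container limit.
     */
    static long headroom(long containerBytes, long heapBytes, long directBytes,
                         int threads, long stackBytes) {
        return containerBytes - heapBytes - directBytes - (long) threads * stackBytes;
    }

    public static void main(String[] args) {
        long container = 2048 * MIB; // the 2 GB container from the text
        // Assumed values: 1 GB heap, 512 MB direct memory, 200 threads with 1 MB stacks.
        long conservative = headroom(container, 1024 * MIB, 512 * MIB, 200, MIB);
        // Same assumptions, but with a 1800 MB heap.
        long oversized = headroom(container, 1800 * MIB, 512 * MIB, 200, MIB);
        System.out.println("headroom with 1 GB heap (MiB): " + conservative / MIB);
        System.out.println("headroom with 1800 MB heap (MiB): " + oversized / MIB);
    }
}
```

Under these assumed numbers, a 1 GB heap leaves 312 MiB of slack, while a 1800 MB heap overshoots the container by 464 MiB before any direct memory leak is even involved.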
