How to Diagnose and Fix Netty Direct Memory Leaks in an IoT MQTT Broker
This article walks through the root cause analysis of an OutOfDirectMemoryError in a Netty‑based MQTT IoT broker, detailing how to detect off‑heap memory leaks, monitor them with BTrace and custom metrics, and apply proper ByteBuf release and configuration fixes.
Background
IoT (Internet of Things) is a key component of modern information technology, requiring massive device connections and bidirectional messaging. Implementing an IoT access platform with Netty + MQTT provides a high‑performance, reliable solution.
Netty is an asynchronous, event‑driven network framework built on JDK NIO, enabling fast development of high‑throughput servers and clients.
MQTT (Message Queuing Telemetry Transport) is a lightweight publish/subscribe protocol that runs over TCP/IP, offering real‑time, low‑bandwidth messaging for remote devices.
Our company adopted Netty + MQTT for its device access platform. It performed well initially, but as the device count grew, the platform hit an off‑heap out‑of‑memory (OOM) condition, which JVM garbage collection cannot reclaim.
Problem Identification
An APM alert reported sustained request latency and repeated OutOfDirectMemoryError exceptions. The logs showed the error recurring, while CPU and JVM heap usage appeared normal.
Using jmap, we captured a memory dump and examined it with MAT. Heap analysis showed no anomalies, suggesting the issue lay in off‑heap memory. Netty tracks direct memory usage in an internal counter, io.netty.util.internal.PlatformDependent.DIRECT_MEMORY_COUNTER.
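For reference, the dump examined above can be captured with the standard jmap command; the pid placeholder must be replaced with the broker's process id, and the file name here is illustrative:

```shell
# Dump the broker's heap in binary format for offline MAT analysis
jmap -dump:format=b,file=broker-heap.hprof <pid>
```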
MAT revealed the counter value: DIRECT_MEMORY_COUNTER = 1996488991 B ≈ 1904 MB. A second data point confirmed this: Netty allocates off‑heap memory in 16 MB chunks (PoolChunk), and the dump contained 119 PoolChunk instances, i.e. 119 × 16 MB = 1904 MB. Direct memory was exhausted, indicating a leak.
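As a sanity check on the arithmetic above, here is a throwaway sketch. The 16 MB chunk comes from Netty's default page size of 8192 bytes shifted by maxOrder 11 in the Netty releases discussed here (later 4.1 versions lowered the default maxOrder, shrinking the chunk):

```java
public class ChunkMath {
    public static void main(String[] args) {
        int pageSize = 8192;                   // io.netty.allocator.pageSize default
        int maxOrder = 11;                     // default maxOrder in older Netty 4.1 releases
        int chunkSize = pageSize << maxOrder;  // 8192 * 2^11 = 16 MiB per PoolChunk
        System.out.println(chunkSize / (1024 * 1024) + " MB per chunk");
        System.out.println(119L * chunkSize / (1024 * 1024) + " MB total for 119 chunks");
    }
}
```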
Solution
To monitor the direct memory counter, we employed BTrace, a low‑overhead Java tracing tool, and wrote the following script:
import com.sun.btrace.BTraceUtils;
import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.Kind;
import com.sun.btrace.annotations.Location;
import com.sun.btrace.annotations.OnMethod;
import com.sun.btrace.annotations.OnTimer;
import com.sun.btrace.annotations.Self;

import java.util.concurrent.atomic.AtomicLong;

import static com.sun.btrace.BTraceUtils.Reflective.field;
import static com.sun.btrace.BTraceUtils.Reflective.get;
import static com.sun.btrace.BTraceUtils.Strings.str;
import static com.sun.btrace.BTraceUtils.Strings.strcat;

@BTrace(unsafe = true)
public class NettyDirectMemoryOOMScript {

    private static AtomicLong totalCount = new AtomicLong(0);

    // Capture Netty's internal counter each time direct memory is incremented.
    @OnMethod(
            clazz = "io.netty.util.internal.PlatformDependent",
            method = "incrementMemoryCounter",
            location = @Location(Kind.RETURN)
    )
    public static void run(@Self Object self) {
        totalCount = (AtomicLong) get(
                field("io.netty.util.internal.PlatformDependent", "DIRECT_MEMORY_COUNTER"), self);
    }

    // Print the current direct memory usage once per second.
    @OnTimer(1000)
    public static void print() {
        BTraceUtils.println(strcat("netty direct memory: ", str(totalCount.get())));
    }
}

Running the script while simulating device connections showed direct memory growing to 128 MB and then plateauing, because the default pooled allocator (PooledByteBufAllocator, handing out PooledDirectByteBuf instances) reuses its chunks. We then set -XX:MaxDirectMemorySize=1000K, which is below the 16 MB chunk size and therefore effectively rules out pooled allocation, but the counter stayed at zero, indicating that our simulation never exercised the offending operation.
Further code inspection revealed that the convertToRpcCommandResponse handler method received a decoded MqttPublishMessage whose payload ByteBuf was never released, causing the leak. Releasing the buffer after processing (byteBuf.release()) resolved the issue.
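A minimal sketch of the fix pattern, using try/finally so the payload is released even if processing throws. Here handlePublishPayload is an illustrative stand-in for the broker's convertToRpcCommandResponse, not the actual code:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.ReferenceCountUtil;

public class ReleaseSketch {

    // Illustrative stand-in for the broker's handler logic.
    static void handlePublishPayload(ByteBuf payload) {
        try {
            // ... convert the payload into an RPC command response ...
        } finally {
            // Decrement the reference count; the direct memory is freed when it reaches 0.
            ReferenceCountUtil.release(payload);
        }
    }

    public static void main(String[] args) {
        ByteBuf payload = Unpooled.directBuffer(16);
        System.out.println("refCnt before: " + payload.refCnt());
        handlePublishPayload(payload);
        System.out.println("refCnt after: " + payload.refCnt());
    }
}
```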
Monitoring Method
We also created a Spring component to periodically log the direct memory metrics via reflection:
import java.lang.reflect.Field;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.annotation.PostConstruct;

import io.netty.util.internal.PlatformDependent;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import org.springframework.util.ReflectionUtils;

@Component
@Slf4j
public class DirectMemoryMonitor {

    private AtomicLong directMemory;

    @PostConstruct
    public void init() throws IllegalAccessException {
        // Both fields inside Netty's PlatformDependent hold byte values.
        Field counterField = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_COUNTER");
        Field limitField = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_LIMIT");
        counterField.setAccessible(true);
        limitField.setAccessible(true);
        directMemory = (AtomicLong) counterField.get(PlatformDependent.class);
        log.info("netty direct memory limit: {} KB", (long) limitField.get(PlatformDependent.class) / 1024);
        Executors.newSingleThreadScheduledExecutor()
                .scheduleAtFixedRate(this::report, 0, 1000, TimeUnit.MILLISECONDS);
    }

    private void report() {
        log.info("netty direct memory: {} KB", directMemory.get() / 1024);
    }
}

These metrics can be pushed to Elasticsearch and visualized in Grafana.
Summary
In Netty 4, off‑heap memory is automatically released in four scenarios:
Calling writeAndFlush (the outbound ByteBuf is released after it is written to the socket);
Using a handler that extends SimpleChannelInboundHandler, which releases the message once channelRead0 returns;
Reaching the pipeline's TailContext, which releases any unhandled inbound message;
Extending ByteToMessageDecoder, which releases the cumulation buffer automatically.
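The SimpleChannelInboundHandler case can be demonstrated in isolation with Netty's EmbeddedChannel. This is a sketch with an illustrative handler name, not the broker's code:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.embedded.EmbeddedChannel;

public class AutoReleaseDemo {

    // SimpleChannelInboundHandler releases msg after channelRead0 returns,
    // so no manual release() call is needed here.
    static class LengthHandler extends SimpleChannelInboundHandler<ByteBuf> {
        @Override
        protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
            System.out.println("received " + msg.readableBytes() + " bytes");
        }
    }

    public static void main(String[] args) {
        ByteBuf buf = Unpooled.wrappedBuffer(new byte[]{1, 2, 3});
        EmbeddedChannel ch = new EmbeddedChannel(new LengthHandler());
        ch.writeInbound(buf);
        // The handler consumed and released the buffer inside the pipeline.
        System.out.println("refCnt after pipeline: " + buf.refCnt());
        ch.finish();
    }
}
```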
When using ByteBuf directly outside these paths, developers must call release() (or ReferenceCountUtil.release()) themselves. Additionally, JVM heap settings should reserve headroom for off‑heap memory: in a 2 GB container, allocate no more than 1 to 1.2 GB to the heap, leaving room for thread stacks and direct memory.
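As an illustrative starting point only (the exact values and jar name are assumptions, not a tuned recommendation), that sizing advice might translate into flags such as:

```shell
# Sketch: JVM sizing for a 2 GB container; adjust to the actual workload.
java -Xms1g -Xmx1g \
     -XX:MaxDirectMemorySize=512m \
     -jar mqtt-broker.jar
```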