Investigation and Resolution of Netty ChannelOutboundBuffer Memory Leak in a Mobile Push System
In a high‑throughput mobile push service built on Netty, an off‑heap memory leak was traced to unchecked ChannelOutboundBuffer growth on half‑closed connections that remained active but unwritable. The fix, which disables autoRead on unwritable channels, configures write‑buffer watermarks, and adds explicit isWritable() checks before writes, eliminated the crashes and stabilized the system.
Business background: Mobile applications rely heavily on push notifications for operational goals such as marketing campaigns and new‑feature alerts. The push system must deliver messages within seconds, sustain millions of pushes per second, and hold millions of long‑lived connections.
Problem background: In production, the long‑connection (Broker) nodes built on Netty crash roughly once a month, delaying message delivery. Each crash occurs after a period of stable operation and is not caught by the existing Netty memory‑leak monitoring parameters.
Netty overview: Netty is a high‑performance, asynchronous, event‑driven NIO framework widely used in the internet, big‑data, gaming, and communication domains. It underpins many open‑source projects, including HBase, Hadoop, and Dubbo.
Problem analysis:
4.1 Hypothesis: We initially suspected the sheer number of connections (≈390,000), but each channel occupies only about 1,456 bytes, which cannot account for the memory pressure.
4.2 GC log inspection: Full GC ran every five minutes before the crash, yet heap usage did not drop afterwards, pointing to an off‑heap (direct‑memory) leak.
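Off‑heap growth of this kind can also be watched from inside the JVM. Below is a minimal sketch (class name hypothetical) using the standard BufferPoolMXBean; note that it tracks NIO direct buffers, and Netty may allocate direct memory through its own allocator paths that do not appear in this pool, so treat it as one signal rather than a complete accounting:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryWatch {
    // Bytes currently used by NIO direct buffers, as reported by the JVM.
    static long directBytesUsed() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .filter(pool -> "direct".equals(pool.getName()))
                .mapToLong(BufferPoolMXBean::getMemoryUsed)
                .sum();
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Allocate 1 MiB off-heap to demonstrate that the counter moves.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
        buf.put(0, (byte) 1); // touch it so the reference stays live
        System.out.println("direct bytes grew: " + (directBytesUsed() - before >= 1024 * 1024));
    }
}
```

Sampling this value periodically (or exporting it as a metric) would have flagged the steady off‑heap growth that the heap‑focused GC logs could not explain.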
4.3 Heap analysis: A heap dump showed ChannelOutboundBuffer retaining nearly 5 GB. The root cause was an excessive number of entries queued in the buffer that were never flushed to the socket.
4.4 Why the data is not written: The code checks Channel.isActive() but never verifies writability. When a connection becomes half‑closed (the client disconnects abruptly), the channel remains active but stops being writable, so outbound data accumulates in the buffer.
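The same trap exists at the plain‑socket level: connection‑state getters reflect only local history, not the peer. A small stdlib demonstration (loopback setup and class name are illustrative; an orderly close is shown here, while a dropped network link can be worse, with writes buffering silently):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class HalfCloseDemo {
    // Returns {isConnected, isClosed, sawEof} observed after the peer closed.
    static boolean[] probe() throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            server.accept().close();   // peer goes away immediately
            Thread.sleep(100);         // let the FIN arrive
            boolean eof = client.getInputStream().read() == -1;
            // Local state still looks healthy even though the peer is gone;
            // only the actual read revealed the close (EOF).
            return new boolean[] { client.isConnected(), client.isClosed(), eof };
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = probe();
        System.out.println("isConnected=" + r[0] + " isClosed=" + r[1] + " sawEof=" + r[2]);
    }
}
```

In Netty the analogous symptom is exactly what the investigation found: isActive() stays true while the channel is no longer writable.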
Reproduction steps:
1) Simulate a client cluster and establish long‑connection sessions, then block network traffic so that Channel.isActive() keeps returning true while data cannot actually be sent.
2) Reduce the available off‑heap memory and continuously send ~1 KB messages.
3) With a 128 MB direct‑memory limit, the issue appears after roughly 90,000 write attempts.
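As a back‑of‑envelope check on these numbers (the ~96‑byte per‑entry bookkeeping cost is an assumption for illustration, not a figure from the investigation): about 90,000 unflushed 1 KB writes pin roughly 96 MB, which, together with allocator overhead, plausibly exhausts a 128 MB direct‑memory cap.

```java
public class ReproMath {
    // Total bytes pinned by unflushed writes: payload plus per-entry overhead.
    static long pendingBytes(long writes, long payloadBytes, long entryOverhead) {
        return writes * (payloadBytes + entryOverhead);
    }

    public static void main(String[] args) {
        long pending = pendingBytes(90_000, 1024, 96); // 96-byte overhead is assumed
        System.out.println("pending ≈ " + (pending / (1024 * 1024))
                + " MB of a 128 MB direct-memory limit");
    }
}
```

The remaining headroom is easily consumed by the allocator's chunked, power‑of‑two sizing, which is why the crash point lands near 90,000 writes rather than exactly at the limit.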
Solution:
5.1 Enable autoRead control: turn off autoRead when the channel is not writable and turn it back on once the channel becomes writable again.
@Override
public void channelReadComplete(ChannelHandlerContext ctx) throws Exception {
    if (!ctx.channel().isWritable()) {
        Channel channel = ctx.channel();
        ChannelInfo channelInfo = ChannelManager.CHANNEL_CHANNELINFO.get(channel);
        String clientId = "";
        if (channelInfo != null) {
            clientId = channelInfo.getClientId();
        }
        LOGGER.info("channel is unwritable, turn off autoread, clientId:{}", clientId);
        channel.config().setAutoRead(false);
    }
}

When the channel becomes writable again:
@Override
public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
    Channel channel = ctx.channel();
    ChannelInfo channelInfo = ChannelManager.CHANNEL_CHANNELINFO.get(channel);
    String clientId = "";
    if (channelInfo != null) {
        clientId = channelInfo.getClientId();
    }
    if (channel.isWritable()) {
        LOGGER.info("channel is writable again, turn on autoread, clientId:{}", clientId);
        channel.config().setAutoRead(true);
    }
}

5.2 Configure high/low watermarks to bound the ChannelOutboundBuffer size.
serverBootstrap.option(ChannelOption.WRITE_BUFFER_WATER_MARK,
        new WriteBufferWaterMark(1024 * 1024, 8 * 1024 * 1024)); // low: 1 MB, high: 8 MB

5.3 Add an explicit writability check before sending messages.
private void writeBackMessage(ChannelHandlerContext ctx, MqttMessage message) {
    Channel channel = ctx.channel();
    // Check isWritable() in addition to isActive(), so writes stop once the
    // outbound buffer crosses the high watermark.
    if (channel.isActive() && channel.isWritable()) {
        ChannelFuture cf = channel.writeAndFlush(message);
        // writeAndFlush is asynchronous; use a listener so failures that
        // complete after this method returns are also caught.
        cf.addListener(future -> {
            if (future.cause() != null) {
                LOGGER.error("channelWrite error!", future.cause());
                ctx.close();
            }
        });
    }
}

5.4 Verification: After applying the above changes, the reproduction scenario ran past 270,000 write operations without errors.
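To make the watermark mechanics concrete, here is a stdlib‑only model of the pending‑bytes counter (class and field names are hypothetical; this is a sketch of the hysteresis, not Netty's actual implementation, which the source walkthrough in the next section covers):

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of ChannelOutboundBuffer's writability tracking:
// writability flips off above the high watermark and back on below the low one.
public class WatermarkModel {
    private final long low, high;
    private final AtomicLong pending = new AtomicLong();
    private volatile boolean writable = true;

    WatermarkModel(long low, long high) { this.low = low; this.high = high; }

    void increment(long size) {
        if (pending.addAndGet(size) > high) writable = false;
    }

    void decrement(long size) {
        if (pending.addAndGet(-size) < low) writable = true;
    }

    boolean isWritable() { return writable; }

    public static void main(String[] args) {
        // Same watermarks as in section 5.2: 1 MB low, 8 MB high.
        WatermarkModel buf = new WatermarkModel(1024 * 1024, 8 * 1024 * 1024);
        int queued = 0;
        while (buf.isWritable()) {
            buf.increment(1024);  // queue another 1 KB message
            queued++;
        }
        System.out.println("became unwritable after queueing " + queued + " KB");
        buf.decrement((long) queued * 1024);  // flush everything
        System.out.println("writable again: " + buf.isWritable());
    }
}
```

The gap between the two watermarks is deliberate: writability does not flap on and off with every write, it only recovers once the backlog has drained below the low mark.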
Netty source explanation:
When the pending bytes in the ChannelOutboundBuffer exceed the high watermark, isWritable() returns false, the channel is marked unwritable, and fireChannelWritabilityChanged() is triggered. The relevant code:
private void incrementPendingOutboundBytes(long size, boolean invokeLater) {
    if (size == 0) {
        return;
    }
    long newWriteBufferSize = TOTAL_PENDING_SIZE_UPDATER.addAndGet(this, size);
    if (newWriteBufferSize > channel.config().getWriteBufferHighWaterMark()) {
        setUnwritable(invokeLater);
    }
}
private void setUnwritable(boolean invokeLater) {
    for (;;) {
        final int oldValue = unwritable;
        final int newValue = oldValue | 1;
        if (UNWRITABLE_UPDATER.compareAndSet(this, oldValue, newValue)) {
            if (oldValue == 0 && newValue != 0) {
                fireChannelWritabilityChanged(invokeLater);
            }
            break;
        }
    }
}

Conversely, when the pending bytes drop below the low watermark, isWritable() returns true again and the channel becomes writable:
private void decrementPendingOutboundBytes(long size, boolean invokeLater, boolean notifyWritability) {
    if (size == 0) {
        return;
    }
    long newWriteBufferSize = TOTAL_PENDING_SIZE_UPDATER.addAndGet(this, -size);
    if (notifyWritability && newWriteBufferSize < channel.config().getWriteBufferLowWaterMark()) {
        setWritable(invokeLater);
    }
}

private void setWritable(boolean invokeLater) {
    for (;;) {
        final int oldValue = unwritable;
        final int newValue = oldValue & ~1;
        if (UNWRITABLE_UPDATER.compareAndSet(this, oldValue, newValue)) {
            if (oldValue != 0 && newValue == 0) {
                fireChannelWritabilityChanged(invokeLater);
            }
            break;
        }
    }
}

Conclusion: By controlling autoRead, setting appropriate watermarks, and checking isWritable() before every write, the system prevents unbounded growth of the ChannelOutboundBuffer. Since deployment, the push service has run for six months without a recurrence of the memory‑leak crash.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.