Design and Optimization of High‑Concurrency Push Services Using Netty
This article analyzes common questions about Netty‑based push services, presents a real‑world IoT case study, and provides detailed design guidelines—including kernel tuning, connection handling, heartbeat configuration, buffer management, memory pooling, logging pitfalls, TCP and JVM tuning—to help engineers build stable, scalable, and high‑performance million‑connection push systems.
1. Introduction
Many readers have asked about using Netty for push services, the maximum number of clients a Netty server can support, and various technical issues encountered during development. This article summarizes those questions and offers design recommendations to avoid common pitfalls.
1.2 Push Services
In the mobile Internet era, push services are essential for app engagement and retention. They also power notifications for IoT devices, which will soon involve massive numbers of long‑lived connections.
1.3 Characteristics of Mobile Push Services
Unstable wireless networks (e.g., subway signal loss) cause frequent disconnections.
Massive client count with long‑lived connections leads to high resource consumption on both client and server.
On Android, each app tends to maintain its own long‑lived connection, multiplying traffic and power usage on the device.
Message loss, duplication, delay, and expiration are common.
Spam messages and lack of unified governance.
Solutions such as JD Cloud's push service use a single‑connection model and AlarmManager‑based heartbeats to save bandwidth and power.
2. Real‑World IoT Case Study
2.1 Problem Description
An MQTT middleware for a smart‑home platform kept 100,000 users online with 20,000 concurrent message requests. After running for a while, memory leaks were observed, suspected to be caused by Netty.
Server: 16 GB RAM, 8‑core CPU.
Netty boss thread pool size = 1, worker pool size = 6 (later increased to 11).
Netty version 4.0.8.Final.
2.2 Diagnosis
Heap dump revealed a 9076 % increase in ScheduledFutureTask instances (≈1.1 M). The root cause was an IdleStateHandler with a 15‑minute idle timeout, creating a scheduled task per idle connection.
Each long‑lived connection generated a task that held references to business objects, preventing GC and causing apparent memory leaks. Reducing the idle timeout to 45 seconds allowed normal memory reclamation.
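The fix can be sketched as a channel initializer that installs IdleStateHandler with the shorter 45‑second all‑idle timeout from the case above (the class name MqttChannelInitializer is illustrative, not from the original system):

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.handler.timeout.IdleStateHandler;

public class MqttChannelInitializer extends ChannelInitializer<Channel> {
    @Override
    protected void initChannel(Channel ch) {
        // 45-second all-idle timeout instead of 15 minutes: each connection's
        // scheduled idle-check task completes and is collected quickly, so it
        // no longer pins business objects in the heap.
        ch.pipeline().addLast("idleStateHandler", new IdleStateHandler(0, 0, 45));
        // ... protocol decoders and business handlers follow
    }
}
```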
2.3 Summary
With only a modest number of long connections (say, a few hundred), the extra scheduled tasks are invisible; the problem surfaces only at the hundred‑thousand scale. Proper timeout settings and careful task handling are essential for scalable push services.
3. Netty Massive Push Service Design Points
3.1 Increase Maximum File Handles
Million‑scale connections require raising the Linux file‑descriptor limit (default 1024). Example:
[root@lilinfeng ~]# ulimit -a
... open files (-n) 1024 ...

Modify /etc/security/limits.conf:

* soft nofile 1000000
* hard nofile 1000000

Be aware that extremely high handle counts can degrade performance; clustering may be necessary.
3.2 Beware of CLOSE_WAIT
Mobile networks cause frequent client reconnects, leading to many sockets stuck in CLOSE_WAIT if the server does not close them promptly. Accumulated CLOSE_WAIT sockets consume file handles and memory, eventually causing “Too many open files”.
Typical causes:
Bug in Netty or business code that fails to close the socket after receiving FIN.
Blocked I/O threads delaying socket closure.
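A defensive sketch (the handler name is illustrative): ensure the channel is always closed when an exception reaches the tail of the pipeline, so a socket left half‑closed after the peer's FIN does not linger in CLOSE_WAIT:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class CloseOnErrorHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Close instead of swallowing the exception: an unclosed socket after
        // the peer's FIN stays in CLOSE_WAIT and leaks a file handle.
        ctx.close();
    }
}
```

Placing such a handler last in the pipeline gives every connection a guaranteed cleanup path even when business handlers misbehave.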
3.3 Reasonable Heartbeat Interval
Heartbeats must balance network stability and signaling load. For 2.5 G networks, a 180‑second interval is often suitable; WeChat uses 300 seconds.
Implementation example:
public void initChannel(Channel channel) {
    // all-idle timeout of 180 seconds fires an IdleStateEvent
    channel.pipeline().addLast("idleStateHandler", new IdleStateHandler(0, 0, 180));
    channel.pipeline().addLast("myHandler", new MyHandler());
}

public class MyHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            // handle heartbeat: send a ping or close the idle connection
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}

3.4 Buffer Size Configuration
ByteBuffer has a fixed capacity, which can waste memory when handling many connections. Netty’s ByteBuf supports dynamic resizing. Two allocators are available:
FixedRecvByteBufAllocator : always allocates receive buffers of the same configured capacity, regardless of actual traffic.
AdaptiveRecvByteBufAllocator : starts with a small buffer and grows or shrinks it based on the sizes of previous reads.
Example of setting the adaptive allocator:
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.option(ChannelOption.RCVBUF_ALLOCATOR, AdaptiveRecvByteBufAllocator.DEFAULT);

3.5 Memory Pool
Using a pooled allocator (e.g., PooledByteBufAllocator ) reduces GC pressure by reusing buffers; Netty's pool is modeled on jemalloc's arena‑based design.
Enable it:
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);

Remember to release buffers with ReferenceCountUtil.release(msg) to avoid leaks.
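A common pattern (a sketch; the handler name is illustrative) is to release the message in a finally block once the business logic is done with it, so pooled buffers return to the pool even when processing throws:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

public class PushMessageHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        try {
            // decode and process the inbound ByteBuf here
        } finally {
            // return the buffer to the pool; forgetting this leaks pooled memory
            ReferenceCountUtil.release(msg);
        }
    }
}
```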
3.6 Logging Pitfalls
Synchronous logging can block I/O or business threads: with Log4j, even the AsyncAppender blocks the calling thread once its in‑memory queue fills up, stalling the event loop and letting sockets accumulate. Asynchronous appenders help, but queue saturation still blocks, as the Log4j 1.x source shows:
synchronized (this.buffer) {
    while (true) {
        int previousSize = this.buffer.size();
        if (previousSize < this.bufferSize) {
            this.buffer.add(event);
            if (previousSize != 0) break;
            this.buffer.notifyAll();
            break;
        }
        boolean discard = true;
        if ((this.blocking) && (!Thread.interrupted()) && (Thread.currentThread() != this.dispatcher)) {
            try {
                this.buffer.wait(); // blocks the calling business/I-O thread
                discard = false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

3.7 TCP Parameter Tuning
Adjust SO_SNDBUF and SO_RCVBUF (commonly 32 KB) to match message size. Enable Receive Packet Steering (RPS) on Linux ≥ 2.6.35 to distribute soft interrupts across CPUs, improving throughput by >20 %.
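On the Netty side, the socket buffers can be set per accepted child channel; the sketch below uses the 32 KB rule of thumb above (class name and thread counts are illustrative):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class TcpTuning {
    public static ServerBootstrap configure() {
        ServerBootstrap b = new ServerBootstrap();
        b.group(new NioEventLoopGroup(1), new NioEventLoopGroup())
         .channel(NioServerSocketChannel.class)
         // size send/receive buffers to match the typical push-message size
         .childOption(ChannelOption.SO_SNDBUF, 32 * 1024)
         .childOption(ChannelOption.SO_RCVBUF, 32 * 1024)
         .childOption(ChannelOption.TCP_NODELAY, true);
        return b;
    }
}
```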
3.8 JVM Settings
Key JVM parameters:
-Xms / -Xmx sized according to the service's memory model (expected connection count × per‑connection footprint, plus message buffers).
GC tuning (young/old generation ratios, collector choice) to minimize Full GC frequency.
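As an illustrative (not prescriptive) starting point for a 16 GB machine like the one in the case study, assuming the CMS collector; the jar name and every value here are hypothetical and should be tuned against real GC logs:

```shell
# Illustrative flags only; validate against GC logs for the real workload
java -server -Xms6g -Xmx6g -Xmn2g \
     -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar push-server.jar
```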
Overall, careful kernel, Netty, and JVM tuning—combined with proper timeout, buffer, and logging strategies—allows a Netty server to reliably handle millions of concurrent push connections.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with Internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.