Design and Optimization of High‑Concurrency Push Services Using Netty
This article analyzes common questions about Netty‑based push services, presents a real‑world IoT case study, and provides detailed design guidelines—including kernel tuning, connection handling, heartbeat configuration, buffer management, memory pooling, logging pitfalls, TCP and JVM tuning—to help engineers build stable, scalable, and high‑performance million‑connection push systems.
1. Introduction
Many readers have asked about using Netty for push services, the maximum number of clients a Netty server can support, and various technical issues encountered during development. This article summarizes those questions and offers design recommendations to avoid common pitfalls.
1.2 Push Services
In the mobile Internet era, push services are essential for app engagement and retention. They also power notifications for IoT devices, which will soon involve massive numbers of long‑lived connections.
1.3 Characteristics of Mobile Push Services
Unstable wireless networks (e.g., subway signal loss) cause frequent disconnections.
Massive client count with long‑lived connections leads to high resource consumption on both client and server.
On Android, each app tends to maintain its own long‑lived connection, multiplying traffic and power usage on the device.
Message loss, duplication, delay, and expiration are common.
Spam messages and lack of unified governance.
Solutions such as JD Cloud's push service use a single‑connection model and AlarmManager‑based heartbeats to save bandwidth and power.
2. Real‑World IoT Case Study
2.1 Problem Description
An MQTT middleware for a smart‑home platform kept 100,000 users online with 20,000 concurrent message requests. After running for a while, memory leaks were observed, suspected to be caused by Netty.
Server: 16 GB RAM, 8‑core CPU.
Netty boss thread pool size = 1, worker pool size = 6 (later increased to 11).
Netty version 4.0.8.Final.
2.2 Diagnosis
Heap dump revealed a 9076 % increase in ScheduledFutureTask instances (≈1.1 M). The root cause was an IdleStateHandler with a 15‑minute idle timeout, creating a scheduled task per idle connection.
Each long‑lived connection generated a task that held references to business objects, preventing GC and causing apparent memory leaks. Reducing the idle timeout to 45 seconds allowed normal memory reclamation.
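The fix can be sketched as a channel initializer that installs IdleStateHandler with the shorter 45‑second all‑idle timeout from the case above (the class name MqttChannelInitializer is illustrative, not from the original system):

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.handler.timeout.IdleStateHandler;

public class MqttChannelInitializer extends ChannelInitializer<Channel> {
    @Override
    protected void initChannel(Channel ch) {
        // 45-second all-idle timeout instead of 15 minutes: each connection's
        // scheduled idle-check task completes and is collected quickly, so it
        // no longer pins business objects in the heap.
        ch.pipeline().addLast("idleStateHandler", new IdleStateHandler(0, 0, 45));
        // ... protocol decoders and business handlers follow
    }
}
```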
2.3 Summary
With only a modest number of long connections (say, a few hundred), the extra scheduled tasks are invisible; the problem surfaces only at the hundred‑thousand scale. Proper timeout settings and careful task handling are essential for scalable push services.
3. Netty Massive Push Service Design Points
3.1 Increase Maximum File Handles
Million‑scale connections require raising the Linux file‑descriptor limit (default 1024). Example:
[root@lilinfeng ~]# ulimit -a
... open files (-n) 1024 ...

Modify /etc/security/limits.conf:

* soft nofile 1000000
* hard nofile 1000000

Be aware that extremely high handle counts can degrade performance; clustering may be necessary.
3.2 Beware of CLOSE_WAIT
Mobile networks cause frequent client reconnects, leading to many sockets stuck in CLOSE_WAIT if the server does not close them promptly. Accumulated CLOSE_WAIT sockets consume file handles and memory, eventually causing “Too many open files”.
Typical causes:
Bug in Netty or business code that fails to close the socket after receiving FIN.
Blocked I/O threads delaying socket closure.
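A defensive sketch (the handler name is illustrative): ensure the channel is always closed when an exception reaches the tail of the pipeline, so a socket left half‑closed after the peer's FIN does not linger in CLOSE_WAIT:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class CloseOnErrorHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Close instead of swallowing the exception: an unclosed socket after
        // the peer's FIN stays in CLOSE_WAIT and leaks a file handle.
        ctx.close();
    }
}
```

Placing such a handler last in the pipeline gives every connection a guaranteed cleanup path even when business handlers misbehave.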
3.3 Reasonable Heartbeat Interval
Heartbeats must balance network stability and signaling load. For 2.5 G networks, a 180‑second interval is often suitable; WeChat uses 300 seconds.
Implementation example:
public void initChannel(Channel channel) {
    // all-idle timeout of 180 seconds fires an IdleStateEvent
    channel.pipeline().addLast("idleStateHandler", new IdleStateHandler(0, 0, 180));
    channel.pipeline().addLast("myHandler", new MyHandler());
}

public class MyHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            // handle heartbeat: send a ping or close the idle connection
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}

3.4 Buffer Size Configuration
ByteBuffer has a fixed capacity, which can waste memory when handling many connections. Netty’s ByteBuf supports dynamic resizing. Two allocators are available:
FixedRecvByteBufAllocator : always allocates receive buffers of the same configured capacity, regardless of actual traffic.
AdaptiveRecvByteBufAllocator : starts with a small buffer and grows or shrinks it based on the sizes of previous reads.
Example of setting the adaptive allocator:
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.option(ChannelOption.RCVBUF_ALLOCATOR, AdaptiveRecvByteBufAllocator.DEFAULT);

3.5 Memory Pool
Using a pooled allocator (e.g., PooledByteBufAllocator ) reduces GC pressure by reusing buffers; Netty's pool is modeled on jemalloc's arena‑based design.
Enable it:
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);

Remember to release buffers with ReferenceCountUtil.release(msg) to avoid leaks.
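A common pattern (a sketch; the handler name is illustrative) is to release the message in a finally block once the business logic is done with it, so pooled buffers return to the pool even when processing throws:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

public class PushMessageHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        try {
            // decode and process the inbound ByteBuf here
        } finally {
            // return the buffer to the pool; forgetting this leaks pooled memory
            ReferenceCountUtil.release(msg);
        }
    }
}
```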
3.6 Logging Pitfalls
Synchronous logging can block I/O or business threads: with Log4j, even the AsyncAppender blocks the calling thread once its in‑memory queue fills up, stalling the event loop and letting sockets accumulate. Asynchronous appenders help, but queue saturation still blocks, as the Log4j 1.x source shows:
synchronized (this.buffer) {
    while (true) {
        int previousSize = this.buffer.size();
        if (previousSize < this.bufferSize) {
            this.buffer.add(event);
            if (previousSize != 0) break;
            this.buffer.notifyAll();
            break;
        }
        boolean discard = true;
        if ((this.blocking) && (!Thread.interrupted()) && (Thread.currentThread() != this.dispatcher)) {
            try {
                this.buffer.wait(); // blocks the calling business/I-O thread
                discard = false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

3.7 TCP Parameter Tuning
Adjust SO_SNDBUF and SO_RCVBUF (commonly 32 KB) to match message size. Enable Receive Packet Steering (RPS) on Linux ≥ 2.6.35 to distribute soft interrupts across CPUs, improving throughput by >20 %.
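On the Netty side, the socket buffers can be set per accepted child channel; the sketch below uses the 32 KB rule of thumb above (class name and thread counts are illustrative):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class TcpTuning {
    public static ServerBootstrap configure() {
        ServerBootstrap b = new ServerBootstrap();
        b.group(new NioEventLoopGroup(1), new NioEventLoopGroup())
         .channel(NioServerSocketChannel.class)
         // size send/receive buffers to match the typical push-message size
         .childOption(ChannelOption.SO_SNDBUF, 32 * 1024)
         .childOption(ChannelOption.SO_RCVBUF, 32 * 1024)
         .childOption(ChannelOption.TCP_NODELAY, true);
        return b;
    }
}
```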
3.8 JVM Settings
Key JVM parameters:
-Xms / -Xmx sized according to the service's memory model (expected connection count × per‑connection footprint, plus message buffers).
GC tuning (young/old generation ratios, collector choice) to minimize Full GC frequency.
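As an illustrative (not prescriptive) starting point for a 16 GB machine like the one in the case study, assuming the CMS collector; the jar name and every value here are hypothetical and should be tuned against real GC logs:

```shell
# Illustrative flags only; validate against GC logs for the real workload
java -server -Xms6g -Xmx6g -Xmn2g \
     -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar push-server.jar
```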
Overall, careful kernel, Netty, and JVM tuning—combined with proper timeout, buffer, and logging strategies—allows a Netty server to reliably handle millions of concurrent push connections.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with Internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.