
Optimizing Long‑Connection Services with Netty: From Millions of Connections to High QPS

This article summarizes the challenges and optimization techniques for building a high‑performance long‑connection service with Netty, covering non‑blocking I/O, Linux kernel tuning, client‑side load testing, VM‑based scaling, data‑structure tweaks, CPU and GC bottlenecks, and the final result: hundreds of thousands of concurrent connections and QPS on a single server.

Architect's Guide

About a year and a half ago the author needed an Android push service and discovered that, unlike iOS, Android lacks a unified push platform; most solutions rely on long‑lived TCP connections. After using JPush for a small product, the author was later tasked with optimizing the company's own long‑connection server.

The article collects the difficulties and optimization points encountered when implementing a long‑connection service with Netty.

What Is Netty

Netty (http://netty.io/) is an asynchronous event‑driven network application framework for rapid development of maintainable high‑performance protocol servers and clients.

Key advantages include high performance, zero‑copy, native Linux socket support, compatibility with Java 1.7 NIO2 and earlier NIO, and pooled buffers.

Bottlenecks

The ultimate goals of a long‑connection service are to support more simultaneous connections and higher QPS. The article examines where the real bottlenecks lie.

More Connections

Both Java NIO and Netty are non‑blocking, so achieving millions of connections is not difficult from a coding perspective. Sample Java NIO code for a simple selector‑based server is shown below:

ServerSocketChannel ssc = ServerSocketChannel.open();
Selector sel = Selector.open();
ssc.configureBlocking(false);
ssc.socket().bind(new InetSocketAddress(8080));
ssc.register(sel, SelectionKey.OP_ACCEPT);
while (true) {
    sel.select();
    Iterator<SelectionKey> it = sel.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey skey = it.next();
        it.remove();
        if (skey.isAcceptable()) {
            SocketChannel ch = ssc.accept();
            ch.configureBlocking(false);
        }
    }
}

Netty code to achieve the same is equally straightforward:

NioEventLoopGroup bossGroup = new NioEventLoopGroup();
NioEventLoopGroup workerGroup = new NioEventLoopGroup();
ServerBootstrap bootstrap = new ServerBootstrap();
bootstrap.group(bossGroup, workerGroup);
bootstrap.channel(NioServerSocketChannel.class);
bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
    @Override
    protected void initChannel(SocketChannel ch) throws Exception {
        ChannelPipeline pipeline = ch.pipeline();
        // add handlers here
    }
});
bootstrap.bind(8080).sync();

The real limitation is not in the code but in the Linux kernel configuration: the default per‑process open‑file limit (often just 1024) and the system‑wide file‑handle limits must be raised before a single process can hold hundreds of thousands of sockets.
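The relevant knobs can be sketched as follows (the values are illustrative, not the author's exact settings):

```
# /etc/sysctl.conf: system-wide ceilings on open file handles
fs.file-max = 1048576
fs.nr_open = 1048576

# /etc/security/limits.conf: per-process descriptor limit
# for the user running the JVM
*  soft  nofile  1048576
*  hard  nofile  1048576
```

After editing, `sysctl -p` applies the kernel settings and `ulimit -n` shows the limit in effect for the current shell; the JVM process must be restarted under the new limits.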

Verification

To verify the server's capacity, a Netty client that also uses non‑blocking I/O is written. Even with the file‑descriptor limit raised (which requires root privileges), a single client machine can open only about 60 000 connections to one server address, because each outbound connection consumes a local ephemeral port. The client loops to create as many connections as possible:

NioEventLoopGroup workerGroup = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(workerGroup);
b.channel(NioSocketChannel.class);
b.handler(new ChannelInitializer<SocketChannel>() {
    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        ChannelPipeline pipeline = ch.pipeline();
        // add handlers here
    }
});
for (int k = 0; k < 60000; k++) {
    // replace 127.0.0.1 with the server's IP address
    b.connect("127.0.0.1", 8080);
}

Running this client on a single host yields roughly 60 000 connections; to approach a million, the author used multiple physical machines as well as virtual machines with bridged network adapters and changed MAC addresses, so that each VM obtained its own IP address and therefore its own pool of local ports.

Higher QPS

Because Netty and NIO are non‑blocking, QPS does not degrade as connections grow, provided memory is sufficient. The true QPS bottleneck is often data‑structure design. The author stresses not to optimize prematurely: first make the code correct, then profile to find the hottest path.

CPU bottlenecks are identified with VisualVM in sampling mode by sorting methods by self time. One example hotspot was ConcurrentLinkedQueue.size(), which traverses the entire queue on every call. Replacing it with a separately maintained AtomicInteger counter eliminated the issue.
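The fix can be sketched as a thin wrapper that keeps its own counter next to the queue (CountedQueue is a hypothetical name; the author's actual class is not shown):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// ConcurrentLinkedQueue.size() walks every node, so it is O(n) per call.
// Maintaining a separate AtomicInteger alongside the queue makes the
// count an O(1) read.
class CountedQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger();

    public void offer(E element) {
        queue.offer(element);      // offer() always succeeds for this queue
        count.incrementAndGet();
    }

    public E poll() {
        E element = queue.poll();
        if (element != null) {     // only count elements actually removed
            count.decrementAndGet();
        }
        return element;
    }

    public int size() {            // O(1), unlike queue.size()
        return count.get();
    }
}
```

Under concurrent access the counter is only approximately in sync with the queue at any given instant, which is usually acceptable for monitoring; note that ConcurrentLinkedQueue.size() itself is also only an estimate while the queue is being modified.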

GC Bottlenecks

GC pauses also eat into CPU time. Using VisualVM with the Visual GC plugin, the author observed excessively frequent old‑generation collections. Adjusting the JVM flag -XX:NewRatio reduced their frequency. In production, a larger old generation is recommended, because long‑lived connection objects are promoted there.
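The flags involved might look like this (an illustrative command line, not the author's recorded settings; push-server.jar is a hypothetical jar name):

```
# Fixed 8 GB heap; -XX:NewRatio=3 makes the old generation three times
# the size of the young generation, leaving room for long-lived
# connection state. The GC logging flags help confirm that the change
# actually reduces old-generation collections.
java -Xms8g -Xmx8g -XX:NewRatio=3 \
     -verbose:gc -XX:+PrintGCDetails \
     -jar push-server.jar
```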

Other Optimizations

The author recommends reading "Netty in Action" and the Netty Best Practices site for additional tweaks that further increased QPS.

Running on a 16‑core, 120 GB RAM machine (JVM limited to 8 GB) with Java 1.6 achieved 600 000 concurrent connections and 200 000 QPS. The author notes that the JVM was not fully tuned and that further gains are possible.

Overall, the article provides a practical guide to building, tuning, and validating a high‑performance long‑connection service using Netty.

Tags: performance optimization, Netty, high QPS, GC tuning, Java NIO, long connections, Linux tuning
Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.