
Implementing and Optimizing a High‑Concurrency Long‑Connection Service with Netty

This article explains how to build a scalable long‑connection server using Netty, discusses the underlying bottlenecks such as Linux kernel limits, CPU and GC issues, and provides practical code examples and tuning techniques to achieve hundreds of thousands of connections and high QPS.


Push Service Background

About a year and a half ago we needed an Android push service. Unlike iOS, Android has no unified push platform in China, so we first relied on polling and later adopted JPush's long-connection solution, which handled 500K–1M concurrent connections.

Two years later we were tasked with optimizing our own long‑connection server.

What Is Netty

Netty (http://netty.io/) is an asynchronous, event-driven network framework. Its headline features include:

High‑performance, highly scalable architecture.

Zero‑Copy to minimize memory copying.

Native Linux socket support.

Works with Java 1.7 NIO2 and earlier NIO.

Pooled Buffers reduce pressure on buffer allocation and release.

Bottlenecks

The two main goals of a long-connection service are more connections and higher QPS. The real bottlenecks are not in Netty itself but first in operating-system configuration (maximum open files, per-process limits) and later in CPU usage, data-structure choices, and GC.

More Connections

Both Java NIO and Netty can handle millions of connections because they use non‑blocking I/O and do not create a thread per connection.

Java NIO Example

ServerSocketChannel ssc = ServerSocketChannel.open();
ssc.configureBlocking(false);
ssc.socket().bind(new InetSocketAddress(8080));
Selector sel = Selector.open();
ssc.register(sel, SelectionKey.OP_ACCEPT);
while (true) {
    sel.select();
    Iterator<SelectionKey> it = sel.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey skey = it.next();
        it.remove();
        if (skey.isAcceptable()) {
            // accept and drop; a real server would register the channel for OP_READ
            SocketChannel ch = ssc.accept();
        }
    }
}

This code only accepts connections and does nothing else, illustrating the basic NIO pattern.

Netty Example

NioEventLoopGroup bossGroup = new NioEventLoopGroup();
NioEventLoopGroup workerGroup = new NioEventLoopGroup();
ServerBootstrap bootstrap = new ServerBootstrap();
bootstrap.group(bossGroup, workerGroup);
bootstrap.channel(NioServerSocketChannel.class);
bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
    @Override
    protected void initChannel(SocketChannel ch) throws Exception {
        ChannelPipeline pipeline = ch.pipeline();
        // todo: add handlers
    }
});
bootstrap.bind(8080).sync();

Again, the Netty bootstrap is straightforward and does not require special tricks to reach a million connections.

Where the Real Bottleneck Lies

With non‑blocking I/O the bottleneck moves to the Linux kernel configuration – the default limits on maximum open files and process resources must be increased.
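As an illustration of the kind of tuning involved (the exact values are examples, not a recommendation), the per-process and system-wide file-descriptor limits on a typical Linux host can be inspected and raised like this:

```shell
# Check the current per-process open-file limit
ulimit -n

# Raise the system-wide file-descriptor limit (run as root)
sysctl -w fs.file-max=1048576

# Persist per-user limits by adding both soft and hard entries
# to /etc/security/limits.conf:
#   *  soft  nofile  1048576
#   *  hard  nofile  1048576
```

Changes to limits.conf take effect on the next login session; verify with `ulimit -n` afterwards.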

How to Verify Capacity

We built a Netty client that opens up to 60,000 connections (a single client IP is bounded at roughly this number by the ephemeral port range) and repeatedly connects to the server:

NioEventLoopGroup workerGroup = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(workerGroup);
b.channel(NioSocketChannel.class);
b.handler(new ChannelInitializer<SocketChannel>() {
    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        ChannelPipeline pipeline = ch.pipeline();
        // todo: add handler
    }
});
for (int k = 0; k < 60000; k++) {
    b.connect("127.0.0.1", 8080);
}

Running this client on a machine with tuned kernel parameters validates the server’s ability to hold many connections.

Finding More Machines

Since a single client host can open only ~60k connections, we needed multiple client hosts. Using virtual machines with bridged networking and multiple VMs per physical server allowed us to reach the million-connection target with only four physical machines.

Trick to Inflate Connection Count

By disabling TCP keep-alive on the server, forcibly killing a client VM (so it never sends FIN), changing its MAC address, and reconnecting, the server keeps the dead connections open while accepting new ones, inflating the apparent connection count.

Higher QPS

Because Netty and NIO are non‑blocking, QPS does not degrade with more connections as long as memory is sufficient. The real QPS bottleneck is often the data‑structure design.

Data‑Structure Optimization

Complex projects require careful selection and combination of collections. For example, frequent calls to ConcurrentLinkedQueue.size() caused a CPU hotspot because that method traverses the whole list on every call. Replacing it with an AtomicInteger counter maintained alongside the queue eliminated the hotspot; the counter can momentarily lag the queue under concurrency, which was acceptable for our purposes.
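A minimal sketch of that fix (the class and method names here are illustrative, not from the original code): wrap the queue and keep the count in an AtomicInteger, so size() becomes O(1) instead of an O(n) traversal.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: ConcurrentLinkedQueue.size() walks the whole list,
// so we maintain an O(1) counter next to the queue. Under concurrency the
// counter can briefly lag the queue, which is fine for monitoring purposes.
class CountedQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger();

    public boolean offer(E e) {
        boolean added = queue.offer(e);
        if (added) {
            count.incrementAndGet();
        }
        return added;
    }

    public E poll() {
        E e = queue.poll();
        if (e != null) {
            count.decrementAndGet();
        }
        return e;
    }

    // O(1), unlike ConcurrentLinkedQueue.size()
    public int size() {
        return count.get();
    }
}
```

The same pattern applies to any collection whose size() is linear-time: pay one atomic increment/decrement per mutation instead of a full traversal per query.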

CPU Bottleneck Diagnosis

Use VisualVM (Sample mode) to identify methods with the highest self‑time. In our case, ConcurrentLinkedQueue.size() was the top offender.

GC Bottleneck

Excessive Old‑generation GC was observed due to the default 1:2 NewRatio. Adjusting -XX:NewRatio reduced Old GC frequency. In production, where many long‑lived connection objects exist, allocating a larger old generation is advisable.
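As an illustration only (the flag values are examples and the jar name is hypothetical, not from the original deployment), generation sizing is controlled with flags like these:

```shell
# Illustrative JVM sizing for a long-connection server.
# -XX:NewRatio=N sets old:young = N:1; a larger N leaves more room in the
# old generation for the many long-lived per-connection objects.
# -verbose:gc logs collections so the effect of the change can be observed.
java -Xms8g -Xmx8g -XX:NewRatio=3 -verbose:gc -jar push-server.jar
```

Whichever direction you tune, measure GC frequency and pause times before and after; the right ratio depends on how long your connection objects actually live.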

Other Optimizations

Refer to "Netty Best Practices" and the book "Netty in Action" for additional tweaks that boosted our overall QPS.

Running on a 16-core machine with 120 GB of RAM but only an 8 GB JVM heap, Java 1.6 achieved 600k connections and 200k QPS; further gains are possible with a larger heap and Java 1.7+.

Final Outcome

After weeks of stress testing and tuning, we reached 600 k concurrent connections and 200 k QPS on a single server, with low system load, indicating that the remaining bottleneck lies in I/O rather than CPU or memory.

Tags: performance optimization, Netty, high concurrency, Java NIO, long connections, server tuning
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
