How Dubbo’s Single Long Connection Leads to Congestion and What You Can Do
This article examines a production incident where Dubbo services became congested, explains the underlying communication process with detailed code analysis, identifies key protocol and service parameters affecting performance, conducts experiments on connection count and TCP buffers, and offers practical recommendations to optimize Dubbo throughput.
Background
During a production release, a sudden traffic surge caused congestion. In this deployment, HTTP clients access Dubbo consumers, which in turn call providers over the Dubbo protocol. Two data centers served external traffic, and each data center ran 8 consumers and 3 providers.
When one data center was taken offline to be updated, the remaining data center could not absorb the high concurrent traffic, and requests became congested. The observed symptoms were low CPU usage on the providers, provider thread pools that never filled up during the congestion, and irregular request rates.
Root Cause Guess
Metrics showed no anomalies, and no dumps were taken. Analysis suggested network congestion due to insufficient Dubbo connections; by default each consumer uses a single long connection, which did not fully utilize network resources.
Dubbo’s default protocol uses a single long connection and NIO asynchronous communication, suitable for small data, high concurrency, and when consumer machines far outnumber provider machines. Conversely, it is not suitable for large data transfers such as files or videos unless request volume is very low.
Because our consumer and provider counts were low, the connection count likely limited throughput. The following sections detail the Dubbo communication flow and key parameters.
Dubbo Communication Flow Details
We use Dubbo 2.5.x with Netty 3.2.5. The call process includes:
1. Request Enqueue
The request is placed into Netty’s writeTaskQueue, a LinkedTransferQueue.
class NioWorker implements Runnable {
    ...
    private final Queue<Runnable> writeTaskQueue = new LinkedTransferQueue<Runnable>();
    ...
}
2. Caller Thread Waits
The caller receives a DefaultFuture and blocks in DefaultFuture.get() until the response arrives or the timeout expires.
public class DubboInvoker<T> extends AbstractInvoker<T> {
    ...
    // request() returns a ResponseFuture (here a DefaultFuture); get() blocks the caller thread
    return (Result) currentClient.request(inv, timeout).get();
    ...
}
DefaultFuture stores the request ID and maps it to a channel.
public class DefaultFuture implements ResponseFuture {
    private static final Map<Long, Channel> CHANNELS = new ConcurrentHashMap<Long, Channel>();
    private static final Map<Long, DefaultFuture> FUTURES = new ConcurrentHashMap<Long, DefaultFuture>();
    private final long id; // request ID
    ...
    public DefaultFuture(Channel channel, Request request, int timeout) {
        this.channel = channel;
        this.request = request;
        this.id = request.getId();
        FUTURES.put(id, this);
        CHANNELS.put(id, channel);
    }
}
3. IO Thread Writes to Socket Buffer
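The Netty IO thread processes the write queue. If a write makes no progress within writeSpinCount attempts (default 16) because the kernel send buffer is full, writeSuspended is set to true and the channel waits for OP_WRITE before writing again.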
void writeFromTaskLoop(final NioSocketChannel ch) {
    if (!ch.writeSuspended) {
        write0(ch);
    }
}
private void write0(NioSocketChannel channel) {
    ...
    int writeSpinCount = channel.getConfig().getWriteSpinCount(); // default 16
    for (int i = writeSpinCount; i > 0; i--) {
        // ch is the underlying socket channel; 0 bytes written means the send buffer is full
        localWrittenBytes = buf.transferTo(ch);
        if (localWrittenBytes != 0) {
            writtenBytes += localWrittenBytes;
            break;
        }
        if (buf.finished()) {
            break;
        }
    }
    if (!buf.finished()) {
        channel.writeSuspended = true; // stop writing until the socket is writable again
        // register OP_WRITE
    }
}
4. Data Sent Over Network
The OS transmits data from the socket’s send buffer to the peer’s receive buffer. TCP retransmission ensures reliability.
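Steps 3 and 4 interact: the IO thread can only write as fast as the kernel drains the send buffer. The following is a toy model of that interaction, not Netty code. `SendBuffer`, `writeWithSpin`, and the byte counts are hypothetical names and values chosen for illustration; only the spin-then-suspend pattern is taken from the excerpt above.

```java
/** Toy model of write spinning against a bounded send buffer (not Netty code). */
final class WriteSpinSketch {

    /** Hypothetical stand-in for the kernel send buffer of one connection. */
    static final class SendBuffer {
        private final int capacity;
        private int used;

        SendBuffer(int capacity) {
            this.capacity = capacity;
        }

        /** Accepts as many bytes as fit and returns that count (0 when full). */
        int write(int bytes) {
            int accepted = Math.min(bytes, capacity - used);
            used += accepted;
            return accepted;
        }

        /** Models the peer acknowledging data, which frees buffer space. */
        void drain(int bytes) {
            used = Math.max(0, used - bytes);
        }
    }

    /**
     * Attempts to write {@code pending} bytes, retrying up to {@code spinCount}
     * times while the buffer accepts nothing. Returns the bytes still pending;
     * a non-zero result corresponds to writeSuspended = true in the excerpt.
     */
    static int writeWithSpin(SendBuffer buffer, int pending, int spinCount) {
        for (int i = spinCount; i > 0 && pending > 0; i--) {
            int written = buffer.write(pending);
            pending -= written;
            if (written != 0) {
                break; // progress was made, so stop spinning as the excerpt does
            }
        }
        return pending;
    }
}
```

With a 64-byte buffer and a 100-byte message, the first call leaves 36 bytes pending, and further calls make no progress until `drain` frees space. On a single connection, every request queued behind that message waits as well.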
5. Server IO Thread Reads Request
The server’s NIO thread reads data from the socket and fires a message-received event.
public void received(Channel channel, Object message) throws RemotingException {
    if (message instanceof Request) {
        // handle request
        Request request = (Request) message;
        if (request.isTwoWay()) {
            Response response = handleRequest(channel, request);
            channel.send(response);
        }
    }
}
6. Business Thread Processes Request
Depending on the dispatcher configuration (default “all”), the request is handed to a business thread pool. If the pool is exhausted and has no queue, a “Threadpool is exhausted” error is returned.
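The exhaustion behavior can be reproduced with the JDK alone. This is a minimal sketch, assuming a fixed-size pool with no queue behaves like a `ThreadPoolExecutor` backed by a `SynchronousQueue` with an abort policy. `ExhaustedPoolSketch` and `rejectsWhenFull` are hypothetical names, and Dubbo's own rejection handler and error message are not reproduced here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Shows that a fixed pool without a queue rejects work once every thread is busy. */
final class ExhaustedPoolSketch {

    /** Returns true if one task more than {@code threads} busy workers is rejected. */
    static boolean rejectsWhenFull(int threads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),       // hand-off only, nothing is queued
                new ThreadPoolExecutor.AbortPolicy());  // reject instead of waiting
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> awaitQuietly(release)); // occupy every worker
            }
            try {
                pool.execute(() -> { });                   // one request too many
                return false;
            } catch (RejectedExecutionException e) {
                return true;
            }
        } finally {
            release.countDown(); // let the blocked workers finish
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the incident described here the provider thread pools never filled up, which is one reason the bottleneck was suspected upstream of this step.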
7. Response Sent Back
The response is placed into the write queue, goes through the same IO steps, and finally reaches the client.
8. Client Future Receives Response
DefaultFuture matches the response ID, notifies the waiting caller thread, and the RPC call returns.
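Matching responses to callers by request ID is what lets many in-flight requests share one connection. The sketch below illustrates the idea with JDK classes only. `CorrelationSketch` and its methods are hypothetical and much simpler than DefaultFuture, which also handles timeouts and channels.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal request/response correlation by ID over a shared connection. */
final class CorrelationSketch {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    private static final Map<Long, CompletableFuture<String>> PENDING =
            new ConcurrentHashMap<>();

    /** Registers a pending request and returns its ID. */
    static long newRequest() {
        long id = NEXT_ID.incrementAndGet();
        PENDING.put(id, new CompletableFuture<>());
        return id;
    }

    /** Returns the future the caller thread would block on, or null if unknown. */
    static CompletableFuture<String> pending(long id) {
        return PENDING.get(id);
    }

    /** Completes the matching future; returns false for unknown or already answered IDs. */
    static boolean onResponse(long id, String body) {
        CompletableFuture<String> future = PENDING.remove(id);
        return future != null && future.complete(body);
    }
}
```

Responses may arrive in any order: completing request 2 first leaves the future for request 1 pending until its own response is received.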
Key Parameters Affecting the Flow
Protocol Parameters
Typical Dubbo protocol configuration (example):
<dubbo:protocol name="dubbo" port="20880" dispatcher="all" threadpool="fixed" threads="2000" />
Important attributes include threadpool, threads, queues, iothreads, accepts, dispatcher, payload, buffer, heartbeat, etc. Their defaults and performance impact are listed in a table in the original source.
Service Parameters
Key service-level settings such as timeout, retries, connections, loadbalance, async, and cluster also influence latency and throughput.
In our case the main bottleneck was the connections parameter being too low, limiting the number of simultaneous TCP connections per provider.
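To make the effect of connections concrete, here is a minimal sketch that assumes requests are rotated round-robin over the configured connections. `ConnectionPickerSketch` and the `conn-N` labels are hypothetical and stand in for real client objects.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Rotates requests over a fixed set of connections (illustrative only). */
final class ConnectionPickerSketch {
    private final String[] connections;
    private final AtomicInteger index = new AtomicInteger();

    ConnectionPickerSketch(int count) {
        connections = new String[count];
        for (int i = 0; i < count; i++) {
            connections[i] = "conn-" + i; // placeholder for a real client object
        }
    }

    /** Picks the next connection; floorMod keeps the index valid after int overflow. */
    String next() {
        return connections[Math.floorMod(index.getAndIncrement(), connections.length)];
    }
}
```

With a count of 1, every request lands on `conn-0` and shares its write queue and socket buffer. With a count of 3, consecutive requests go to `conn-0`, `conn-1`, `conn-2`, and then back to `conn-0`.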
Experiments: Connection Count and Socket Buffer
Experiment 1 – Single Connection, Vary TCP Buffer
We adjusted the Linux net.ipv4.tcp_rmem and net.ipv4.tcp_wmem settings to enlarge the socket buffers. Response time decreased as the buffer size grew, but Send-Q stayed at around 64 KB, indicating that the buffer was still a limiting factor.
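One way to reason about why the buffer matters is the general TCP rule that a connection can have at most one window of unacknowledged data in flight per round trip. The sketch below applies that rule. The round-trip time is a hypothetical input, not a value measured in these experiments, and the bound ignores link bandwidth, CPU, and serialization cost.

```java
/** Back-of-the-envelope bound: throughput <= window / round-trip time. */
final class WindowBoundSketch {

    /** Upper bound in bytes per second for a single connection. */
    static double maxBytesPerSecond(long windowBytes, long rttMicros) {
        return windowBytes * 1_000_000.0 / rttMicros;
    }

    /** The same bound summed over several independent connections. */
    static double aggregateBytesPerSecond(long windowBytes, long rttMicros, int connections) {
        return connections * maxBytesPerSecond(windowBytes, rttMicros);
    }
}
```

For a 64 KiB window and an assumed 1 ms round trip, `maxBytesPerSecond(65536, 1000)` gives 65,536,000 bytes per second for one connection. Under the same assumptions, a larger window or additional connections raises the bound proportionally.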
Experiment 2 – Multiple Connections, Fixed Buffer
Increasing the connections setting while keeping the buffer at 4 MB reduced latency dramatically. Four connections were enough to saturate the CPU and achieve the lowest response time.
Conclusion
To fully utilize network bandwidth, socket buffers must be sufficiently large; otherwise large messages overflow the buffer and degrade throughput. However, oversized buffers alone cannot compensate for a low connection count. The number of connections should exceed the number of CPU cores to keep the IO threads busy and achieve optimal performance.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
