Uncovering the High‑Performance Secrets of Dubbo3 Triple Protocol
This article dives deep into Dubbo3's Triple protocol, explaining its design, identifying performance bottlenecks with tools like VisualVM and JFR, and presenting concrete code‑level optimizations—including async stream creation, lock‑contention fixes, thread‑pool tuning, and batch writes—that cut latency by up to 45% in real‑world Alibaba workloads.
Dubbo3 Triple protocol is a hybrid of gRPC, gRPC‑Web and Dubbo2, offering full gRPC compatibility, streaming support, and seamless HTTP/1 and browser access, allowing Dubbo, gRPC, curl or browser clients to invoke services without extra configuration.
Since 2021 Dubbo3 has replaced the internal HSF framework at Alibaba, handling trillion‑level service calls during Double‑11, making Triple's performance critical for overall system efficiency.
Pre‑knowledge
Triple combines features of gRPC and gRPC‑Web, supporting HTTP/1 and HTTP/2. Its core components include:
TripleInvoker : Handles UNARY, BiStream, etc., with doInvoke dispatching calls.
TripleClientStream : Maps to HTTP/2 streams, providing sendHeader and sendMessage.
WriteQueue : Buffers commands and submits them to Netty's EventLoop for ordered execution.
QueueCommand : Abstract task executed by the WriteQueue.
TripleServerStream : Server‑side counterpart for handling incoming streams.
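The relationship between WriteQueue and QueueCommand can be sketched with plain JDK types. This is a simplified stand-in, not the real Triple classes: a single-threaded executor plays the role of Netty's EventLoop, and commands drain in submission order without per-command locking.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch: a WriteQueue buffers QueueCommands and drains them on a
// single event-loop thread, so writes stay ordered without explicit locks.
public class WriteQueueSketch {

    interface QueueCommand { void run(); }

    static class WriteQueue {
        private final ConcurrentLinkedQueue<QueueCommand> queue = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean scheduled = new AtomicBoolean();
        private final ExecutorService eventLoop; // stand-in for Netty's EventLoop

        WriteQueue(ExecutorService eventLoop) { this.eventLoop = eventLoop; }

        void enqueue(QueueCommand cmd) {
            queue.add(cmd);
            // Schedule at most one drain task at a time.
            if (scheduled.compareAndSet(false, true)) {
                eventLoop.execute(this::drain);
            }
        }

        private void drain() {
            QueueCommand cmd;
            while ((cmd = queue.poll()) != null) {
                cmd.run();
            }
            scheduled.set(false);
            // Re-check: a command may have arrived after the last poll().
            if (!queue.isEmpty() && scheduled.compareAndSet(false, true)) {
                eventLoop.execute(this::drain);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        WriteQueue wq = new WriteQueue(loop);
        wq.enqueue(() -> System.out.println("send headers"));
        wq.enqueue(() -> System.out.println("send message"));
        loop.shutdown();
        loop.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Because only one drain task is ever scheduled at a time, callers never block when enqueuing; ordering comes from the queue itself rather than from synchronization.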
Tooling for Performance Diagnosis
Two main tools are used:
VisualVM : Monitors CPU, threads, memory, and provides sampling to locate hot methods.
Java Flight Recorder (JFR) : Low‑overhead event recorder that captures monitor blocking, thread parking, and other runtime events.
Optimization Ideas
Eliminate blocking calls (e.g., Thread.sleep, await).
Adopt asynchronous programming (e.g., CompletableFuture).
Apply divide‑and‑conquer to split large tasks.
Batch operations to reduce I/O frequency.
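The first two ideas can be seen in a few lines of illustrative code (not from the Dubbo source): the blocking style parks the caller until a result is ready, while the async style registers a callback and returns immediately.

```java
import java.util.concurrent.CompletableFuture;

// Illustration of replacing a blocking wait with a callback.
public class AsyncVsBlocking {

    static CompletableFuture<String> openStream() {
        return CompletableFuture.supplyAsync(() -> "stream-ready");
    }

    public static void main(String[] args) {
        // Blocking style: the caller thread parks inside join().
        String blocking = openStream().join();
        System.out.println("blocking got: " + blocking);

        // Non-blocking style: the caller is free as soon as thenAccept returns.
        openStream()
                .thenAccept(s -> System.out.println("callback got: " + s))
                .join(); // join here only so the demo JVM waits for the callback
    }
}
```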
Identifying a Major Blocking Point
Analysis with VisualVM revealed that syncUninterruptibly in WriteQueue.createWriteQueue blocks the user thread while waiting for an Http2StreamChannel to be created:
private WriteQueue createWriteQueue(Channel parent) {
    Http2StreamChannelBootstrap bootstrap = new Http2StreamChannelBootstrap(parent);
    Future<Http2StreamChannel> future = bootstrap.open().syncUninterruptibly();
    if (!future.isSuccess()) {
        throw new IllegalStateException("Create remote stream failed. channel:" + parent);
    }
    Http2StreamChannel channel = future.getNow();
    channel.pipeline()
        .addLast(new TripleCommandOutBoundHandler())
        .addLast(new TripleHttp2ClientResponseHandler(createTransportListener()));
    channel.closeFuture().addListener(f -> transportException(f.cause()));
    return new WriteQueue(channel);
}

The blocking occurs because the user thread submits the stream-creation task to the EventLoop and then waits synchronously for it to complete, adding unnecessary latency to every call.
Async Stream Creation Fix
By converting the creation into an asynchronous command and enqueuing it, the blocking call is removed:
private TripleStreamChannelFuture initHttp2StreamChannel(Channel parent) {
    TripleStreamChannelFuture streamChannelFuture = new TripleStreamChannelFuture(parent);
    Http2StreamChannelBootstrap bootstrap = new Http2StreamChannelBootstrap(parent);
    bootstrap.handler(new ChannelInboundHandlerAdapter() {
        @Override
        public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
            Channel channel = ctx.channel();
            channel.pipeline().addLast(new TripleCommandOutBoundHandler());
            channel.pipeline().addLast(new TripleHttp2ClientResponseHandler(createTransportListener()));
            channel.closeFuture().addListener(f -> transportException(f.cause()));
        }
    });
    CreateStreamQueueCommand cmd = CreateStreamQueueCommand.create(bootstrap, streamChannelFuture);
    this.writeQueue.enqueue(cmd);
    return streamChannelFuture;
}

The command now runs inside the EventLoop, so the user thread no longer calls syncUninterruptibly and instead returns immediately with a future.
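The source does not show CreateStreamQueueCommand itself; conceptually, the stream is opened on the event-loop thread and the result is delivered through a future. A Netty-free sketch of that idea (StreamChannelFuture and the string result are hypothetical stand-ins):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch (simplified names, no Netty dependency): the expensive
// work runs on the event-loop thread, the caller gets a future right away.
public class CreateStreamSketch {

    static class StreamChannelFuture extends CompletableFuture<String> {}

    static StreamChannelFuture initStreamChannel(ExecutorService eventLoop) {
        StreamChannelFuture future = new StreamChannelFuture();
        eventLoop.execute(() -> {
            try {
                // In the real code: bootstrap.open() plus pipeline setup run here.
                future.complete("http2-stream-channel");
            } catch (Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future; // caller returns immediately, no blocking
    }

    public static void main(String[] args) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        initStreamChannel(loop)
                .thenAccept(ch -> System.out.println("created: " + ch))
                .join();
        loop.shutdown();
    }
}
```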
Lock Contention in isAvailable
JFR showed heavy monitor blocking inside sun.nio.ch.SocketChannelImpl.isConnected, which is synchronized and is invoked from TripleInvoker.isAvailable. The contention stems from many user threads calling isAvailable concurrently on the same channel.
Fix: replace the synchronized check with a cached boolean flag that is updated by connection lifecycle events, turning the availability check into a lock-free read.
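A minimal sketch of that fix, assuming lifecycle callbacks (such as channel active/inactive events) keep the flag current; the class and method names here are illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Instead of every caller reaching into the synchronized
// SocketChannelImpl.isConnected, the connection state is cached in an atomic
// flag. The hot path becomes a single volatile read with no monitor.
public class AvailabilityCache {

    private final AtomicBoolean available = new AtomicBoolean(false);

    // Called from connection lifecycle events (e.g. channelActive).
    void onConnected() { available.set(true); }

    // Called when the connection drops (e.g. channelInactive).
    void onDisconnected() { available.set(false); }

    // Hot path: lock-free, safe to call from many threads at once.
    boolean isAvailable() { return available.get(); }
}
```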
Thread‑Park Events and Thread‑Pool Utilization
JFR analysis uncovered a large number of ThreadPark events: consumer‑pool threads were repeatedly woken only to find no work and park again. By wrapping the consumer pool with a SerializingExecutor, needless task parallelism is reduced, thread‑park events drop, and overall throughput improves by roughly 13%.
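A minimal SerializingExecutor in the spirit of gRPC's (this is a sketch, not the exact Dubbo or gRPC class): tasks submitted to it run one at a time, in order, on the underlying pool, so pool threads are not woken just to contend over work that must serialize anyway.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Wraps any Executor so that submitted tasks execute serially, in FIFO order,
// while still running on the delegate pool's threads.
public class SerializingExecutorSketch implements Executor {

    private final Executor delegate;
    private final ConcurrentLinkedQueue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean();

    public SerializingExecutorSketch(Executor delegate) { this.delegate = delegate; }

    @Override
    public void execute(Runnable task) {
        tasks.add(task);
        schedule();
    }

    private void schedule() {
        // Only one drain loop may be in flight at a time.
        if (running.compareAndSet(false, true)) {
            delegate.execute(this::drain);
        }
    }

    private void drain() {
        try {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
        } finally {
            running.set(false);
            // Re-check: a task may have been added after poll() returned null.
            if (!tasks.isEmpty()) {
                schedule();
            }
        }
    }
}
```

One logical stream gets one "lane": even on a large pool, its callbacks never run concurrently, which is exactly the property that cuts the wake-up/park churn JFR surfaced.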
Batch Write Optimization
gRPC achieves high throughput by batching writes in a shared WriteQueue. Triple’s original design created a separate WriteQueue per stream, causing each request to flush immediately. The fix shares a single WriteQueue across all streams, allowing batch flushing:
private void flush() {
    QueueCommand cmd;
    int i = 0;
    while ((cmd = queue.poll()) != null) {
        cmd.run(channel);           // write into the outbound buffer, no syscall yet
        if (++i == DEQUE_CHUNK_SIZE) {
            i = 0;                  // reset so the next chunk is counted correctly
            channel.flush();        // one syscall per chunk of commands
        }
    }
    if (i != 0) {
        channel.flush();            // flush the final partial chunk
    }
}

After consolidating the queue, the number of I/O system calls drops dramatically and latency improves.
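To see the batching arithmetic concretely, here is a standalone re-implementation of that loop with the Netty channel replaced by a flush counter; the chunk size of 128 is illustrative, not necessarily Triple's actual constant:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Counts how many flush() calls the chunked drain loop would issue.
public class BatchFlushDemo {

    static final int DEQUE_CHUNK_SIZE = 128; // illustrative chunk size

    static int flushCount(Queue<Runnable> queue) {
        int flushes = 0;
        int i = 0;
        Runnable cmd;
        while ((cmd = queue.poll()) != null) {
            cmd.run();        // stand-in for cmd.run(channel)
            if (++i == DEQUE_CHUNK_SIZE) {
                i = 0;
                flushes++;    // stand-in for channel.flush() on a full chunk
            }
        }
        if (i != 0) {
            flushes++;        // final partial-chunk flush
        }
        return flushes;
    }

    public static void main(String[] args) {
        Queue<Runnable> q = new ArrayDeque<>();
        for (int n = 0; n < 300; n++) q.add(() -> {});
        // 300 commands -> two full chunks of 128 plus one partial flush
        System.out.println(flushCount(q)); // prints 3
    }
}
```

With one queue per stream, those same 300 commands could have triggered up to 300 flushes; sharing the queue is what makes the chunking pay off.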
Results
Performance testing after applying all optimizations shows up to 45% latency reduction for small payloads, while larger payloads see modest gains, highlighting future work on large‑message handling.
Conclusion and Next Steps
The deep dive demonstrates how systematic profiling (VisualVM, JFR) combined with targeted async refactoring, lock removal, thread‑pool tuning, and batch I/O can dramatically improve Dubbo3 Triple protocol performance. The upcoming article will explore usability, interoperability, and multi‑language support (Java, Go, Rust, Node.js).
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Native
