
Uncovering the High‑Performance Secrets of Dubbo3 Triple Protocol

This article dives deep into Dubbo3's Triple protocol, explaining its design, identifying performance bottlenecks with tools like VisualVM and JFR, and presenting concrete code‑level optimizations—including async stream creation, lock‑contention fixes, thread‑pool tuning, and batch writes—that boost throughput by up to 45% in real‑world Alibaba workloads.

Alibaba Cloud Native

The Dubbo3 Triple protocol draws on gRPC, gRPC‑Web, and Dubbo2. It is fully compatible with gRPC, supports streaming, and is reachable over HTTP/1 and from browsers, so Dubbo, gRPC, curl, and browser clients can all invoke services without extra configuration.

Since 2021, Dubbo3 has replaced the internal HSF framework at Alibaba and handles service calls on the order of trillions during Double‑11, which makes Triple's performance critical to overall system efficiency.

Background

Triple combines features of gRPC and gRPC‑Web, supporting HTTP/1 and HTTP/2. Its core components include:

TripleInvoker: Handles UNARY, BiStream, and the other call types; its doInvoke method dispatches each call.

TripleClientStream: Maps to an HTTP/2 stream and provides sendHeader and sendMessage.

WriteQueue: Buffers commands and submits them to Netty's EventLoop so that they execute in order.

QueueCommand: The abstract task that the WriteQueue executes.

TripleServerStream: The server‑side counterpart that handles incoming streams.

Tooling for Performance Diagnosis

Two main tools are used:

VisualVM: Monitors CPU, threads, and memory, and provides sampling to locate hot methods.

Java Flight Recorder (JFR): A low‑overhead event recorder that captures monitor blocking, thread parking, and other runtime events.
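As a rough illustration of the JFR side, the sketch below uses the JDK's jdk.jfr API to record only the two event types that matter later in this article: monitor blocking (jdk.JavaMonitorEnter) and thread parking (jdk.ThreadPark). The class name JfrBlockingProbe, the 1 ms thresholds, and the temp-file name are assumptions made for the example, not settings taken from the original investigation.

```java
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;

// Hypothetical helper: records blocking-related JFR events around a workload.
class JfrBlockingProbe {

    /** Runs the workload under a JFR recording and returns the captured events. */
    static List<RecordedEvent> record(Runnable workload) {
        Path dump = null;
        try (Recording recording = new Recording()) {
            dump = Files.createTempFile("triple-probe", ".jfr");
            // Only events longer than the threshold are kept (1 ms is an arbitrary choice).
            recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(1));
            recording.enable("jdk.ThreadPark").withThreshold(Duration.ofMillis(1));
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(dump);
            return RecordingFile.readAllEvents(dump);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (dump != null) {
                dump.toFile().delete();
            }
        }
    }
}
```

Grouping the returned events by getEventType().getName() gives a first view of whether threads spend their time blocked on monitors or parked.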

Optimization Ideas

Eliminate blocking calls (e.g., Thread.sleep, await).

Adopt asynchronous programming (e.g., CompletableFuture).

Apply divide‑and‑conquer to split large tasks.

Batch operations to reduce I/O frequency.

Identifying a Major Blocking Point

Analysis with VisualVM revealed that the syncUninterruptibly call in createWriteQueue, the method that builds a stream's WriteQueue, blocks the user thread while it waits for an Http2StreamChannel to be created:

private WriteQueue createWriteQueue(Channel parent) {
  Http2StreamChannelBootstrap bootstrap = new Http2StreamChannelBootstrap(parent);
  Future<Http2StreamChannel> future = bootstrap.open().syncUninterruptibly();
  if (!future.isSuccess()) {
    throw new IllegalStateException("Create remote stream failed. channel:" + parent);
  }
  Http2StreamChannel channel = future.getNow();
  channel.pipeline()
    .addLast(new TripleCommandOutBoundHandler())
    .addLast(new TripleHttp2ClientResponseHandler(createTransportListener()));
  channel.closeFuture().addListener(f -> transportException(f.cause()));
  return new WriteQueue(channel);
}

The blocking occurs because the user thread submits the task to the EventLoop and then waits synchronously, causing unnecessary latency.

Async Stream Creation Fix

By converting the creation into an asynchronous command and enqueuing it, the blocking call is removed:

private TripleStreamChannelFuture initHttp2StreamChannel(Channel parent) {
  TripleStreamChannelFuture streamChannelFuture = new TripleStreamChannelFuture(parent);
  Http2StreamChannelBootstrap bootstrap = new Http2StreamChannelBootstrap(parent);
  bootstrap.handler(new ChannelInboundHandlerAdapter() {
    @Override
    public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
      Channel channel = ctx.channel();
      channel.pipeline().addLast(new TripleCommandOutBoundHandler());
      channel.pipeline().addLast(new TripleHttp2ClientResponseHandler(createTransportListener()));
      channel.closeFuture().addListener(f -> transportException(f.cause()));
    }
  });
  CreateStreamQueueCommand cmd = CreateStreamQueueCommand.create(bootstrap, streamChannelFuture);
  this.writeQueue.enqueue(cmd);
  return streamChannelFuture;
}

The command runs inside the EventLoop, eliminating the need for syncUninterruptibly.
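The same idea can be shown without Netty. In the minimal sketch below, a single-threaded executor stands in for the EventLoop, and a CompletableFuture stands in for TripleStreamChannelFuture; the class and method names are hypothetical. Stream creation is enqueued as a task and the caller gets a future back at once. Later writes are enqueued on the same FIFO loop, so they always run after the creation task without the caller ever waiting.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the client stream; not Dubbo's implementation.
class AsyncStreamSketch {
    // One daemon thread plays the role of the Netty EventLoop that owns the connection.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-loop");
        t.setDaemon(true);
        return t;
    });

    // Records what the event loop did, in execution order.
    final List<String> log = new CopyOnWriteArrayList<>();

    /** Enqueues stream creation and returns immediately; the loop completes the future. */
    CompletableFuture<String> createStream(String name) {
        CompletableFuture<String> streamFuture = new CompletableFuture<>();
        eventLoop.execute(() -> {
            log.add("create " + name);
            streamFuture.complete(name);
        });
        return streamFuture;
    }

    /** Enqueues a write; FIFO ordering guarantees the stream already exists when it runs. */
    CompletableFuture<Void> write(CompletableFuture<String> streamFuture, String message) {
        return CompletableFuture.runAsync(
            () -> log.add("write " + message + " on " + streamFuture.getNow("<not created>")),
            eventLoop);
    }
}
```

A caller would invoke createStream("s1") and then write(...) back to back; neither call blocks, and the log shows the creation entry followed by the writes in submission order. Failure handling is left out to keep the sketch short.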

Lock Contention in isAvailable

JFR showed heavy monitor contention on the synchronized block in sun.nio.ch.SocketChannelImpl.isConnected, which is reached from TripleInvoker.isAvailable. The contention arises because many threads call isAvailable concurrently.

Fix: replace the synchronized check with a cached boolean flag that indicates availability, removing the lock.
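A minimal sketch of the cached-flag approach follows. It assumes the connection layer can notify the holder when the connection state changes; the class name and the onConnected/onDisconnected callbacks are hypothetical, and the actual Dubbo change may be structured differently.

```java
// Hypothetical holder for connection availability; not Dubbo's implementation.
class CachedAvailability {
    // Written only by connection lifecycle callbacks. volatile makes each update
    // visible to every reader thread without taking a lock.
    private volatile boolean available;

    /** Called by the connection layer once the channel is established. */
    void onConnected() {
        available = true;
    }

    /** Called by the connection layer when the channel closes or fails. */
    void onDisconnected() {
        available = false;
    }

    /** Hot path: a single volatile read, so concurrent callers never contend on a monitor. */
    boolean isAvailable() {
        return available;
    }
}
```

The trade-off is that the flag is only as fresh as the last callback, which is usually acceptable for an availability check that is re-evaluated on every call.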

Thread‑Park Events and Thread‑Pool Utilization

JFR analysis also uncovered a large number of ThreadPark events, which indicates that many consumer‑pool threads were parked waiting for work. Wrapping the consumer pool in a SerializingExecutor reduces task parallelism, so fewer threads are woken and parked again, and overall throughput improves by about 13%.
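To show what a serializing executor does, here is a minimal sketch written for this article; it is not Dubbo's SerializingExecutor source. Tasks go into a queue, and at most one drain job is handed to the underlying pool at any time, so the tasks run one after another in submission order while the remaining pool threads stay free.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative serializing executor: runs submitted tasks one at a time, in order,
// on whichever thread the delegate pool provides.
class SerializingExecutorSketch implements Executor, Runnable {
    private final Executor delegate;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    // True while a drain job is scheduled or running on the delegate.
    private final AtomicBoolean draining = new AtomicBoolean(false);

    SerializingExecutorSketch(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        tasks.add(task);
        schedule();
    }

    private void schedule() {
        // Only the caller that wins the CAS submits a drain job.
        if (draining.compareAndSet(false, true)) {
            boolean submitted = false;
            try {
                delegate.execute(this);
                submitted = true;
            } finally {
                if (!submitted) {
                    draining.set(false); // let a later execute() try again
                }
            }
        }
    }

    @Override
    public void run() {
        try {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    System.err.println("task failed: " + e); // keep draining
                }
            }
        } finally {
            draining.set(false);
            // A task may have been added after the last poll; reschedule if so.
            if (!tasks.isEmpty()) {
                schedule();
            }
        }
    }
}
```

Wrapping a pool is then a one-liner, for example new SerializingExecutorSketch(pool), and callers submit to the wrapper instead of the pool.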

Batch Write Optimization

gRPC achieves high throughput by batching writes in a shared WriteQueue. Triple’s original design created a separate WriteQueue per stream, causing each request to flush immediately. The fix shares a single WriteQueue across all streams, allowing batch flushing:

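private void flush() {
  QueueCommand cmd;
  int i = 0;
  while ((cmd = queue.poll()) != null) {
    cmd.run(channel);
    // Flush once per full chunk, then start counting the next chunk.
    if (++i == DEQUE_CHUNK_SIZE) {
      i = 0;
      channel.flush();
    }
  }
  // Flush whatever remains from a partial chunk.
  if (i != 0) {
    channel.flush();
  }
}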

After consolidating the queue, I/O calls drop dramatically and latency improves.
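The effect of sharing one queue can be sketched without Netty. In the example below, commands are plain Runnables, flush() only increments a counter, and the executor passed to the constructor stands in for the EventLoop; all names other than DEQUE_CHUNK_SIZE are hypothetical, and the chunk size of 128 is an assumed value. The first enqueue schedules a single drain, and every command that arrives before the drain runs is flushed as part of the same batch.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative shared write queue that batches flushes; not Dubbo's implementation.
class BatchingWriteQueueSketch {
    static final int DEQUE_CHUNK_SIZE = 128; // assumed value for the sketch

    private final Executor eventLoop;
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    // True while a drain is scheduled or running, so producers schedule at most one.
    private final AtomicBoolean scheduled = new AtomicBoolean(false);

    int writes;  // touched only by the drain
    int flushes; // touched only by the drain

    BatchingWriteQueueSketch(Executor eventLoop) {
        this.eventLoop = eventLoop;
    }

    /** Called from any stream: queue the command and make sure one drain is pending. */
    void enqueue(Runnable command) {
        queue.add(command);
        if (scheduled.compareAndSet(false, true)) {
            eventLoop.execute(this::drain);
        }
    }

    private void drain() {
        try {
            Runnable cmd;
            int pending = 0;
            while ((cmd = queue.poll()) != null) {
                cmd.run();
                writes++;
                if (++pending == DEQUE_CHUNK_SIZE) {
                    pending = 0;
                    flush(); // one flush per full chunk
                }
            }
            if (pending != 0) {
                flush(); // flush the partial chunk
            }
        } finally {
            scheduled.set(false);
            // Commands may have arrived after the last poll; schedule another drain.
            if (!queue.isEmpty() && scheduled.compareAndSet(false, true)) {
                eventLoop.execute(this::drain);
            }
        }
    }

    private void flush() {
        flushes++; // stands in for channel.flush()
    }
}
```

With 300 commands queued before the drain runs, this sketch performs 3 flushes (two full chunks of 128 and one partial chunk of 44), where a flush-per-command design would perform 300.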

Results

Performance testing after applying all optimizations shows an improvement of up to 45% for small payloads, while larger payloads see only modest gains, which points to large‑message handling as future work.

Conclusion and Next Steps

The deep dive demonstrates how systematic profiling (VisualVM, JFR) combined with targeted async refactoring, lock removal, thread‑pool tuning, and batch I/O can dramatically improve Dubbo3 Triple protocol performance. The upcoming article will explore usability, interoperability, and multi‑language support (Java, Go, Rust, Node.js).
