How Vivo Supercharged Dubbo Routing with Async Caching and Load‑Balancing Tweaks

This article details how Vivo identified heavy CPU usage in Dubbo's routing and load‑balancing modules, applied asynchronous processing, cache‑based routing, disabled unused routers, and introduced grouping strategies, resulting in over 100% TPS gains and significant CPU reductions for large service clusters.

Sohu Tech Products

Nov 1, 2023

Overview

Vivo uses a customized Apache Dubbo 2.7.x stack for its massive micro‑service clusters. When the provider count exceeds 100, routing and load‑balancing consume up to 30% of CPU, severely impacting business latency. The team analyzed flame‑graphs, identified bottlenecks, and implemented a series of optimizations.

Background

Dubbo’s client call flow involves a ClusterInvoker that obtains a service list from a Directory, passes it through a chain of Router objects, and finally selects an Invoker via a LoadBalance strategy. The core routing classes are RouterChain, RouterFactory, and Router. The default load‑balancer is random, and routing follows a simple responsibility‑chain pattern supporting near‑by, tag, and conditional routing.
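To make the call flow concrete, here is a minimal Java sketch of the Directory, router chain, and load balancer pipeline. All types (SimpleInvoker, SimpleRouter, SimpleLoadBalance, SameDataCenterRouter, UniformRandomBalance, SimpleClusterInvoker) are hypothetical simplifications, not Dubbo's API. The router keeps same-data-center providers as a stand-in for near-by routing, and the balancer picks uniformly instead of by weight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified stand-ins for Dubbo's Invoker, Router and LoadBalance.
final class SimpleInvoker {
    final String address;
    final String dataCenter;

    SimpleInvoker(String address, String dataCenter) {
        this.address = address;
        this.dataCenter = dataCenter;
    }
}

interface SimpleRouter {
    // Filters the candidate list; a typical implementation scans all n entries.
    List<SimpleInvoker> route(List<SimpleInvoker> invokers, String consumerDataCenter);
}

interface SimpleLoadBalance {
    SimpleInvoker select(List<SimpleInvoker> invokers);
}

// Near-by style router: keep providers in the consumer's data center,
// or return the input unchanged if none match.
final class SameDataCenterRouter implements SimpleRouter {
    public List<SimpleInvoker> route(List<SimpleInvoker> invokers, String consumerDataCenter) {
        List<SimpleInvoker> local = new ArrayList<>();
        for (SimpleInvoker invoker : invokers) {
            if (invoker.dataCenter.equals(consumerDataCenter)) {
                local.add(invoker);
            }
        }
        return local.isEmpty() ? invokers : local;
    }
}

// Uniform random choice (Dubbo's random balancer additionally weighs providers).
final class UniformRandomBalance implements SimpleLoadBalance {
    private final Random random;

    UniformRandomBalance(Random random) {
        this.random = random;
    }

    public SimpleInvoker select(List<SimpleInvoker> invokers) {
        return invokers.get(random.nextInt(invokers.size()));
    }
}

// Directory list -> router chain (chain of responsibility) -> load balancer.
final class SimpleClusterInvoker {
    private final List<SimpleInvoker> directory;
    private final List<SimpleRouter> routerChain;
    private final SimpleLoadBalance loadBalance;

    SimpleClusterInvoker(List<SimpleInvoker> directory, List<SimpleRouter> routerChain,
                         SimpleLoadBalance loadBalance) {
        this.directory = directory;
        this.routerChain = routerChain;
        this.loadBalance = loadBalance;
    }

    SimpleInvoker pick(String consumerDataCenter) {
        List<SimpleInvoker> candidates = directory;
        for (SimpleRouter router : routerChain) {
            // Each router's output becomes the next router's input.
            candidates = router.route(candidates, consumerDataCenter);
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        return loadBalance.select(candidates);
    }
}
```

Every router in the loop touches each remaining candidate, which is where the per-router O(n) cost comes from.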

Problem Analysis

Flame‑graph profiling showed that CPU usage is dominated by the getWeight method called from the random load‑balancer and by the route method of the various routers. Each is O(n), where n is the number of providers, and every call runs one such pass per router plus another in the load‑balancer, so the per‑call cost grows roughly as O(k·n) for k active routers. As the provider list grows, these repeated traversals lead to excessive CPU consumption.

Optimization Strategies

1. Routing Optimizations

Disable the tag router when it is not used, either by setting dubbo.consumer.router=-tag or through the equivalent annotation/XML configuration.
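The "-tag" value relies on a leading "-" meaning "remove this entry from the defaults". The sketch below illustrates that convention under simplifying assumptions; it is not Dubbo's ExtensionLoader code, and the default router names in the usage are illustrative only. RouterSelection and activeRouters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified illustration of "-name" exclusion in a router setting.
final class RouterSelection {
    static List<String> activeRouters(List<String> defaults, String setting) {
        // LinkedHashSet keeps the default order while allowing cheap removal.
        Set<String> active = new LinkedHashSet<>(defaults);
        if (setting == null || setting.trim().isEmpty()) {
            return new ArrayList<>(active);
        }
        for (String raw : setting.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            if (name.startsWith("-")) {
                active.remove(name.substring(1)); // "-tag" drops the tag router
            } else {
                active.add(name); // a plain name adds a router
            }
        }
        return new ArrayList<>(active);
    }
}
```

With assumed defaults of mock, tag, app, and service, the setting "-tag" leaves mock, app, and service, so no tag-routing pass runs per call.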

Cache routing results: pre‑compute and store the output of stable routing strategies (e.g., near‑by routing keyed by data‑center) because the provider list changes only on deployment or configuration updates.
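A minimal sketch of caching a near-by routing result keyed by data center, assuming providers carry a data-center label. The O(n) grouping runs only when the provider list changes; each call then does a single map lookup. DcProvider and NearbyRouteCache are hypothetical names, and falling back to all providers when a data center has none is an assumption, not a detail from the source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical provider descriptor.
final class DcProvider {
    final String address;
    final String dataCenter;

    DcProvider(String address, String dataCenter) {
        this.address = address;
        this.dataCenter = dataCenter;
    }
}

final class NearbyRouteCache {
    // One immutable snapshot, so readers never see a half-updated cache.
    private static final class Snapshot {
        final List<DcProvider> all;
        final Map<String, List<DcProvider>> byDataCenter;

        Snapshot(List<DcProvider> all, Map<String, List<DcProvider>> byDataCenter) {
            this.all = all;
            this.byDataCenter = byDataCenter;
        }
    }

    private volatile Snapshot snapshot =
            new Snapshot(Collections.emptyList(), Collections.emptyMap());

    // Run only when the provider list changes (deployment or config update): O(n).
    void rebuild(List<DcProvider> providers) {
        Map<String, List<DcProvider>> grouped = new HashMap<>();
        for (DcProvider p : providers) {
            grouped.computeIfAbsent(p.dataCenter, k -> new ArrayList<>()).add(p);
        }
        Map<String, List<DcProvider>> frozen = new HashMap<>();
        grouped.forEach((dc, list) -> frozen.put(dc, Collections.unmodifiableList(list)));
        snapshot = new Snapshot(
                Collections.unmodifiableList(new ArrayList<>(providers)),
                Collections.unmodifiableMap(frozen));
    }

    // Per-call path: a hash lookup instead of scanning every provider.
    List<DcProvider> route(String consumerDataCenter) {
        Snapshot s = snapshot;
        List<DcProvider> local = s.byDataCenter.get(consumerDataCenter);
        // Assumed fallback: use all providers when the data center has none.
        return local != null ? local : s.all;
    }
}
```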

Introduce an epoch value to ensure cached results are consistent with the latest provider snapshot.
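One way to picture the epoch check, as a sketch under assumptions: every provider-list change produces a snapshot with a new, monotonically increasing epoch, and a cached result is used only if it was computed from the snapshot the caller is routing over. ProviderSnapshot, SnapshotHolder, and EpochCache are hypothetical names, not classes from the article.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical provider snapshot tagged with a monotonically increasing epoch.
final class ProviderSnapshot {
    final long epoch;
    final List<String> providers;

    ProviderSnapshot(long epoch, List<String> providers) {
        this.epoch = epoch;
        this.providers = providers;
    }
}

final class SnapshotHolder {
    private final AtomicLong epochs = new AtomicLong();
    private volatile ProviderSnapshot current = new ProviderSnapshot(0, Collections.emptyList());

    // Every provider-list change produces a new snapshot with a new epoch.
    synchronized ProviderSnapshot update(List<String> providers) {
        ProviderSnapshot next = new ProviderSnapshot(
                epochs.incrementAndGet(),
                Collections.unmodifiableList(new ArrayList<>(providers)));
        current = next;
        return next;
    }

    ProviderSnapshot current() {
        return current;
    }
}

// A cached routing result is trusted only if it was built from the same epoch.
final class EpochCache<T> {
    private static final class Entry<E> {
        final long epoch;
        final E value;

        Entry(long epoch, E value) {
            this.epoch = epoch;
            this.value = value;
        }
    }

    private volatile Entry<T> entry;

    void store(long epoch, T value) {
        entry = new Entry<>(epoch, value);
    }

    T get(long snapshotEpoch, Supplier<T> recompute) {
        Entry<T> e = entry;
        if (e != null && e.epoch == snapshotEpoch) {
            return e.value; // cache is consistent with the snapshot
        }
        return recompute.get(); // stale or missing: compute a fresh route
    }
}
```

Falling back to a direct computation on a mismatch keeps results correct during the short window between a provider change and the cache rebuild.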

Use a BitMap to represent cached provider sets, allowing fast intersection of multiple routing results.
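The bitmap idea can be sketched with java.util.BitSet standing in for the article's own bitmap structure. Each cached routing result becomes a bit set over the indices of the current provider snapshot, so combining routers reduces to word-level AND operations instead of list scans. BitRouting and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

final class BitRouting {
    // Encode one routing result: bit i is set iff providers.get(i) passes the rule.
    // Done once per provider-list change, not per call.
    static <T> BitSet toBits(List<T> providers, Predicate<T> rule) {
        BitSet bits = new BitSet(providers.size());
        for (int i = 0; i < providers.size(); i++) {
            if (rule.test(providers.get(i))) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Intersect several cached results; the inputs are not modified.
    static BitSet intersect(List<BitSet> results, int size) {
        BitSet acc = new BitSet(size);
        acc.set(0, size); // start from "all providers"
        for (BitSet r : results) {
            acc.and(r);
        }
        return acc;
    }

    // Map the surviving bits back to provider objects.
    static <T> List<T> materialize(List<T> providers, BitSet bits) {
        List<T> out = new ArrayList<>(bits.cardinality());
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.add(providers.get(i));
        }
        return out;
    }
}
```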

2. Load‑Balancing Optimizations

Refine getWeight so that the registry‑service check runs only when the invoker is a ClusterInvoker, skipping it for ordinary provider invokers.
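A rough sketch of the reordered check, assuming (as the text implies) that only a cluster-level invoker can carry the registry-service interface in multi-registry setups. A cheap instanceof test then short-circuits the string comparison for ordinary providers. WeightedInvoker, ClusterWeightedInvoker, and WeightLookup are hypothetical; the interface name, parameter keys, and default weight of 100 are assumptions modeled on Dubbo conventions, and warm-up weighting is omitted.

```java
import java.util.Map;

// Hypothetical minimal invoker model, not Dubbo's Invoker/URL classes.
class WeightedInvoker {
    final String serviceInterface;
    final Map<String, Integer> params;

    WeightedInvoker(String serviceInterface, Map<String, Integer> params) {
        this.serviceInterface = serviceInterface;
        this.params = params;
    }
}

// Marker subtype: assumed to be the only kind that can wrap a registry.
class ClusterWeightedInvoker extends WeightedInvoker {
    ClusterWeightedInvoker(String serviceInterface, Map<String, Integer> params) {
        super(serviceInterface, params);
    }
}

final class WeightLookup {
    static final String REGISTRY_SERVICE = "org.apache.dubbo.registry.RegistryService"; // assumed
    static final int DEFAULT_WEIGHT = 100; // assumed default

    static int getWeight(WeightedInvoker invoker) {
        int weight;
        // Type check first: a plain provider invoker skips the registry-service
        // string comparison and registry-weight lookup entirely.
        if (invoker instanceof ClusterWeightedInvoker
                && REGISTRY_SERVICE.equals(invoker.serviceInterface)) {
            weight = invoker.params.getOrDefault("registry.weight", DEFAULT_WEIGHT);
        } else {
            weight = invoker.params.getOrDefault("weight", DEFAULT_WEIGHT);
        }
        return Math.max(weight, 0); // negative weights are clamped to zero
    }
}
```

Because getWeight runs once per candidate on every call, trimming even a small per-invoker check scales with the provider count.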

Group large provider sets into smaller virtual groups and randomly select one group before applying the load‑balancer, dramatically reducing the number of candidates.
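The grouping step can be sketched as follows: slice the provider list into virtual groups of a fixed size, pick one group at random, and then balance only inside that group, so per-call work is proportional to the group size rather than to all n providers. GroupedSelector, its doGroup, and the groupSize parameter are hypothetical reconstructions, and the in-group choice here is uniform random rather than the configured LoadBalance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GroupedSelector {
    // Split providers into ceil(n / groupSize) contiguous virtual groups (views).
    static <T> List<List<T>> doGroup(List<T> providers, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        List<List<T>> groups = new ArrayList<>();
        for (int from = 0; from < providers.size(); from += groupSize) {
            int to = Math.min(from + groupSize, providers.size());
            groups.add(providers.subList(from, to));
        }
        return groups;
    }

    // Pick one group at random, then choose uniformly inside it.
    static <T> T select(List<T> providers, int groupSize, Random random) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        List<List<T>> groups = doGroup(providers, groupSize);
        List<T> group = groups.get(random.nextInt(groups.size()));
        return group.get(random.nextInt(group.size()));
    }
}
```

In a real deployment the groups would be built alongside the cached routing results rather than per call. Note that a smaller last group makes its members slightly more likely to be chosen, and contiguous slicing assumes provider order is not correlated with load; shuffling once at rebuild time is one way to address the latter.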

3. Implementation Highlights

Key code snippets include the modified RouterChain fields, the cache‑aware getNearestInvokersWithCache method, and the grouping logic in doGroup. The cache refresh logic listens to registry or dynamic‑config changes, rebuilds the BitList, and notifies downstream routers.
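The asynchronous refresh path can be approximated as below, as a sketch under assumptions rather than the team's code. A change notification records a new epoch and schedules a rebuild on a background thread; the request path only reads the latest published snapshot, and downstream listeners are told after each publish. AsyncRouteCache and its methods are hypothetical, and the builder function stands in for rebuilding the bitmap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

final class AsyncRouteCache<P, C> {
    private static final class Snapshot<C> {
        final long epoch;
        final C cache;

        Snapshot(long epoch, C cache) {
            this.epoch = epoch;
            this.cache = cache;
        }
    }

    private final ExecutorService refresher = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "route-cache-refresher");
        t.setDaemon(true);
        return t;
    });
    private final AtomicLong latestEpoch = new AtomicLong();
    private final List<Runnable> downstream = new CopyOnWriteArrayList<>();
    private final Function<List<P>, C> builder;
    private volatile Snapshot<C> current;

    AsyncRouteCache(Function<List<P>, C> builder) {
        this.builder = builder;
        this.current = new Snapshot<>(0, builder.apply(Collections.emptyList()));
    }

    // Downstream routers register to be told when a new cache is published.
    void addListener(Runnable listener) {
        downstream.add(listener);
    }

    // Called from the registry or dynamic-config notification thread.
    Future<?> onProvidersChanged(List<P> providers) {
        List<P> copy = Collections.unmodifiableList(new ArrayList<>(providers));
        long epoch = latestEpoch.incrementAndGet();
        return refresher.submit(() -> {
            C built = builder.apply(copy);
            // Skip publishing if a newer change has been registered since;
            // the rebuild for that newer change will publish instead.
            if (epoch == latestEpoch.get()) {
                current = new Snapshot<>(epoch, built);
                downstream.forEach(Runnable::run);
            }
        });
    }

    C cache() {
        return current.cache;
    }

    long epoch() {
        return current.epoch;
    }

    void shutdown() {
        refresher.shutdown();
    }
}
```

Readers never block on a rebuild; during the brief refresh window they see the previous snapshot, which is where an epoch check like the one described earlier decides whether that cached result is still usable.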

Performance Results

Benchmarks at a load of roughly 1 000 TPS, with provider counts of 100, 500, 1 000, 2 000, and 5 000, showed that the optimized version consistently outperformed the baseline. For provider counts above 2 000, TPS increased by more than 100% and average CPU usage dropped by ~27%. The routing and load‑balancing CPU share also fell dramatically.

Conclusion

By disabling unnecessary routers, adding asynchronous cache calculations, and introducing grouping, Vivo reduced Dubbo’s routing and load‑balancing CPU overhead and doubled throughput for large clusters. Future work includes adopting Dubbo 3.2’s adaptive load‑balancer and a CPU‑aware balancing algorithm to further smooth resource utilization.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.