How Vivo Supercharged Dubbo Routing with Async Caching and Load‑Balancing Tweaks
This article details how Vivo identified heavy CPU usage in Dubbo's routing and load‑balancing modules, applied asynchronous processing, cache‑based routing, disabled unused routers, and introduced grouping strategies, resulting in over 100% TPS gains and significant CPU reductions for large service clusters.
Overview
Vivo uses a customized Apache Dubbo 2.7.x stack for its massive micro‑service clusters. When the provider count exceeds 100, routing and load‑balancing consume up to 30% of CPU, severely impacting business latency. The team analyzed flame graphs, identified the bottlenecks, and implemented a series of optimizations.
Background
Dubbo’s client call flow involves a ClusterInvoker that obtains the provider list from a Directory, passes it through a chain of Router objects, and finally selects an Invoker via a LoadBalance strategy. The core routing classes are RouterChain, RouterFactory, and Router. The default load‑balancer is random, and routing follows a simple chain‑of‑responsibility pattern supporting nearby, tag, and conditional routing.
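The flow above can be pictured with a minimal sketch. It does not use Dubbo's real interfaces: Provider, SimpleRouter, filterBy, routeAll, and selectRandom are hypothetical stand-ins chosen only to show the Directory list passing through a router chain and ending in a random pick.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

class CallFlowSketch {
    /** Hypothetical stand-in for an Invoker: one provider instance. */
    public static final class Provider {
        public final String address;
        public final String dataCenter;

        public Provider(String address, String dataCenter) {
            this.address = address;
            this.dataCenter = dataCenter;
        }
    }

    /** Hypothetical router: narrows the candidate list according to one rule. */
    public interface SimpleRouter {
        List<Provider> route(List<Provider> candidates);
    }

    /** Builds a router that keeps only the providers matching the rule. */
    public static SimpleRouter filterBy(Predicate<Provider> rule) {
        return candidates -> {
            List<Provider> kept = new ArrayList<>();
            for (Provider p : candidates) { // full O(n) scan on every call
                if (rule.test(p)) {
                    kept.add(p);
                }
            }
            return kept;
        };
    }

    /** Chain of responsibility: each router consumes the previous router's output. */
    public static List<Provider> routeAll(List<Provider> fromDirectory, List<SimpleRouter> chain) {
        List<Provider> current = fromDirectory;
        for (SimpleRouter router : chain) {
            current = router.route(current);
        }
        return current;
    }

    /** Unweighted random pick, standing in for the load-balancing step. */
    public static Provider selectRandom(List<Provider> candidates) {
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }
}
```

Because every router in this sketch rescans its input on each call, the per-request cost grows with both the number of routers and the number of providers.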
Problem Analysis
Flame‑graph profiling showed that the getWeight method in the random load‑balancer and the route method of various routers dominate CPU usage, each with O(n) complexity where n is the number of providers. As the provider list grows, the repeated traversal across all routers leads to excessive CPU consumption.
Optimization Strategies
1. Routing Optimizations
Disable unused tag routers by setting dubbo.consumer.router=-tag or using annotation/XML configuration.
Cache routing results: pre‑compute and store the output of stable routing strategies (e.g., nearby routing keyed by data center), because the provider list changes only on deployment or configuration updates.
Introduce an epoch value to ensure cached results are consistent with the latest provider snapshot.
Use a BitMap to represent cached provider sets, allowing fast intersection of multiple routing results.
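The caching, epoch, and bitmap ideas can be combined in a small sketch. This is an illustration under stated assumptions, not Vivo's code: it uses java.util.BitSet in place of whatever bitmap type the real implementation uses, represents providers as plain strings, and the names RouteCacheSketch, Snapshot, refresh, and route are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RouteCacheSketch {
    /** Immutable view of one provider list, with nearby-routing results precomputed. */
    public static final class Snapshot {
        public final long epoch;
        public final List<String> providers;           // index i corresponds to bit i
        public final Map<String, BitSet> byDataCenter; // data center -> provider bits

        Snapshot(long epoch, List<String> providers, List<String> dataCenters) {
            this.epoch = epoch;
            this.providers = new ArrayList<>(providers);
            Map<String, BitSet> index = new HashMap<>();
            for (int i = 0; i < dataCenters.size(); i++) {
                index.computeIfAbsent(dataCenters.get(i), k -> new BitSet()).set(i);
            }
            this.byDataCenter = index;
        }
    }

    private volatile Snapshot current = new Snapshot(0, new ArrayList<>(), new ArrayList<>());

    /** Called when the provider list changes; dataCenters.get(i) belongs to providers.get(i). */
    public synchronized Snapshot refresh(List<String> providers, List<String> dataCenters) {
        current = new Snapshot(current.epoch + 1, providers, dataCenters);
        return current;
    }

    /**
     * Intersects the cached nearby result with a bitmap produced by another router.
     * Returns null when the other bitmap was built against a different epoch, so the
     * caller can fall back to the uncached path instead of mixing two provider lists.
     */
    public List<String> route(String dataCenter, BitSet otherRouterResult, long otherEpoch) {
        Snapshot s = current; // read the volatile field once for a consistent view
        if (s.epoch != otherEpoch) {
            return null;
        }
        BitSet nearby = s.byDataCenter.get(dataCenter);
        List<String> result = new ArrayList<>();
        if (nearby == null) {
            return result;
        }
        BitSet hit = (BitSet) nearby.clone(); // keep the cached bitmap untouched
        hit.and(otherRouterResult);
        for (int i = hit.nextSetBit(0); i >= 0; i = hit.nextSetBit(i + 1)) {
            result.add(s.providers.get(i));
        }
        return result;
    }
}
```

The intersection is a word-wise AND over the bitmaps rather than a per-provider rule evaluation, and the epoch check guards against combining bit positions that refer to different provider lists.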
2. Load‑Balancing Optimizations
Refine getWeight so that the registry‑service check runs only when the invoker is a ClusterInvoker, skipping it for ordinary provider invokers.
Group large provider sets into smaller virtual groups and randomly select one group before applying the load‑balancer, dramatically reducing the number of candidates.
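The grouping step can be sketched as follows. This is a simplified illustration, not the doGroup logic from the article: it assumes providers are split into contiguous, fixed-size groups by index and that weights are non-negative integers. GroupedRandomSketch and its methods are hypothetical names.

```java
import java.util.Random;

class GroupedRandomSketch {
    /** Picks one virtual group at random and returns its [from, to) index bounds. */
    public static int[] pickGroup(int providerCount, int groupSize, Random random) {
        int groupCount = (providerCount + groupSize - 1) / groupSize; // ceiling division
        int group = random.nextInt(groupCount);
        int from = group * groupSize;
        int to = Math.min(from + groupSize, providerCount);
        return new int[] {from, to};
    }

    /** Weighted random selection restricted to weights[from..to). */
    public static int selectWeighted(int[] weights, int from, int to, Random random) {
        int total = 0;
        for (int i = from; i < to; i++) {
            total += weights[i];
        }
        if (total <= 0) {
            return from + random.nextInt(to - from); // all weights zero: uniform pick
        }
        int offset = random.nextInt(total);
        for (int i = from; i < to; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable when weights are non-negative");
    }

    /** Two-step selection: random group first, then weighted random inside that group. */
    public static int select(int[] weights, int groupSize, Random random) {
        int[] bounds = pickGroup(weights.length, groupSize, random);
        return selectWeighted(weights, bounds[0], bounds[1], random);
    }
}
```

With 5,000 providers and a group size of 100, each call sums 100 weights instead of 5,000. The trade-off in this sketch is that selection is only approximately proportional to weight: a smaller final group, or groups with very different total weights, skew the distribution.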
3. Implementation Highlights
Key code snippets include the modified RouterChain fields, the cache‑aware getNearestInvokersWithCache method, and the grouping logic in doGroup. The cache refresh logic listens to registry or dynamic‑config changes, rebuilds the BitList, and notifies downstream routers.
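The refresh path described here can be sketched as an asynchronous rebuild followed by an atomic swap. This is an assumption-laden illustration rather than the article's getNearestInvokersWithCache: providers are modeled as "address@dataCenter" strings, and AsyncCacheRefreshSketch, onProvidersChanged, and nearbyInvokers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

class AsyncCacheRefreshSketch implements AutoCloseable {
    private final String localDataCenter;
    // One daemon worker serializes rebuilds so a newer result is never overwritten by an older one.
    private final ExecutorService refresher = Executors.newSingleThreadExecutor(task -> {
        Thread t = new Thread(task, "route-cache-refresher");
        t.setDaemon(true);
        return t;
    });
    private final AtomicReference<List<String>> nearbyCache =
            new AtomicReference<>(Collections.emptyList());

    public AsyncCacheRefreshSketch(String localDataCenter) {
        this.localDataCenter = localDataCenter;
    }

    /** Hypothetical change callback: recomputes the nearby list off the request path. */
    public Future<?> onProvidersChanged(List<String> providers) {
        List<String> copy = new ArrayList<>(providers); // defensive copy of the notification
        return refresher.submit(() -> {
            List<String> nearby = new ArrayList<>();
            for (String provider : copy) {
                if (provider.endsWith("@" + localDataCenter)) {
                    nearby.add(provider);
                }
            }
            nearbyCache.set(Collections.unmodifiableList(nearby)); // atomic publish
        });
    }

    /** Request path: a single read of the cached result, with no scan and no lock. */
    public List<String> nearbyInvokers() {
        return nearbyCache.get();
    }

    @Override
    public void close() {
        refresher.shutdown();
    }
}
```

Request threads keep reading the previous list until the rebuild finishes, which is acceptable only because the provider list changes rarely compared with the call rate.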
Performance Results
Benchmarks with provider counts of 100, 500, 1,000, 2,000, and 5,000 under ~1,000 TPS showed that the optimized version consistently outperformed the baseline. For provider counts above 2,000, TPS increased by more than 100% and average CPU usage dropped by ~27%. The routing and load‑balancing CPU share also fell dramatically.
Conclusion
By disabling unnecessary routers, adding asynchronous cache calculations, and introducing grouping, Vivo reduced Dubbo’s routing and load‑balancing CPU overhead and more than doubled throughput for clusters with over 2,000 providers. Future work includes adopting Dubbo 3.2’s adaptive load‑balancer and a CPU‑aware balancing algorithm to further smooth resource utilization.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
