Optimizing Dubbo Routing and Load Balancing at Scale: Vivo's Practice
Vivo tackled high CPU overhead in large-scale Dubbo deployments by disabling unused routers, caching routing results with BitMap intersections and epoch validation, optimizing weight calculations, and adding a grouping router. Together these changes delivered over 100% TPS gains at 20k+ providers and cut CPU usage by roughly 27%.
This article presents vivo's optimization practices for Apache Dubbo's routing module and load balancing in large-scale microservice deployments. It covers the technical challenges encountered when running Dubbo (version 2.7.x) in production environments with hundreds of service providers, where routing and load balancing accounted for up to 30% of CPU consumption according to flame graph analysis.
Background and Problem Analysis:
The article explains Dubbo's client invocation flow: the client calls a local proxy, which delegates to the ClusterInvoker; the ClusterInvoker retrieves the service list from the Directory, applies the routing chain to filter providers, and uses load balancing to select an invoker for the RPC call. The routing mechanism uses a chain-of-responsibility pattern supporting multiple routing strategies (nearest router, tag router, conditional router). Load balancing defaults to weighted random selection, where weight calculation also handles provider warm-up.
Performance analysis revealed O(n) time complexity in both the getWeight() method used by load balancing and the route() method of each router, causing significant CPU overhead once the provider count exceeds roughly 100.
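To make the O(n) cost concrete, here is a minimal, self-contained sketch of Dubbo-style weighted random selection; the names are illustrative rather than Dubbo's actual API. Every call scans the full provider list once to sum weights and possibly a second time to locate the chosen offset, so per-call cost grows linearly with provider count:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of weighted random selection over provider weights.
// Each invocation pays an O(n) scan, which is where the CPU overhead
// appears once the provider list grows into the hundreds or thousands.
public class WeightedRandom {
    public static int select(int[] weights) {
        int total = 0;
        boolean sameWeight = true;
        for (int i = 0; i < weights.length; i++) { // first O(n) pass: sum weights
            total += weights[i];
            if (i > 0 && weights[i] != weights[i - 1]) {
                sameWeight = false;
            }
        }
        if (total > 0 && !sameWeight) {
            int offset = ThreadLocalRandom.current().nextInt(total);
            for (int i = 0; i < weights.length; i++) { // second O(n) pass: find bucket
                offset -= weights[i];
                if (offset < 0) {
                    return i;
                }
            }
        }
        // All weights equal (or zero): plain uniform pick.
        return ThreadLocalRandom.current().nextInt(weights.length);
    }
}
```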
Optimization Solutions:
1. Router Optimization:
Disable unused routers (e.g., -tag to disable native application-level tag router)
Pre-calculate and cache routing results - cache full provider lists by datacenter for nearest-router, cache tag-based results for tag-router
Use BitMap for efficient intersection operations between routing results
Implement epoch-based cache invalidation to ensure consistency
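The cache-plus-intersection idea above can be illustrated with a small self-contained sketch, using java.util.BitSet in place of Dubbo's BitList; class and method names here are assumptions, not Dubbo's API. A routing result is cached as a bit set over the provider list and tagged with the epoch of the list it was computed from; on each call the router intersects the incoming set with the cached one, and an epoch mismatch signals a stale cache:

```java
import java.util.BitSet;

// Hypothetical sketch of epoch-validated route caching with BitMap
// intersection. The cached result is only valid against the same
// provider-list epoch it was built from; otherwise the caller must
// fall back to a full route computation.
public class EpochBitCache {
    private volatile BitSet cached;    // cached routing result
    private volatile long cachedEpoch; // provider-list version it was built from

    public void put(BitSet result, long epoch) {
        this.cached = (BitSet) result.clone();
        this.cachedEpoch = epoch;
    }

    /** Returns the intersection if the cache matches the epoch, else null. */
    public BitSet routeWithCache(BitSet incoming, long epoch) {
        BitSet snapshot = cached;
        if (snapshot == null || epoch != cachedEpoch) {
            return null; // stale cache: recompute the route from scratch
        }
        BitSet out = (BitSet) incoming.clone();
        out.and(snapshot); // word-wise BitMap intersection, ~O(n/64)
        return out;
    }
}
```

The intersection replaces a per-provider route() check with a handful of 64-bit AND operations, which is what makes cached routing cheap at large provider counts.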
2. Load Balancing Optimization:
Optimize getWeight() method by adding type checking before registry service weight lookup
Add grouping router as the final step in routing chain to reduce nodes entering load balancing - randomly select one group to proceed
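A minimal sketch of the grouping idea, with the group size and all names as assumptions rather than Dubbo's actual API: as the final router in the chain, the surviving providers are sliced into fixed-size groups and only one randomly chosen group is handed to load balancing, so getWeight() and selection run over a bounded sublist instead of the full set.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative grouping router: randomly selects one fixed-size group
// of providers so that downstream load balancing stays O(groupSize)
// regardless of how large the full provider list grows.
public class GroupingRouter {
    private final int groupSize;

    public GroupingRouter(int groupSize) {
        this.groupSize = groupSize;
    }

    public <T> List<T> route(List<T> invokers) {
        if (invokers.size() <= groupSize) {
            return invokers; // small lists pass through untouched
        }
        int groups = (invokers.size() + groupSize - 1) / groupSize; // ceil division
        int pick = ThreadLocalRandom.current().nextInt(groups);
        int from = pick * groupSize;
        int to = Math.min(from + groupSize, invokers.size());
        return invokers.subList(from, to);
    }
}
```

Because the group is chosen uniformly at random per invocation, traffic still spreads across all providers over many calls while each individual call only weighs a small sublist.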
Key Code Implementations:
The article provides source code for RouterChain, RouterFactory, Router interface, and concrete implementations including nearest-router with caching logic using BitList and epoch validation.
public <T> List<Invoker<T>> route(List<Invoker<T>> invokers, URL consumerUrl, Invocation invocation) throws RpcException {
    BitList<Invoker<T>> bitList = (BitList<Invoker<T>>) invokers;
    BitList<Invoker<T>> result = getNearestInvokersWithCache(bitList);
    // ... fallback logic
}

private <T> BitList<Invoker<T>> getNearestInvokersWithCache(BitList<Invoker<T>> invokers) {
    ValueWrapper valueWrapper = getCache(getSystemProperty(LOC));
    if (valueWrapper != null) {
        BitList<Invoker<T>> invokerBitList = (BitList<Invoker<T>>) valueWrapper.get();
        if (invokers.isSameEpoch(invokerBitList)) {
            BitList<Invoker<T>> tmp = invokers.clone();
            return tmp.and(invokerBitList); // intersection via the underlying BitMap
        }
    }
    return getNearestInvokers(invokers);
}

Performance Results:
Testing with 100 to 50,000 provider nodes at roughly 1,000 TPS showed significant improvements: with more than 20,000 providers, TPS improved by over 100%, average CPU usage decreased by approximately 27%, and the CPU share of routing and load balancing dropped markedly. The optimization effect becomes more pronounced as the provider count increases.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.