How ICBC Scaled Dubbo Service Discovery for 20,000+ Services

This article describes how the Industrial and Commercial Bank of China (ICBC) migrated to Dubbo micro-services, the performance and high-availability challenges of managing over 20,000 services with ZooKeeper, and the concrete optimizations (delayed subscription, multiple-registry mode, and per-node registration) that enabled stable, large-scale service discovery.

Alibaba Cloud Native

Background and Overview

The Industrial and Commercial Bank of China (ICBC) began moving from a monolithic JEE architecture to a Dubbo-based micro-service platform in 2014. After years of rollout, the platform supports more than 20,000 Dubbo service interfaces and over 700,000 provider entries, and has become a core component of the bank's open-platform banking system.

ICBC micro‑service architecture diagram

Key Challenges

Performance & Capacity – More than 20,000 services are online, and each registry holds over 700,000 provider nodes. The growth target is 100,000 services and 5 million provider entries per registry.

High Availability – Any node failure must not affect 24/7 transaction processing. Version upgrades and registry updates must be transparent to the business.

Service Discovery Basics in Dubbo

Dubbo follows a standard pattern: providers register themselves with the registry, consumers subscribe and receive the full provider list, and RPC calls then go point-to-point from consumer to provider without passing through the registry. ICBC chose ZooKeeper as the registry in 2014 because of its proven scalability and CP-style strong consistency.
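The register/subscribe pattern described above can be illustrated with a toy in-memory registry. This is only a sketch of the idea, not the Dubbo or ZooKeeper API: the class `InMemoryRegistry`, the service name `com.example.DemoService`, and the addresses are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[List[str]], None]


class InMemoryRegistry:
    """Toy stand-in for a registry (hypothetical, not a real Dubbo class)."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = defaultdict(list)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def register(self, service: str, address: str) -> None:
        # A provider announces itself; every subscriber is pushed the full list.
        self._providers[service].append(address)
        for listener in self._listeners[service]:
            listener(list(self._providers[service]))

    def subscribe(self, service: str, listener: Listener) -> None:
        # A consumer subscribes and immediately receives the current full list.
        self._listeners[service].append(listener)
        listener(list(self._providers[service]))


registry = InMemoryRegistry()
consumer_view: List[str] = []


def on_change(providers: List[str]) -> None:
    # The consumer caches the list locally and later calls providers directly.
    consumer_view[:] = providers


registry.subscribe("com.example.DemoService", on_change)
registry.register("com.example.DemoService", "10.0.0.1:20880")
registry.register("com.example.DemoService", "10.0.0.2:20880")
```

After both registrations, `consumer_view` holds both addresses, and the consumer can call either provider directly without involving the registry again.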

Dubbo service registration hierarchy

Within ZooKeeper, each Dubbo service has four child nodes: providers (ephemeral entries, one per provider), consumers (ephemeral entries, one per consumer), configurators (persistent, service parameters), and routers (persistent, routing rules).
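As a rough illustration of this layout, the snippet below builds the four category paths for one service. It assumes Dubbo's default `/dubbo` root and uses `configurators`, the name Dubbo gives to the configuration category; the helper `category_paths` and the interface name are hypothetical.

```python
from typing import Dict

ROOT = "/dubbo"  # assumed default root; deployments may configure another group

# Category name -> kind of entries stored under it, per the description above.
CATEGORIES: Dict[str, str] = {
    "providers": "ephemeral",
    "consumers": "ephemeral",
    "configurators": "persistent",
    "routers": "persistent",
}


def category_paths(interface: str) -> Dict[str, str]:
    """Return each category path for a service, mapped to its entry kind."""
    return {f"{ROOT}/{interface}/{name}": kind for name, kind in CATEGORIES.items()}


for path, kind in category_paths("com.example.DemoService").items():
    print(f"{path}  ({kind})")
```

Ephemeral entries disappear when the session that created them ends, which is what lets the registry drop a crashed provider automatically.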

ZooKeeper node structure

Problems Observed with ZooKeeper at Scale

Massive data push: when a service with 100 providers starts, each provider registration triggers a watch event, so every consumer re-reads the full provider list 100 times. Because the list grows by one entry with each registration, each consumer pulls 1 + 2 + ... + 100 = 5,050 provider entries in total. During peak deployments this saturates network bandwidth and degrades registration performance.
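The 5,050 figure is simply the sum of the list sizes a consumer sees as providers come up one at a time. A small check, with a hypothetical helper name:

```python
def entries_pulled(provider_count: int) -> int:
    """Total provider entries one consumer reads while providers start one by one.

    After the k-th provider registers, the consumer re-reads a list of k entries.
    """
    return sum(range(1, provider_count + 1))


n = 100
total = entries_pulled(n)
# The same value via the closed form n(n+1)/2.
closed_form = n * (n + 1) // 2
print(total, closed_form)
```

The cost grows quadratically with the number of providers, which is why large rollouts hurt far more than small ones.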

Large snapshot files: as the number of ZooKeeper nodes grows, snapshot files grow with it, leading to high disk I/O and longer recovery times after failures.

Observer-node re-sync delays: after a leader election, Observer nodes must sync the full transaction log. If the sync is slow, client sessions may time out, causing temporary loss of provider nodes and subsequent registration storms.

Optimization Measures

1) Delayed Subscription Updates

ICBC modified the zkclient library to wait for a short delay after a child-change event before fetching the provider list. A burst of rapid changes is thereby handled with a single fetch, which reduces the number of read operations.

Delayed subscription flow

Benchmark results: before the change, each consumer received ~4.22 million provider entries during a large rollout; with a 1-second delay the volume dropped to ~260 k, roughly 5 to 6 % of the original.
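The delayed-subscription idea can be sketched as a simple debounce. This is not ICBC's zkclient patch, whose details the article does not give; `DelayedSubscriber`, its methods, and the explicit `now` timestamps (used instead of real timers so the behavior is deterministic) are assumptions made for illustration.

```python
from typing import Callable, List, Optional


class DelayedSubscriber:
    """Coalesces a burst of change events into one fetch (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[], List[str]], delay_s: float = 1.0) -> None:
        self._fetch = fetch
        self._delay_s = delay_s
        self._deadline: Optional[float] = None
        self.fetch_count = 0
        self.providers: List[str] = []

    def on_child_change(self, now: float) -> None:
        # The first event of a burst starts the timer; later events ride along.
        if self._deadline is None:
            self._deadline = now + self._delay_s

    def poll(self, now: float) -> None:
        # Once the delay has elapsed, read the provider list a single time.
        if self._deadline is not None and now >= self._deadline:
            self._deadline = None
            self.providers = self._fetch()
            self.fetch_count += 1


registry_state: List[str] = []
sub = DelayedSubscriber(fetch=lambda: list(registry_state), delay_s=1.0)

# 100 providers come online within half a second, one event each.
for i in range(100):
    registry_state.append(f"10.0.0.{i}:20880")
    t = i * 0.005
    sub.on_child_change(now=t)
    sub.poll(now=t)

sub.poll(now=1.0)  # delay elapsed: one fetch returns all 100 providers
print(sub.fetch_count, len(sub.providers))
```

The trade-off is that consumers see changes up to `delay_s` late, in exchange for reading the list once per burst rather than once per event.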

2) Multiple‑Registry (multiple) Mode

Dubbo’s registry‑multiple SPI was adopted and enhanced. Instead of handling each registry independently, the client merges provider data from all registries before updating its cache. This prevents a single registry failure from causing missing providers and balances load across registries.

Multiple registry merging

Additional benefit: the merged cache halves the number of Reference objects in the JVM, saving memory.
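A minimal sketch of the merge step follows. It assumes the merge is a plain union of provider addresses across registries; the article does not spell out the exact rule. The function `merge_providers` and the registry names are hypothetical.

```python
from typing import Dict, List, Set


def merge_providers(per_registry: Dict[str, List[str]]) -> List[str]:
    """Union the provider lists reported by every registry (assumed merge rule)."""
    merged: Set[str] = set()
    for providers in per_registry.values():
        merged.update(providers)
    return sorted(merged)


snapshot = {
    "registry-a": ["10.0.0.1:20880", "10.0.0.2:20880"],
    "registry-b": [],  # e.g. this registry is failing and reports nothing
}

# The consumer cache is updated from the merged view, so the empty list
# from registry-b does not wipe out providers still visible in registry-a.
print(merge_providers(snapshot))
```

If each registry were handled independently, the empty result from `registry-b` would produce its own empty provider list on the consumer side.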

3) Per‑Node Registration Model

Drawing on the service-discovery design of Dubbo 2.7 and Dubbo 3.0, ICBC built a "per-node registration" model that separates configuration, metadata, and registration concerns:

Configuration Center: stores dynamic node-level parameters and the persistent configurators / routers data.

Metadata Center: holds service-to-node mappings and method signatures.

Registration Center: stores only the mapping between a node name and its IP/port.

This redesign does not affect consumer call flow; consumers resolve the service via metadata and then locate the actual endpoint via the registration center.
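The two-step lookup can be sketched with plain dictionaries standing in for the metadata and registration centers. The data shapes, node names, and the `resolve` helper are assumptions for illustration, not ICBC's actual schema.

```python
from typing import Dict, List

# Metadata center: service interface -> names of the nodes that expose it.
metadata_center: Dict[str, List[str]] = {
    "com.example.DemoService": ["node-a", "node-b", "node-c"],
}

# Registration center: node name -> address, one entry per live node.
registration_center: Dict[str, str] = {
    "node-a": "10.0.0.1:20880",
    "node-b": "10.0.0.2:20880",
    # node-c is currently offline, so it has no entry here.
}


def resolve(service: str) -> List[str]:
    """Service -> node names (metadata), then node names -> live addresses."""
    nodes = metadata_center.get(service, [])
    return [registration_center[n] for n in nodes if n in registration_center]


print(resolve("com.example.DemoService"))
```

Because the registration center holds one entry per node rather than one per service per node, its size tracks the number of nodes, not the number of service interfaces each node exposes.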

Per‑node registration architecture

Load tests showed data stored in the registry shrank to 1.68 % of the original size, comfortably supporting 100 k services and 100 k nodes.

Future Plans

ICBC aims to contribute its enhancements back to the open‑source community, including refined RPC result handling, multi‑protocol support, and registration circuit‑breaker mechanisms. The bank is also exploring a migration path from Dubbo to service‑mesh solutions (e.g., Istio, MCP) to address SDK version upgrades and mesh‑era challenges.

ICBC invites other Dubbo practitioners to share large‑scale experiences and jointly improve enterprise adoption.
