How to Seamlessly Migrate Dubbo Services from Zookeeper to Nacos with Service Mesh
This article explains why a company using Dubbo with Zookeeper should adopt cloud‑native practices, compares Zookeeper‑based discovery with Kubernetes DNS, and provides a detailed migration plan to Nacos—including a dynamic registry, the nacosSync tool, and extensive performance optimizations.
Technical Selection
The company's RPC framework is Dubbo, and it has traditionally used Zookeeper for service discovery. The motivation for replacing Zookeeper is not performance but the desire to move toward a cloud‑native architecture.
What Is Cloud‑Native?
Cloud‑native technologies help organizations build and run elastic applications in public, private, and hybrid clouds. Representative technologies include containers, service mesh, microservices, immutable infrastructure, and declarative APIs, enabling fault‑tolerant, manageable, and observable loosely‑coupled systems.
Service Mesh Overview
Service mesh is the TCP protocol of the microservice era.
Service mesh abstracts the underlying network, allowing service governance (rate limiting, circuit breaking, monitoring, discovery, load balancing, tracing, etc.) to be offloaded to mesh proxies, separating infrastructure from business logic.
Dubbo vs. Cloud‑Native Service Discovery
Dubbo consists of three components: provider, consumer, and registry. Providers register IP, port, service name, and method name to the registry; consumers look up these details to invoke remote calls.
In cloud‑native environments, service discovery is handled by the container orchestrator, typically Kubernetes (k8s) with DNS. This makes migrating Dubbo services to a cloud‑native stack challenging.
After evaluating the options, Nacos was chosen because it supports both traditional registration (like Zookeeper) and a DNS filter plugin (DNS‑F) that intercepts DNS queries and returns registered IPs when available.
Service Mesh Access Policies
Outside‑mesh Dubbo → Inside‑mesh Dubbo: registry
Outside‑mesh Dubbo → Outside‑mesh Dubbo: registry
Inside‑mesh Dubbo → Outside‑mesh Dubbo: domain → DNS‑F → registry
Inside‑mesh Dubbo → Inside‑mesh Dubbo: domain → DNS‑F → DNS
Heterogeneous‑language services (PHP, Node.js) call Dubbo providers by service name; DNS‑F resolves the name to the correct IP, and load balancing can be tuned.
Migration Plan
Two options to migrate from Zookeeper to Nacos:
1. Refactor Dubbo applications for dual registration (Zookeeper + Nacos) and switch after all services are updated.
2. Use a migration tool to sync Zookeeper data to Nacos, allowing gradual code changes.
Option 2 was chosen, supplemented by a custom dynamic registry that reads configuration at startup to decide whether to register with Zookeeper, Nacos, or both, and which registry to consume from.
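A minimal sketch of how such a dynamic registry's startup decision might look. The configuration keys (`register_zookeeper`, `register_nacos`, `subscribe_from`) are illustrative assumptions, not the actual implementation:

```python
def resolve_registries(config: dict) -> dict:
    """Decide, at startup, which registries to register with and which one
    to consume (subscribe) from. Keys and defaults here are hypothetical."""
    targets = []
    if config.get("register_zookeeper", True):  # default: keep legacy behavior
        targets.append("zookeeper")
    if config.get("register_nacos", False):
        targets.append("nacos")
    return {
        "register_to": targets,
        "subscribe_from": config.get("subscribe_from", "zookeeper"),
    }

# Dual registration during the migration window, already consuming from Nacos:
plan = resolve_registries({"register_nacos": True, "subscribe_from": "nacos"})
```

Driving the choice purely from configuration means the cutover needs no code change per service, only a config update and a restart.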
Migration Tool Optimizations (nacosSync)
nacosSync acts as a Zookeeper client, pulls services, converts them to Nacos format, registers them, and watches for changes.
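As a rough illustration of the conversion step: Dubbo stores each provider in Zookeeper as a URL‑encoded `dubbo://` URL, and the sync tool reshapes that into a Nacos instance registration. This is a simplified sketch, not nacosSync's actual code; the output field names only approximate the Nacos instance model:

```python
from urllib.parse import parse_qs, unquote, urlparse

def zk_provider_to_nacos_instance(encoded_url: str) -> dict:
    """Convert one Zookeeper provider entry (a URL-encoded dubbo:// URL)
    into the rough shape of a Nacos instance registration."""
    url = urlparse(unquote(encoded_url))
    params = {k: v[0] for k, v in parse_qs(url.query).items()}
    return {
        "ip": url.hostname,
        "port": url.port,
        "serviceName": "providers:" + params.get("interface", ""),
        "metadata": params,  # carry the Dubbo URL parameters along
    }
```

After the initial pull, the watch on each Zookeeper path re‑runs this conversion whenever a provider node is added or removed.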
Single‑Direction Sync
Only Zookeeper → Nacos sync is enabled to avoid propagating potential Nacos errors back to Zookeeper.
High Availability
nacosSync is stateless, stores data in a database, and can be deployed on multiple nodes to avoid single‑point failure, though this multiplies load on Nacos servers.
Full‑Sync Support
Implemented a bulk configuration feature to handle thousands of services without manual per‑service setup.
Event Out‑of‑Order Handling
Each Dubbo registration includes a millisecond‑precision timestamp. When processing events, timestamps are compared; older events are discarded.
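The discard rule can be sketched as follows, assuming each event carries the service identity and its registration timestamp (field names are illustrative):

```python
def apply_events(events):
    """Process registration events in arrival order, discarding any event
    whose timestamp is older than the latest one already applied for that
    service instance (events may arrive out of order)."""
    latest = {}  # (service, ip, port) -> newest timestamp applied
    state = {}   # (service, ip, port) -> last accepted event
    for ev in events:
        key = (ev["service"], ev["ip"], ev["port"])
        if ev["ts"] <= latest.get(key, -1):
            continue  # stale event that arrived late: drop it
        latest[key] = ev["ts"]
        state[key] = ev
    return state
```

Because the comparison is per instance, a late‑arriving DOWN event cannot overwrite a newer UP event for the same node.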
Active Heartbeat Detection
Periodically probes machine ports; if a node is unreachable, its Zookeeper entry is checked before removal, preventing premature service loss.
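A sketch of that two‑step check, assuming the Zookeeper lookup is abstracted as a boolean (in practice it would be a registry client call):

```python
import socket

def should_deregister(ip, port, zk_has_node, timeout=1.0):
    """Probe the instance's port; only deregister when the port is
    unreachable AND the node is also gone from Zookeeper, so a transient
    network blip doesn't drop a healthy service."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return False  # port answers: instance is alive
    except OSError:
        # Unreachable: confirm against Zookeeper before removal.
        return not zk_has_node
```

The double check trades slightly slower removal of truly dead nodes for protection against premature service loss.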
Nacos Performance Optimizations
Monitoring was enhanced to track CPU, request counts, response times, heartbeat rates, and push latency on both server and client sides.
Heartbeats dominated traffic (≈99%). Each instance sent a heartbeat every 5 seconds, producing roughly 8k QPS, which the two nacosSync nodes doubled.
Adjust Heartbeat Interval
Increased interval to 10 seconds and extended offline detection timeout from 30 s to 60 s.
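The back‑of‑envelope arithmetic: ~8k QPS at a 5‑second interval implies on the order of 40k synced instances (this instance count is inferred from the figures above, not stated in the source), each sync node re‑sends the full set, and doubling the interval halves the rate:

```python
def heartbeat_qps(instances: int, interval_s: float, sync_nodes: int = 1) -> float:
    """Aggregate heartbeat rate: every instance beats once per interval,
    and each nacosSync node sends the full set independently."""
    return instances * sync_nodes / interval_s

# 40k instances at 5 s from one node is ~8k QPS; two nodes double it,
# and stretching the interval to 10 s halves it again.
assert heartbeat_qps(40_000, 5) == 8_000
assert heartbeat_qps(40_000, 5, sync_nodes=2) == 16_000
assert heartbeat_qps(40_000, 10, sync_nodes=2) == 8_000
```

Extending the offline‑detection timeout in step with the interval keeps the number of missed beats before removal roughly constant.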
Scale Out
Expanded Nacos cluster from 3 to 5 nodes.
Reduce Heartbeats
Added metadata withNacos=true to services already migrated; nacosSync ignores these during Zookeeper sync, cutting redundant heartbeats.
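The filter itself is simple; a sketch assuming the marker lives in each service entry's metadata map:

```python
def services_to_sync(zk_services):
    """Skip entries already migrated to Nacos (marked withNacos=true in
    their metadata), so nacosSync stops sending redundant heartbeats
    for services that now register with Nacos directly."""
    return [
        s for s in zk_services
        if s.get("metadata", {}).get("withNacos") != "true"
    ]
```

As services migrate and gain the marker, the sync tool's share of heartbeat traffic shrinks automatically.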
Batch Heartbeats
Aggregated heartbeats on the client side before sending, reducing network overhead.
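One way to picture the client‑side aggregation (a hypothetical sketch; the real client would also flush on a timer so small batches are not delayed indefinitely):

```python
import json

class HeartbeatBatcher:
    """Buffer individual beats and flush them as one request once the
    batch reaches flush_size, trading a little latency for far fewer
    round trips to the server."""

    def __init__(self, flush_size=100, send=None):
        self.flush_size = flush_size
        self.send = send or (lambda payload: None)  # injected transport
        self.buffer = []

    def beat(self, service, ip, port):
        self.buffer.append({"service": service, "ip": ip, "port": port})
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(json.dumps(self.buffer))  # one request for many beats
            self.buffer = []
```

With a batch of 100, the same number of instances generates roughly 1% of the original request count, which is where most of the network‑overhead saving comes from.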
Long‑Lived Connections
Implemented gRPC‑based long connections for critical interfaces (registration, pulling, DNS‑F). Clients cache the responsible node after a redirect to avoid repeated lookups.
gRPC long connections slightly increased CPU but achieved near‑batch efficiency.
Key Interface Long Connections
Critical APIs (service registration, service pull) were also switched to long‑lived connections, improving overall Nacos performance.
Graceful Shutdown
Provided an instance‑level graceful shutdown API and a batch shutdown API to better integrate with internal release pipelines.
DNS‑F Improvements
Long Connections
Adopted the same gRPC approach for DNS‑F.
Invalid Dubbo Domain Names
Transformed service names like providers:com.xx.yy.zz into a DNS‑compatible form providers.com.xx.yy.zz, with DNS‑F internally mapping back.
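The mapping is a reversible one‑character rewrite, since the colon in Dubbo's service key is not legal in a DNS name while the category prefix (providers/consumers) always forms the first label. A minimal sketch:

```python
def to_dns_name(dubbo_service: str) -> str:
    """'providers:com.xx.yy.zz' contains a colon, which is illegal in a
    DNS name; replace only that first colon so the name can resolve."""
    return dubbo_service.replace(":", ".", 1)

def from_dns_name(dns_name: str) -> str:
    """Inverse mapping used inside DNS-F: the first label is the Dubbo
    category, the remainder is the interface name."""
    category, _, interface = dns_name.partition(".")
    return f"{category}:{interface}"
```

Because only the first separator changes, the dots inside the Java package name pass through untouched in both directions.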
High Availability
Monitor DNS‑F processes and restart on failure.
Deploy a centralized DNS‑F cluster as a fallback when local DNS‑F is unavailable.
Conclusion
Nacos is a relatively new open‑source component; migrating from Zookeeper introduces many challenges. This article highlighted the most critical pitfalls and the optimizations applied during the migration, offering practical guidance for similar cloud‑native transitions.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Xiao Lou's Tech Notes
Backend technology sharing, architecture design, performance optimization, source code reading, troubleshooting, and pitfall practices