
How to Seamlessly Migrate Dubbo Services from Zookeeper to Nacos with Service Mesh

This article explains why a company using Dubbo with Zookeeper should adopt cloud‑native practices, compares Zookeeper‑based discovery with Kubernetes DNS, and lays out a detailed plan for migrating to Nacos, including a dynamic registry, the nacosSync tool, and extensive performance optimizations.

Xiao Lou's Tech Notes

Technical Selection

The company's RPC framework is Dubbo, and it has traditionally used Zookeeper for service discovery. The motivation for replacing Zookeeper is not performance but the desire to move toward a cloud‑native architecture.

What Is Cloud‑Native?

Cloud‑native technologies help organizations build and run elastic applications in public, private, and hybrid clouds. Representative technologies include containers, service mesh, microservices, immutable infrastructure, and declarative APIs, enabling fault‑tolerant, manageable, and observable loosely‑coupled systems.

Service Mesh Overview

Service mesh is often described as the TCP of the microservice era.

Service mesh abstracts the underlying network, allowing service governance (rate limiting, circuit breaking, monitoring, discovery, load balancing, tracing, etc.) to be offloaded to mesh proxies, separating infrastructure from business logic.

Dubbo vs. Cloud‑Native Service Discovery

Dubbo consists of three components: provider, consumer, and registry. Providers register their IP, port, service name, and method names with the registry; consumers look up these details to make remote calls.

In cloud‑native environments, service discovery is handled by the container orchestrator, typically Kubernetes (k8s) with DNS. Because Dubbo relies on a registry rather than DNS, migrating Dubbo services to a cloud‑native stack is challenging.

After evaluating options, Nacos was chosen because it supports both traditional registration (like Zookeeper) and a DNS‑filter plugin (DNS‑F) that intercepts DNS queries and returns registered IPs when available.
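The DNS‑F behaviour described above can be sketched roughly as follows. This is only an illustration of the lookup order, not the plugin's actual code: the `Registry` mapping, the `resolve` function, and the `upstream_dns` callback are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the registry: service name -> IPs.
Registry = Dict[str, List[str]]


def resolve(name: str, registry: Registry,
            upstream_dns: Callable[[str], List[str]]) -> List[str]:
    """Return registered IPs when the name is known, else defer to normal DNS."""
    ips = registry.get(name, [])
    if ips:
        # The name is registered, so answer from the registry.
        return list(ips)
    # Nothing registered under this name: fall through to the regular resolver.
    return upstream_dns(name)
```

A caller would pass whatever resolver normally answers DNS queries as `upstream_dns`, so unregistered names behave exactly as they did before.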

Service Mesh Access Policies

Outside‑mesh Dubbo → Inside‑mesh Dubbo: registry

Outside‑mesh Dubbo → Outside‑mesh Dubbo: registry

Inside‑mesh Dubbo → Outside‑mesh Dubbo: domain → DNS‑F → registry

Inside‑mesh Dubbo → Inside‑mesh Dubbo: domain → DNS‑F → DNS

Heterogeneous languages (PHP, Node): services are called by service name, DNS‑F resolves the name to the correct IP, and load balancing can be tuned.

Migration Plan

Two options to migrate from Zookeeper to Nacos:

Refactor Dubbo applications for dual registration (Zookeeper + Nacos) and switch after all services are updated.

Use a migration tool to sync Zookeeper data to Nacos, allowing gradual code changes.

Option 2 was chosen, supplemented by a custom dynamic registry that reads configuration at startup to decide whether to register with Zookeeper, Nacos, or both, and which registry to consume from.
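A minimal sketch of what such a startup decision might look like is shown below. The configuration keys `registry.register` and `registry.consume`, the `RegistryPlan` type, and the default of Zookeeper are all assumptions made for illustration; the article does not describe the real configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List

KNOWN_REGISTRIES = {"zookeeper", "nacos"}


@dataclass(frozen=True)
class RegistryPlan:
    register_to: List[str]  # registries this provider registers with
    consume_from: str       # registry this consumer reads from


def load_plan(config: Dict[str, str]) -> RegistryPlan:
    """Build the plan from startup config (hypothetical keys, Zookeeper default)."""
    raw = config.get("registry.register", "zookeeper")
    register_to = [r.strip() for r in raw.split(",") if r.strip()]
    consume_from = config.get("registry.consume", "zookeeper").strip()
    unknown = (set(register_to) | {consume_from}) - KNOWN_REGISTRIES
    if unknown:
        raise ValueError(f"unknown registry: {sorted(unknown)}")
    return RegistryPlan(register_to, consume_from)
```

With this shape, a service can move through the migration by changing configuration only: register with both registries first, then switch consumption to Nacos, then drop Zookeeper.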

Migration Tool Optimizations (nacosSync)

nacosSync acts as a Zookeeper client, pulls services, converts them to Nacos format, registers them, and watches for changes.

Single‑Direction Sync

Only Zookeeper → Nacos sync is enabled to avoid propagating potential Nacos errors back to Zookeeper.

High Availability

nacosSync is stateless and stores its data in a database, so it can be deployed on multiple nodes to avoid a single point of failure. The trade‑off is that every additional node multiplies the load on the Nacos servers.

Full‑Sync Support

A bulk configuration feature was implemented so that thousands of services can be synced without manual per‑service setup.

Event Out‑of‑Order Handling

Each Dubbo registration includes a millisecond‑precision timestamp. When processing events, timestamps are compared; older events are discarded.
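The timestamp comparison can be sketched as below. The `Event` and `EventFilter` names are hypothetical, and treating an event with an equal timestamp as acceptable is an assumption; the article only says that older events are discarded.

```python
from typing import Dict, NamedTuple


class Event(NamedTuple):
    instance: str      # e.g. "10.0.0.1:20880"
    timestamp_ms: int  # millisecond timestamp carried by the registration
    online: bool       # True for register, False for deregister


class EventFilter:
    """Remembers the newest timestamp per instance and rejects older events."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def accept(self, event: Event) -> bool:
        last = self._latest.get(event.instance)
        if last is not None and event.timestamp_ms < last:
            return False  # stale event that arrived late: discard
        self._latest[event.instance] = event.timestamp_ms
        return True
```

For example, if a deregistration stamped 1000 arrives after a registration stamped 2000 for the same instance, `accept` returns `False` and the instance stays online.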

Active Heartbeat Detection

Periodically probes machine ports; if a node is unreachable, its Zookeeper entry is checked before removal, preventing premature service loss.
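One way to read this check is sketched below. The interpretation is an assumption: an instance is removed only when its port is unreachable and its Zookeeper entry is also gone. `port_is_open`, `should_remove`, and the `still_in_zookeeper` callback are hypothetical names, not part of nacosSync.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Try a plain TCP connect; any socket error counts as unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_remove(host: str, port: int,
                  probe: Callable[[str, int], bool],
                  still_in_zookeeper: Callable[[str, int], bool]) -> bool:
    """Remove only if the probe fails AND Zookeeper no longer lists the node."""
    if probe(host, port):
        return False  # reachable: keep the instance
    # Unreachable: double-check Zookeeper before dropping the instance.
    return not still_in_zookeeper(host, port)
```

Passing the probe as a parameter keeps the decision logic testable without opening real connections; in production `port_is_open` would be supplied as `probe`.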

Nacos Performance Optimizations

Monitoring was enhanced to track CPU, request counts, response times, heartbeat rates, and push latency on both server and client sides.

Heartbeats dominated traffic (≈99%). Each instance sent a heartbeat every 5 seconds, producing roughly 8k QPS, and running two nacosSync nodes doubled that load.

Adjust Heartbeat Interval

The heartbeat interval was increased from 5 to 10 seconds, and the offline detection timeout was extended from 30 s to 60 s.
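The effect of the interval on load is simple arithmetic, sketched below. The function name and the instance count of 10,000 used in the usage note are illustrative assumptions, not figures from the migration.

```python
def heartbeat_qps(instances: int, interval_s: float, sync_nodes: int = 1) -> float:
    """Heartbeats per second when each sync node beats once per instance per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return instances * sync_nodes / interval_s
```

With a hypothetical 10,000 instances, `heartbeat_qps(10_000, 5)` is 2000.0 and `heartbeat_qps(10_000, 10)` is 1000.0, so doubling the interval halves heartbeat traffic, while `sync_nodes=2` doubles it again.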

Scale Out

Expanded Nacos cluster from 3 to 5 nodes.

Reduce Heartbeats

Added metadata withNacos=true to services already migrated; nacosSync ignores these during Zookeeper sync, cutting redundant heartbeats.
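The filter can be sketched as follows. The dictionary shape of an instance and the string value `"true"` are assumptions for illustration; only the `withNacos` metadata key comes from the article.

```python
from typing import Any, Dict, Iterable, List


def instances_to_sync(instances: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only instances that have not registered with Nacos themselves."""
    return [
        inst for inst in instances
        if inst.get("metadata", {}).get("withNacos") != "true"
    ]
```

Instances that already register with Nacos directly send their own heartbeats, so skipping them here avoids a second heartbeat stream from the sync tool.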

Batch Heartbeats

Aggregated heartbeats on the client side before sending, reducing network overhead.
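A minimal sketch of client‑side aggregation is shown below, assuming heartbeats can be grouped into fixed‑size batches. The function name and the idea of a configurable `batch_size` are hypothetical; the article does not say how batches are formed.

```python
from typing import Iterator, List, Sequence


def batch_heartbeats(instance_ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield groups of instance ids so each group can go out as one request."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(instance_ids), batch_size):
        yield list(instance_ids[start:start + batch_size])
```

Sending one request per batch rather than one per instance reduces the request count by roughly a factor of `batch_size`.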

Long‑Lived Connections

Implemented gRPC‑based long connections for critical interfaces (registration, pulling, DNS‑F). Clients cache the responsible node after a redirect to avoid repeated lookups.
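The caching of the responsible node might look like the sketch below. `NodeCache` and the `locate` callback (standing in for the request that gets redirected) are hypothetical, and the `invalidate` method is an added assumption about how a client would recover when the cached node changes.

```python
from typing import Callable, Dict


class NodeCache:
    """Remembers which server node is responsible for each service."""

    def __init__(self, locate: Callable[[str], str]) -> None:
        self._locate = locate             # performs the lookup that ends in a redirect
        self._nodes: Dict[str, str] = {}  # service name -> responsible node

    def node_for(self, service: str) -> str:
        if service not in self._nodes:
            # First request for this service: follow the redirect once and cache it.
            self._nodes[service] = self._locate(service)
        return self._nodes[service]

    def invalidate(self, service: str) -> None:
        """Forget the cached node, e.g. after a connection error."""
        self._nodes.pop(service, None)
```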

gRPC long connections slightly increased CPU usage but achieved efficiency close to that of batched heartbeats.

Key Interface Long Connections

Critical APIs (service registration, service pull) were also switched to long‑lived connections, improving overall Nacos performance.

Graceful Shutdown

Provided an instance‑level graceful shutdown API and a batch shutdown API to better integrate with internal release pipelines.

DNS‑F Improvements

Long Connections

Adopted the same gRPC approach for DNS‑F.

Invalid Dubbo Domain Names

Dubbo service names such as providers:com.xx.yy.zz are not valid domain names, so they are transformed into a DNS‑compatible form, providers.com.xx.yy.zz, and DNS‑F maps them back internally.
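The name mapping can be sketched as below, under two assumptions: the name contains a single colon, and the prefix before it (such as `providers`) contains no dots, so the first dot in the DNS form marks where the colon was. The function names are hypothetical.

```python
def to_dns_name(service_name: str) -> str:
    """providers:com.xx.yy.zz -> providers.com.xx.yy.zz"""
    return service_name.replace(":", ".", 1)


def from_dns_name(dns_name: str) -> str:
    """providers.com.xx.yy.zz -> providers:com.xx.yy.zz"""
    prefix, sep, rest = dns_name.partition(".")
    # Without a dot there is nothing to map back, so return the name unchanged.
    return f"{prefix}:{rest}" if sep else dns_name
```

Under those assumptions the two functions are inverses, which lets the resolver accept the dotted form from clients and still query the registry with the original name.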

High Availability

Monitor DNS‑F processes and restart on failure.

Deploy a centralized DNS‑F cluster as a fallback when local DNS‑F is unavailable.

Conclusion

Nacos is a relatively new open‑source component; migrating from Zookeeper introduces many challenges. This article highlighted the most critical pitfalls and the optimizations applied during the migration, offering practical guidance for similar cloud‑native transitions.
