Deep Dive into Nacos Distro Consistency Protocol: Design, Mechanisms, and Source Code Analysis
This article explains the design principles of Nacos's Distro consistency protocol: its six core mechanisms, asynchronous replication, periodic synchronization, new-node sync, and local read strategy, and shows how the AP choice (with JRaft providing the CP path) keeps a distributed service registry highly available.
In the previous article we introduced Nacos's overall architecture and described the flow of a registration request. This piece focuses on the consistency layer, specifically the Distro protocol used for the consistency module.
The Distro protocol is Nacos's self-developed consistency solution for temporary (ephemeral) service instances. It favors availability (AP), providing eventual consistency through asynchronous replication, while the CP path for persistent data relies on the JRaft protocol for strong consistency.
Design Philosophy and Six Mechanisms
Equality Mechanism: All Nacos nodes are peers, and any of them can accept write requests.
Asynchronous Replication Mechanism: Changes are replicated to the other nodes asynchronously.
Health-Check Mechanism: Nodes periodically verify client status to keep data consistent.
Local Read Mechanism: Each node serves read requests from its own in-memory copy of the data.
New-Node Sync Mechanism: A newly added node pulls the full data snapshot from existing nodes.
Routing Forward Mechanism: A write request is processed locally if the receiving node is responsible for the key; otherwise it is forwarded to the responsible node.
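Taken together, the equality and routing-forward mechanisms mean every node decides, per service key, whether to handle a write itself or forward it. A minimal sketch of that decision, assuming a simple hash-modulo-member-count mapping (the class and method names here are illustrative, not the actual Nacos API):

```java
import java.util.List;

// Illustrative sketch of Distro's routing-forward decision. Nacos v1
// derives the responsible node from a hash of the service key modulo
// the number of healthy members; this is a simplified stand-in.
public class DistroMapperSketch {
    private final List<String> servers;    // healthy cluster members, e.g. "ip:port"
    private final String localServer;

    public DistroMapperSketch(List<String> servers, String localServer) {
        this.servers = servers;
        this.localServer = localServer;
    }

    // Pick the member responsible for this service key.
    public String responsibleServer(String serviceKey) {
        int index = Math.abs(serviceKey.hashCode() % servers.size());
        return servers.get(index);
    }

    // Handle the write locally if we own the key; otherwise the caller
    // forwards the request to the responsible member.
    public boolean shouldHandleLocally(String serviceKey) {
        return localServer.equals(responsibleServer(serviceKey));
    }
}
```

Because every member computes the same mapping over the same member list, exactly one node considers itself responsible for any given key.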
Asynchronous Replication: How Writes Are Synchronized
The core entry point is the put() method in /naming/consistency/ephemeral/distro/DistroConsistencyServiceImpl.java. When a registration request arrives, three actions occur:
1. Store the instance in an in-memory ConcurrentHashMap.
2. Enqueue a task that pushes the updated instance list to subscribed clients via UDP.
3. Schedule a 1-second delayed task that replicates the data to the other Nacos nodes.
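The three steps above can be sketched as follows. This is a simplified outline, not the real DistroConsistencyServiceImpl; the field and method names are assumptions made for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified outline of the three actions performed on a registration.
public class DistroPutSketch {
    private final ConcurrentHashMap<String, Object> dataStore = new ConcurrentHashMap<>();
    private final BlockingQueue<String> udpPushTasks = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "distro-sync-sketch");
                t.setDaemon(true);
                return t;
            });

    public void put(String key, Object instances) {
        // 1. Store the instance list in memory.
        dataStore.put(key, instances);
        // 2. Queue a task to push the new instance list to clients over UDP.
        udpPushTasks.offer(key);
        // 3. Replicate to peers after a 1-second delay, batching bursts of updates.
        scheduler.schedule(() -> replicateToPeers(key), 1, TimeUnit.SECONDS);
    }

    private void replicateToPeers(String key) {
        // In Nacos this sends the datum to every other member over HTTP.
    }

    // Accessors added here purely for illustration.
    public boolean hasKey(String key) { return dataStore.containsKey(key); }
    public int pendingPushTasks() { return udpPushTasks.size(); }
}
```

The 1-second delay is what makes the replication asynchronous: the client's write returns as soon as the local store and push queue are updated.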
In the replication flow, the change is first recorded as a task in a map; a background thread periodically extracts these tasks into a queue, and a worker thread drains the queue by sending an HTTP request to each peer, e.g.:
http://192.168.0.101:8858/nacos/v1/ns/distro/datum
Handling Sync Requests on Peer Nodes
Peers receive the request in com/alibaba/nacos/naming/controllers/DistroController.java via the onSyncDatum method, storing the datum in a ConcurrentHashMap called dataStore . Each datum contains a value (the instance list), a key, and a timestamp.
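The peer-side handling can be modeled as a timestamp-guarded merge into the store: a datum only replaces an existing entry if it is at least as new. A simplified sketch (the real DataStore and Datum classes in Nacos differ; this only mirrors the value/key/timestamp structure described above):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a peer applying a synchronized datum: newer timestamps win,
// stale replicas are ignored. The class itself is illustrative.
public class DataStoreSketch {
    public static class Datum {
        public final String key;
        public final Object value;      // the instance list
        public final long timestamp;    // when the datum was produced

        public Datum(String key, Object value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final ConcurrentHashMap<String, Datum> dataStore = new ConcurrentHashMap<>();

    // Called when a peer pushes a datum; keep whichever copy is newer.
    public void onSyncDatum(Datum incoming) {
        dataStore.merge(incoming.key, incoming,
                (current, candidate) ->
                        candidate.timestamp >= current.timestamp ? candidate : current);
    }

    public Datum get(String key) { return dataStore.get(key); }
}
```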
Periodic Synchronization
In version v1, nodes periodically send a checksum request to their peers. If a mismatch is detected, a full data pull is triggered to reconcile the difference. The request URL looks like (placeholders stand for the peer's address and the sender's own address):
http://<peer-ip>:<peer-port>/nacos/v1/ns/distro/checksum?source=<local-ip>
From version v2 onward, the explicit checksum step is removed and the health-check mechanism maintains consistency.
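The v1 reconciliation boils down to comparing per-key checksums and re-pulling whatever differs. A rough sketch of that comparison (in Nacos the checksums are MD5 digests of the datum; the method name here is made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of v1 checksum reconciliation: for every key the peer reports,
// compare its checksum with ours and collect the keys to re-pull in full.
public class ChecksumSyncSketch {
    public static List<String> keysToRepull(Map<String, String> localChecksums,
                                            Map<String, String> reportedChecksums) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> entry : reportedChecksums.entrySet()) {
            String local = localChecksums.get(entry.getKey());
            // Missing locally or checksum mismatch => fetch the full datum.
            if (!entry.getValue().equals(local)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```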
New‑Node Synchronization
When a new Distro node joins, its constructor starts a task that invokes DistroProtocol.startDistroTask() and subsequently DistroLoadDataTask.run() to pull the complete snapshot of non‑persistent instances from existing nodes.
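The load task essentially tries peers one by one until it obtains a usable snapshot. A simplified sketch of that retry-until-success loop (SnapshotClient is a made-up abstraction standing in for the HTTP snapshot call; it is not a Nacos type):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch of new-node full sync: try each peer in turn until one
// returns a usable snapshot of the ephemeral data.
public class LoadDataSketch {
    interface SnapshotClient {
        Map<String, Object> getSnapshot(String peer);
    }

    public static Map<String, Object> loadFromRemote(List<String> peers,
                                                     SnapshotClient client) {
        for (String peer : peers) {
            try {
                Map<String, Object> snapshot = client.getSnapshot(peer);
                if (snapshot != null && !snapshot.isEmpty()) {
                    return snapshot;  // success: seed the local store with this
                }
            } catch (RuntimeException e) {
                // Peer unavailable or request failed; fall through to the next one.
            }
        }
        return Collections.emptyMap();
    }
}
```

A node that fails to reach the first peer simply moves on, so the new member comes up as long as any existing member can serve the snapshot.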
Local Read Strategy
Although each node only actively manages its own clients, every node holds the full set of instance data, allowing immediate read responses without contacting other nodes and ensuring availability even during network partitions.
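Under this design the read path is trivial: a query never leaves the node, because replication keeps a full copy everywhere. A sketch (illustrative class, not Nacos code):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the local-read path: every member holds the full ephemeral
// data set, so reads are served straight from local memory.
public class LocalReadSketch {
    private final ConcurrentHashMap<String, Object> dataStore = new ConcurrentHashMap<>();

    // Filled by Distro replication from the other members.
    public void onReplicated(String key, Object instances) {
        dataStore.put(key, instances);
    }

    // No remote call, so reads keep working even when peers are unreachable.
    public Object get(String serviceKey) {
        return dataStore.get(serviceKey);
    }
}
```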
Conclusion
The article walks through the Distro protocol’s architecture, source code, and key mechanisms that collectively provide AP‑style high availability for Nacos. Future articles will dive deeper into the health‑check mechanism and periodic synchronization details.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.