How Nacos’s Distro Protocol Ensures High‑Availability Consistency

This article explains the design, six mechanisms, and source‑code flow of Nacos’s Distro protocol, showing how asynchronous replication, periodic sync, new‑node loading, and local reads together provide AP‑style high availability and eventual consistency for service registration.


Introduction

In the previous article we covered the overall Nacos architecture and the lifecycle of a registration request. This piece dives into the consistency layer, focusing on the Distro protocol used in the consistency module.

1. Design Philosophy and Six Mechanisms of Distro

The Distro protocol is Nacos’s consistency solution for temporary instance data, combining advantages of Gossip and Eureka while optimizing them.

Gossip suffers from duplicate messages because nodes are chosen randomly, increasing network load. Distro mitigates this by assigning each node a subset of data and synchronizing only that subset.

Temporary instance data is kept in an in-memory cache; a node performs a full synchronization at startup and relies on periodic validation afterwards.

AP (availability and partition tolerance, in CAP terms) is the target: Distro works only in a clustered deployment and keeps the cluster serving requests even when a node fails.

Distro rests on six mechanisms:

Equality mechanism: all Nacos nodes are peers, and any of them can handle write requests.

Asynchronous replication mechanism: changes are replicated to the other nodes asynchronously (the key focus of this article).

Health-check mechanism: each node stores a portion of the data and periodically verifies client status to keep it consistent.

Local-read mechanism: every node serves read requests from its own in-memory cache.

New-node sync mechanism: a newly started node pulls a full snapshot from the existing nodes.

Routing-forward mechanism: a write request is either processed locally or forwarded to the node responsible for its key (see the sketch below).
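To make the routing-forward idea concrete, here is a minimal, self-contained sketch. The class and method names are hypothetical (the real logic lives in Nacos's DistroMapper/DistroFilter): each node hashes the service key onto the member list and either handles the write itself or forwards it to the owner.

import java.util.List;

// Hypothetical sketch of the routing-forward decision; not the Nacos source.
public class DistroRouter {

    private final List<String> members; // all cluster nodes, e.g. "ip:port"
    private final String self;          // this node's own address

    public DistroRouter(List<String> members, String self) {
        this.members = members;
        this.self = self;
    }

    // Map a service key onto exactly one responsible node.
    public String responsibleFor(String serviceKey) {
        int index = (serviceKey.hashCode() & Integer.MAX_VALUE) % members.size();
        return members.get(index);
    }

    // Handle locally if this node owns the key; otherwise forward to the owner.
    public boolean shouldHandleLocally(String serviceKey) {
        return self.equals(responsibleFor(serviceKey));
    }
}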

2. Asynchronous Replication: How Writes Are Propagated

2.1 Core Entry

The implementation class is /naming/consistency/ephemeral/distro/DistroConsistencyServiceImpl.java. The main entry method is put(), which performs three actions:

Store instance information in an in‑memory ConcurrentHashMap.

Enqueue a task to push the updated instance list to all clients via UDP.

Schedule a 1‑second delayed task to replicate the data to other Nacos nodes.

The second action concerns client‑side consistency; the third concerns inter‑node consistency, which is the focus here.
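The flow can be pictured with a small, self-contained sketch; all names below are illustrative stand-ins, not the actual Nacos classes:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative model of the three actions performed by put().
public class DistroPutSketch {

    private final ConcurrentHashMap<String, Object> dataStore = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void put(String key, Object instances) {
        // Action 1: store the instance information in the in-memory map
        dataStore.put(key, instances);
        // Action 2: queue a UDP push so subscribed clients get the new list
        notifyClients(key);
        // Action 3: after a 1-second delay, replicate the change to peers
        scheduler.schedule(() -> replicateToPeers(key), 1, TimeUnit.SECONDS);
    }

    private void notifyClients(String key) { /* enqueue UDP push task */ }

    private void replicateToPeers(String key) { /* enqueue Distro sync task */ }
}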

2.2 sync Method Parameters

sync() receives a DistroKey (the service key plus a constant resource-type prefix), a data-type flag (change), and a delay of 1 second.

2.3 sync Core Logic: Adding Tasks

The method iterates over the other cluster nodes and, for each one, checks whether a similar task already exists in a pending-task map; it either merges with the existing task or adds a new one, and a background thread later picks the tasks up from the map.
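A simplified model of that pending-task map is shown below (illustrative names; the real code sits in Nacos's delay-task execute engine). The point is that repeated changes to the same key collapse into a single task before the background thread drains them:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of delayed-task merging; names are illustrative.
public class SyncTaskHolder {

    // key = target node + service key, value = the pending sync task
    private final ConcurrentHashMap<String, SyncTask> tasks = new ConcurrentHashMap<>();

    public void addTask(String taskKey, SyncTask newTask) {
        // Merge: if a task for this key is already pending, keep a single
        // task carrying the latest state instead of queuing a duplicate.
        tasks.merge(taskKey, newTask, (pending, latest) -> latest);
    }

    // Called by the background thread to pick up everything that is pending.
    public List<SyncTask> drain() {
        List<SyncTask> batch = new ArrayList<>();
        for (String key : tasks.keySet()) {
            SyncTask task = tasks.remove(key);
            if (task != null) {
                batch.add(task);
            }
        }
        return batch;
    }

    public static class SyncTask {
        final String serviceKey;
        public SyncTask(String serviceKey) { this.serviceKey = serviceKey; }
    }
}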

2.4 sync Core Logic: Background Thread Replication

The background thread repeatedly extracts tasks from the map, places them into a queue, and a worker thread sends HTTP requests to other nodes. The request URL looks like:

http://IP:port/nacos/v1/ns/distro/datum
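A stripped-down version of that worker loop might look like the following; the class is illustrative, while the endpoint path matches the one above:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.BlockingQueue;

// Illustrative worker: takes serialized sync payloads off a queue and
// PUTs them to a peer node's Distro endpoint.
public class SyncWorker implements Runnable {

    private final BlockingQueue<byte[]> queue; // serialized datum payloads
    private final String peer;                 // e.g. "10.0.0.2:8848"
    private final HttpClient client = HttpClient.newHttpClient();

    public SyncWorker(BlockingQueue<byte[]> queue, String peer) {
        this.queue = queue;
        this.peer = peer;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                byte[] payload = queue.take();
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("http://" + peer + "/nacos/v1/ns/distro/datum"))
                        .PUT(HttpRequest.BodyPublishers.ofByteArray(payload))
                        .build();
                client.send(request, HttpResponse.BodyHandlers.ofString());
            }
        } catch (Exception e) {
            Thread.currentThread().interrupt();
        }
    }
}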

2.5 How Peer Nodes Process Sync Requests

2.5.1 Storing Registration Info

Incoming data is stored as a datum inside a dataStore (a ConcurrentHashMap). Each datum holds value (an ArrayList of client info), key, and timestamp.
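In code form, the datum is just a small wrapper; the sketch below is simplified from the shape described above, with the client-info type left as Object:

import java.util.ArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Simplified shape of a datum as described above.
public class Datum {
    public String key;                                  // service key
    public ArrayList<Object> value = new ArrayList<>(); // client/instance info
    public AtomicLong timestamp = new AtomicLong();     // version counter
}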

2.5.2 Source Code

The handling class is com/alibaba/nacos/naming/controllers/DistroController.java, method onSyncDatum. It puts the instance info into a datum, stores it in the ConcurrentHashMap, and pushes the info to clients via UDP.
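Condensed to its essentials, the handler does something like this (illustrative code, reusing the Datum sketch above):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Condensed, illustrative version of the peer-side sync handling.
public class DatumSyncHandler {

    private final ConcurrentHashMap<String, Datum> dataStore = new ConcurrentHashMap<>();

    public void onSyncDatum(Map<String, Datum> received) {
        for (Map.Entry<String, Datum> entry : received.entrySet()) {
            // store (or overwrite) the datum in the local in-memory store
            dataStore.put(entry.getKey(), entry.getValue());
            // then push the updated instance list to subscribers via UDP
            pushToClients(entry.getKey());
        }
    }

    private void pushToClients(String key) { /* enqueue UDP push task */ }
}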

3. Periodic Sync: Maintaining Consistency

3.1 Why Periodic Sync Is Needed

In cluster mode Nacos must remain highly available: every node holds the full client dataset so that any node can answer reads on its own, which spreads load across the cluster. For those local reads to be safe, the copies must be kept consistent, which is what the periodic sync provides.

3.2 Periodic Metadata Check (v1)

Each node runs a scheduled task that sends a checksum of the data it owns to its peers; a peer that detects a mismatch pulls the affected data back from the sender. The request looks like:

http://peer-ip:port/nacos/v1/ns/distro/checksum?source=local-ip:port
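Conceptually, the receiving node compares the reported checksums against its own and re-fetches whatever differs; a minimal sketch with hypothetical names:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch of the v1 checksum comparison on the receiving node.
public class ChecksumVerifier {

    // Returns the keys whose local copy is missing or stale.
    public List<String> findStaleKeys(Map<String, String> remoteChecksums,
                                      Map<String, String> localChecksums) {
        List<String> toUpdate = new ArrayList<>();
        for (Map.Entry<String, String> entry : remoteChecksums.entrySet()) {
            // a missing key or a differing checksum means our copy is stale
            if (!entry.getValue().equals(localChecksums.get(entry.getKey()))) {
                toUpdate.add(entry.getKey());
            }
        }
        return toUpdate; // these keys are then fetched in full from the peer
    }
}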

3.3 Version Evolution

From v2 onward, the periodic checksum task is replaced by the health-check mechanism, which keeps data in sync at two levels: client-to-node health checks and cluster-level health checks.

4. New‑Node Sync Mechanism

4.1 Principle

A newly added Distro node performs a full data pull by polling all existing Distro nodes and fetching the complete set of temporary instance data.

4.2 Source Code

The constructor of DistroProtocol starts a load task that invokes DistroLoadDataTask.run(), which calls loadAllDataSnapshotFromRemote() to fetch the snapshot.

/nacos/core/distributed/distro/DistroProtocol.java
  startDistroTask()
    startLoadTask()
/nacos/core/distributed/distro/task/load/DistroLoadDataTask.java
  run()
    load()
      loadAllDataSnapshotFromRemote()
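Reduced to its core loop, loadAllDataSnapshotFromRemote() tries each peer until one snapshot applies cleanly. The sketch below is illustrative, with SnapshotClient as a hypothetical transport interface:

import java.util.List;

// Illustrative reduction of the new-node snapshot load loop.
public class SnapshotLoader {

    private final List<String> peers;    // the other cluster members
    private final SnapshotClient client; // hypothetical transport

    public SnapshotLoader(List<String> peers, SnapshotClient client) {
        this.peers = peers;
        this.client = client;
    }

    // Poll existing nodes in turn; the first successful snapshot wins.
    public boolean loadAllDataSnapshotFromRemote() {
        for (String peer : peers) {
            try {
                byte[] snapshot = client.getSnapshot(peer); // full temporary-instance data
                applySnapshot(snapshot);                    // replace local in-memory state
                return true;
            } catch (Exception e) {
                // this peer failed; try the next one
            }
        }
        return false;
    }

    private void applySnapshot(byte[] snapshot) { /* deserialize into the dataStore */ }

    public interface SnapshotClient {
        byte[] getSnapshot(String peer) throws Exception;
    }
}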

5. Local‑Read Mechanism

5.1 Principle

Each node contains the full client dataset, allowing immediate read responses without contacting other nodes. In case of network partitions, nodes still return data (possibly stale) and later converge via health checks.
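Reads therefore reduce to a lookup in the node's own map; a sketch, reusing the Datum shape from earlier:

import java.util.concurrent.ConcurrentHashMap;

// Sketch of the local-read path: no network hop, just a map lookup.
public class LocalReadService {

    private final ConcurrentHashMap<String, Datum> dataStore = new ConcurrentHashMap<>();

    // Every node holds the full dataset, so any node can answer directly.
    public Datum get(String key) {
        return dataStore.get(key); // may be briefly stale during a partition
    }
}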

Conclusion

The article walks through the Distro protocol’s architecture, six mechanisms, and source‑code flow, illustrating how they collectively provide AP‑style high availability for Nacos’s service registration. Future articles will cover Nacos’s heartbeat‑based health‑check mechanism.

Tags: distributed systems, Nacos, Consistency, AP, CP, Distro