Cloud Native · 9 min read

Why Nacos Beats Zookeeper: Understanding Its Distro Protocol and Consistency Model

This article introduces Nacos as a cloud-native service discovery and configuration platform, compares its features with Zookeeper, explains the CAP theorem and consistency protocols, details the proprietary Distro protocol, and provides a practical evaluation of its performance and limitations.

Xiao Lou's Tech Notes

Nacos Introduction

Nacos describes itself on its official website as a dynamic service discovery, configuration management, and service management platform that makes building cloud-native applications easier.

A platform that simplifies building cloud-native applications with dynamic service discovery, configuration management, and service management.

It can serve both as a service registry and as a configuration center. As a registry, its emphasis is on making cloud-native applications easier to build, and compared with Zookeeper it offers several additional capabilities:

Consistency protocol: Zookeeper CP; Nacos CP + AP.

Health check: Zookeeper KeepAlive; Nacos TCP / HTTP / MySQL / client heartbeat.

Load balancing: Zookeeper none; Nacos weight / selector / metadata.

Multi-data-center: Zookeeper not supported; Nacos supported.

Cross-registry sync: Zookeeper not supported; Nacos supported.

Snowball protection: Zookeeper none; Nacos supported.

Access protocol: Zookeeper TCP; Nacos HTTP / DNS.

K8s integration: Zookeeper not supported; Nacos supported.

Dubbo integration: both supported.

Consistency Protocol

Before discussing consistency protocols, understand the CAP theorem:

Consistency : All replicas see the same data at the same time.

Availability : Every request receives a response, even when some replicas have failed.

Partition tolerance : The system continues to operate despite network partitions; a choice must be made between consistency and availability when a partition occurs.

A distributed system must tolerate network partitions, so in practice it chooses between consistency and availability. Choosing consistency means a write must be replicated to the other nodes before the request can be acknowledged; during a partition this makes the service temporarily unavailable. Choosing availability means responding even when replicas may disagree, which sacrifices consistency. A registry needs to balance both. Common strong-consistency protocols such as Paxos and Raft are CP. Nacos instead uses its own Distro protocol, which is AP, i.e. eventually consistent. For a deeper discussion, see Alibaba's article "Why Alibaba does not use Zookeeper".

Distro Protocol Overview

Public information on the Distro protocol is scarce because it is an in-house Alibaba protocol. The key points below are derived from the source code:

The protocol is designed specifically for a service registry.

Clients interact with the server for service registration and heartbeat transmission.

Clients register on a per-service basis; after registration they periodically send heartbeats carrying the full service information. From the client's perspective, all server nodes are peers, so requests are sent to a randomly chosen node.

If a client request fails, it retries with another node.
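The client-side behavior above can be sketched as follows; the server list, the `send_request` stub, and the retry count are hypothetical illustrations, not Nacos's actual client code.

```python
import random

# Hypothetical list of peer server nodes (assumption for the sketch).
SERVERS = ["10.0.0.1:8848", "10.0.0.2:8848", "10.0.0.3:8848"]

def send_request(server, payload):
    """Stand-in for an HTTP call to a server node; raises on failure."""
    return {"server": server, "payload": payload, "ok": True}

def request_with_retry(payload, max_attempts=3):
    # Server nodes are peers from the client's perspective, so start
    # from a random node and fall back to other nodes on failure.
    candidates = random.sample(SERVERS, len(SERVERS))
    last_error = None
    for server in candidates[:max_attempts]:
        try:
            return send_request(server, payload)
        except Exception as err:  # sketch only; real code would be narrower
            last_error = err
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```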

Server nodes store all data but each node is responsible for a subset of services. Upon receiving a write request (register, heartbeat, deregister, etc.), a node processes it if it is responsible; otherwise it forwards it to the responsible node.
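A minimal sketch of this routing rule, assuming a simple modulo-hash mapping from service name to node (the real implementation is more involved; the function names are made up):

```python
import zlib

def responsible_node(service_name, healthy_servers):
    """Map a service to the node responsible for its writes, by hashing
    the service name modulo the number of healthy nodes (assumed scheme)."""
    index = zlib.crc32(service_name.encode()) % len(healthy_servers)
    return healthy_servers[index]

def handle_write(service_name, request, self_node, healthy_servers):
    """Process the write locally if this node owns the service,
    otherwise forward it to the responsible node."""
    owner = responsible_node(service_name, healthy_servers)
    if owner == self_node:
        return ("local", self_node)
    return ("forward", owner)
```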

Each server actively sends health checks to other nodes; nodes that respond are considered healthy.

If a server receives a heartbeat for a non‑existent service, it treats the heartbeat as a registration request.

If a server does not receive a heartbeat from a client for a long time, it deregisters that service.
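The two heartbeat rules above can be sketched together; the in-memory registry and the 30-second expiry are illustrative assumptions, not Nacos's actual data structures or defaults.

```python
import time

class Registry:
    """Sketch of heartbeat semantics: a heartbeat for an unknown instance
    doubles as registration, and silence beyond `expire_seconds` leads to
    deregistration (timeout value is illustrative)."""

    def __init__(self, expire_seconds=30):
        self.expire_seconds = expire_seconds
        self.last_beat = {}  # instance id -> last heartbeat timestamp

    def on_heartbeat(self, instance, now=None):
        now = now if now is not None else time.time()
        # Unknown instance? The heartbeat is treated as a registration.
        self.last_beat[instance] = now

    def sweep(self, now=None):
        """Deregister instances whose heartbeats have gone silent."""
        now = now if now is not None else time.time()
        expired = [i for i, t in self.last_beat.items()
                   if now - t > self.expire_seconds]
        for instance in expired:
            del self.last_beat[instance]
        return expired
```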

The responsible node writes data locally and returns immediately, then asynchronously propagates the data to other nodes.
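This write path can be sketched as follows: a hypothetical `DistroWriter` applies the write to a local store, acknowledges immediately, and replicates to peers from a background thread. The class and its callbacks are assumptions for illustration, not Nacos code.

```python
import queue
import threading

class DistroWriter:
    """Sketch of write-then-async-replicate: apply locally, ack at once,
    and let a background worker push the change to peer nodes."""

    def __init__(self, peers, replicate):
        self.store = {}
        self.peers = peers
        self.replicate = replicate          # callable(peer, key, value)
        self.outbox = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, value):
        self.store[key] = value             # 1. apply the write locally
        self.outbox.put((key, value))       # 2. queue it for replication
        return "ok"                         # 3. ack before peers catch up

    def _drain(self):
        while True:
            key, value = self.outbox.get()
            for peer in self.peers:         # asynchronous propagation
                self.replicate(peer, key, value)
            self.outbox.task_done()
```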

Read requests are served directly from the local node, regardless of whether the data is the latest.

Special scenarios help illustrate the behavior:

If a service stops sending heartbeats and is removed, it will be re‑registered when it resumes sending heartbeats.

If a server node crashes and stops responding to health checks, other nodes remove it from the healthy list and redistribute responsibilities, rebuilding full service data through client heartbeats.

In a split‑brain situation where two data centers lose network connectivity, Nacos’s AP‑style distro protocol allows each side to continue operating with its own full service set, whereas a strong‑consistency protocol would make one side unavailable.

Nacos Simple Evaluation

As of version 1.2.0, communication between clients and servers, and among servers themselves, uses HTTP with JSON payloads, which consumes significant CPU under heavy load; persistent connections and a more compact wire format would improve performance.

Heartbeats are sent per instance (IP + port + service), which generates a very large number of requests at scale; without a redesign, this makes heavy production use difficult.
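A rough back-of-the-envelope illustrates the load. The fleet size here is an assumed figure, and the 5-second beat interval is illustrative.

```python
# Assumed fleet size and heartbeat period (illustrative numbers).
instances = 10_000          # registered service instances
beat_interval_s = 5         # seconds between heartbeats per instance

# Each instance produces one HTTP request per interval.
requests_per_second = instances / beat_interval_s
```

At these assumed numbers the servers absorb 2,000 HTTP requests per second from heartbeats alone, before any registration or query traffic.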

The Distro protocol assigns responsible nodes using a simple hash; when a node fails, nearly every service may be remapped to a different owner, causing a large amount of data movement. Consistent hashing would limit the churn to the services owned by the failed node.
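The difference can be demonstrated with a toy comparison of the two schemes; `modulo_owner` and the bare-bones ring below (no virtual nodes) are illustrations of the general techniques, not Nacos code.

```python
import bisect
import zlib

def _h(s):
    """Deterministic hash for the sketch."""
    return zlib.crc32(s.encode())

def modulo_owner(service, nodes):
    """Simple modulo hash: when the node count changes, most services
    land on a different owner."""
    return sorted(nodes)[_h(service) % len(nodes)]

class ConsistentHashRing:
    """Minimal consistent-hash ring (no virtual nodes): removing a node
    remaps only the services that node owned."""

    def __init__(self, nodes):
        self.ring = sorted((_h(n), n) for n in nodes)

    def owner(self, service):
        keys = [k for k, _ in self.ring]
        # First node clockwise from the service's hash, wrapping around.
        idx = bisect.bisect(keys, _h(service)) % len(self.ring)
        return self.ring[idx][1]

    def remove(self, node):
        self.ring = [(k, n) for k, n in self.ring if n != node]
```

With the modulo scheme, removing one node changes the divisor and reshuffles ownership across the board; with the ring, only the failed node's services move to their clockwise successor.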

Tags: service discovery, Nacos, Distro Protocol
Written by

Xiao Lou's Tech Notes

Backend technology sharing: architecture design, performance optimization, source-code reading, troubleshooting, and lessons learned from pitfalls
