Understanding Kafka's Transition from ZooKeeper to KRaft: Architecture, Installation, Raft Algorithm, and Common Issues
This article explains how Kafka 3.x replaces ZooKeeper with the internal KRaft consensus layer, detailing Raft‑based metadata storage and a step‑by‑step KRaft cluster installation and configuration, and covering related concepts such as leader election, consumer‑group rebalancing, reliability settings, and performance optimizations.
Kafka 3.0 removes the dependency on ZooKeeper and adopts the internal consensus mechanism KRaft (Kafka Raft). This article introduces the built‑in consensus algorithm, explains how metadata is stored without ZooKeeper, and provides a complete installation and configuration guide for a KRaft‑based Kafka cluster.
1. Kafka Core Components
Producer – sends messages to brokers.
Consumer – pulls messages from brokers.
Consumer Group – a logical subscriber that balances partitions among its members.
Broker – a server that stores topic partitions.
Topic – a logical queue; messages are written to partitions.
Partition – a unit of parallelism and scalability; each partition is ordered.
Replication – each partition has multiple replicas; one replica is the leader, the others are followers.
2. ZooKeeper Metadata (Kafka 2.x)
In the ZooKeeper‑based architecture, the following ZK nodes store critical metadata:
/admin – core internal information (e.g., deleted topics).
/brokers – broker and topic metadata.
/cluster – unique cluster ID and version.
/controller – controller election and management.
/isr_change_notification – ISR list changes.
…and several other nodes for controller epochs, leader election, etc.
These ZK paths cause operational overhead, network traffic, and strong coupling between Kafka and ZooKeeper.
3. KRaft Architecture (Kafka 3.x)
When ZooKeeper is removed, Kafka stores its metadata in the internal topic __cluster_metadata. Controller duties move into a Raft quorum formed by the nodes configured as voters. Important concepts:
process.roles – defines the node's role(s): broker, controller, or both (broker,controller).
controller.quorum.voters – the voters that form the Raft quorum, written as id@host:port entries (e.g., 1@bigdata01:9093).
Metadata is replicated using the Raft algorithm, providing strong consistency without ZooKeeper.
Key benefits include faster controller elections, reduced network overhead, and a single source of truth for metadata.
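As a sketch of how these parameters fit together, a three‑controller quorum might be configured as follows (the node IDs and hostnames follow the bigdata01–bigdata03 cluster used in the installation section below; values are illustrative):

```properties
# Role of this node within the KRaft cluster: broker, controller, or both
process.roles=broker,controller
# Unique ID of this node; must appear in controller.quorum.voters if it votes
node.id=1
# All voting controllers, as id@host:port
controller.quorum.voters=1@bigdata01:9093,2@bigdata02:9093,3@bigdata03:9093
```

Each voter's id must match that node's own node.id, and the controller port (9093 here) is separate from the client listener port.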
4. Installation & Configuration (KRaft)
Download and extract Kafka 3.1.0:
[hadoop@bigdata01 soft]$ wget http://archive.apache.org/dist/kafka/3.1.0/kafka_2.12-3.1.0.tgz
[hadoop@bigdata01 soft]$ tar -zxf kafka_2.12-3.1.0.tgz -C /opt/install/
Edit config/kraft/server.properties (or broker.properties) to set the KRaft parameters:
node.id=1
controller.quorum.voters=1@bigdata01:9093
listeners=PLAINTEXT://bigdata01:9092
advertised.listeners=PLAINTEXT://bigdata01:9092
log.dirs=/opt/install/kafka_2.12-3.1.0/kraftlogs
Create the required log directories:
mkdir -p /opt/install/kafka_2.12-3.1.0/kraftlogs
mkdir -p /opt/install/kafka_2.12-3.1.0/topiclogs
Initialize the storage UUID and format the KRaft log:
[hadoop@bigdata01 kafka]$ ./bin/kafka-storage.sh random-uuid
YkJwr6RESgSJv-sxa1R1mA
[hadoop@bigdata01 kafka]$ ./bin/kafka-storage.sh format -t YkJwr6RESgSJv-sxa1R1mA -c ./config/kraft/server.properties
Start the broker:
[hadoop@bigdata01 kafka]$ ./bin/kafka-server-start.sh ./config/kraft/server.properties
Create a topic:
./bin/kafka-topics.sh --create --topic kafka_test --partitions 3 --replication-factor 2 --bootstrap-server bigdata01:9092,bigdata02:9092,bigdata03:9092
Produce and consume messages using the console tools:
[hadoop@bigdata01 kafka]$ bin/kafka-console-producer.sh --bootstrap-server bigdata01:9092,bigdata02:9092,bigdata03:9092 --topic kafka_test
[hadoop@bigdata02 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server bigdata01:9092,bigdata02:9092,bigdata03:9092 --topic kafka_test --from-beginning
5. Viewing KRaft Metadata
KRaft stores metadata in the internal topic __cluster_metadata. Two useful tools for inspecting it are:
kafka-dump-log.sh --cluster-metadata-decoder – dumps raw metadata logs.
kafka-metadata-shell.sh – provides a ZK‑like CLI for metadata inspection.
Example dump command:
bin/kafka-dump-log.sh --cluster-metadata-decoder --skip-record-metadata --files /opt/install/kafka_2.12-3.1.0/topiclogs/__cluster_metadata-0/00000000000000000000.index,/opt/install/kafka_2.12-3.1.0/topiclogs/__cluster_metadata-0/00000000000000000000.log > /opt/metadata.txt
6. Raft Algorithm Overview
Raft ensures consensus among the controller replicas. The algorithm defines three roles:
Leader – receives client requests, appends entries to its log, and replicates them to followers.
Follower – copies entries from the leader and applies committed entries.
Candidate – a temporary role during leader election.
Key phases:
Leader Election – nodes increment their term, vote for themselves, request votes, and become leader when a majority is reached.
Log Replication – the leader sends AppendEntries RPCs; entries are considered committed once a majority of replicas have them.
Safety – a candidate can win an election only if its log is at least as up‑to‑date as the logs of a majority of voters, so committed entries are never lost on leader change.
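The commit rule in Log Replication can be sketched as follows (a simplified illustration of the majority rule, not Kafka's actual code):

```python
# Sketch: an entry is committed once a majority of replicas store it.

def commit_index(match_index: dict[int, int], cluster_size: int) -> int:
    """match_index maps replica id -> highest log index replicated there
    (the leader's own log is included). The commit index is the largest
    index present on a strict majority of the cluster."""
    indexes = sorted(match_index.values(), reverse=True)
    majority = cluster_size // 2 + 1
    return indexes[majority - 1]

# Leader has written up to index 7; followers lag behind at 5 and 3.
print(commit_index({1: 7, 2: 5, 3: 3}, cluster_size=3))  # → 5
```

Only index 5 is on two of three replicas, so entries 6 and 7 stay uncommitted until another follower catches up.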
Raft divides time into sequentially numbered terms; each term has at most one leader. If a leader crashes, a new election starts in a higher term.
7. Consumer Groups and Rebalancing
Kafka uses consumer groups to achieve both point‑to‑point and publish‑subscribe models. Partition assignment strategies include:
Range – assigns contiguous partition ranges to consumers (may be unbalanced with many topics).
RoundRobin – distributes partitions evenly across consumers.
Sticky – tries to keep previous assignments stable to reduce rebalancing overhead.
Rebalancing is triggered when the number of consumers, topics, or partitions changes. The coordinator handles JoinGroup, SyncGroup, and Heartbeat requests to compute new assignments.
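The difference between the Range and RoundRobin strategies can be sketched as follows (a simplified illustration; Kafka's real assignors live in the client library and handle multiple topics):

```python
# Sketch of two partition-assignment strategies for one topic.

def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Range: contiguous chunks per consumer; the first consumers absorb
    the remainder, which is why Range can be unbalanced across topics."""
    per, extra = divmod(len(partitions), len(consumers))
    out, start = {}, 0
    for i, c in enumerate(sorted(consumers)):
        n = per + (1 if i < extra else 0)
        out[c] = partitions[start:start + n]
        start += n
    return out

def round_robin_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """RoundRobin: deal partitions out one at a time."""
    cs = sorted(consumers)
    out = {c: [] for c in cs}
    for i, p in enumerate(partitions):
        out[cs[i % len(cs)]].append(p)
    return out

parts = [0, 1, 2, 3, 4]
print(range_assign(parts, ["c1", "c2"]))        # {'c1': [0, 1, 2], 'c2': [3, 4]}
print(round_robin_assign(parts, ["c1", "c2"]))  # {'c1': [0, 2, 4], 'c2': [1, 3]}
```

With a single topic the totals are similar, but Range repeats the "first consumer gets the extra partition" pattern for every subscribed topic, which is where the imbalance accumulates.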
8. Reliability Settings
To avoid message loss:
Set acks=all so the leader waits for all in‑sync replicas.
Configure retries to a high value.
Use replication.factor ≥ 2 and min.insync.replicas ≥ 2 (ideally replication.factor greater than min.insync.replicas, so one replica can fail without halting writes).
Set unclean.leader.election.enable=false to prevent an out‑of‑sync replica from becoming leader.
For consumers, disable auto‑commit ( enable.auto.commit=false ) and commit offsets manually after processing; set auto.offset.reset=earliest so a consumer with no committed offset starts from the beginning of the partition instead of skipping data.
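Taken together, a loss‑averse setup might look like the following (a hedged sketch mixing producer‑side, broker/topic‑side, and consumer‑side settings; the specific values are illustrative, not prescriptive):

```properties
# Producer side
acks=all
retries=2147483647
enable.idempotence=true   # retries no longer introduce duplicates

# Broker / topic side (replication.factor=3 chosen at topic creation)
min.insync.replicas=2
unclean.leader.election.enable=false

# Consumer side
enable.auto.commit=false
auto.offset.reset=earliest
```

With replication.factor=3 and min.insync.replicas=2, one broker can fail while acks=all writes continue.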
9. Performance Optimizations
Sequential I/O – Kafka appends to log files, avoiding random disk seeks.
PageCache and zero‑copy – writes go through the OS page cache; reads use sendfile to avoid copying data between kernel and user space.
Batching and compression – producers batch multiple records, and both producers and brokers can compress data to reduce network and storage usage.
This comprehensive guide demonstrates how Kafka 3.x replaces ZooKeeper with KRaft, how to install and configure a KRaft cluster, and how the underlying Raft consensus, consumer group mechanics, and reliability settings work together to provide a high‑performance, fault‑tolerant messaging system.
Tencent Cloud Developer