
How Kafka 4.0’s KRaft Replaces ZooKeeper with Raft Consensus

Kafka 4.0 introduces KRaft, a ZooKeeper‑free metadata layer built on the Raft consensus algorithm. This article walks through Raft's role transitions, leader election, and log replication, then shows how KRaft divides controller and broker responsibilities and handles faults, enabling a more scalable, self‑managed architecture for large‑scale distributed streaming.

1. Raft Algorithm

In distributed consensus, common algorithms include Paxos, Raft, and ZAB. Kafka 4.0’s KRaft mode implements Raft, which splits consensus into four sub‑problems: leader election, log replication, safety, and membership changes.

1.1 Role State Transitions and Terms

Raft defines three roles:

Leader : one per term, receives client writes, appends them as log entries, replicates logs to followers, and sends periodic heartbeats. It tracks each follower’s replication progress and the index of the last committed entry.

Follower : passive nodes that persist the leader’s log entries, redirect client writes to the current leader, and vote in elections.

Candidate : a temporary role that initiates an election when a follower times out, requests votes, and becomes leader if it obtains a majority.

Key requirements for a valid election:

The candidate's log must be at least as up‑to‑date as the voter's log.

Each node may vote at most once per term.

A leader with a higher term must be accepted immediately.

Possible election outcomes are: a candidate wins and becomes leader, another node wins causing the candidate to revert to follower, or the election times out and retries.
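These rules map directly onto a short vote‑granting routine. The sketch below is illustrative Java with hypothetical names (`VoteRequest`, `VoterState`, `grantVote`), not Kafka's actual KRaft classes:

```java
// Illustrative sketch of Raft's vote-granting rules; all names are
// hypothetical and not taken from Kafka's KRaft implementation.
final class VoteRequest {
    final int term;           // candidate's term
    final int candidateId;
    final long lastLogIndex;  // index of the candidate's last log entry
    final int lastLogTerm;    // term of the candidate's last log entry

    VoteRequest(int term, int candidateId, long lastLogIndex, int lastLogTerm) {
        this.term = term;
        this.candidateId = candidateId;
        this.lastLogIndex = lastLogIndex;
        this.lastLogTerm = lastLogTerm;
    }
}

final class VoterState {
    int currentTerm;
    Integer votedFor;         // null if no vote cast in currentTerm
    long lastLogIndex;
    int lastLogTerm;

    boolean grantVote(VoteRequest req) {
        // A higher term is accepted immediately: adopt it and clear our vote.
        if (req.term > currentTerm) {
            currentTerm = req.term;
            votedFor = null;
        }
        // Reject requests from stale terms.
        if (req.term < currentTerm) return false;
        // At most one vote per term.
        if (votedFor != null && votedFor != req.candidateId) return false;
        // The candidate's log must be at least as up-to-date as ours:
        // compare the last entry's term first, then the last index.
        boolean upToDate = req.lastLogTerm > lastLogTerm
                || (req.lastLogTerm == lastLogTerm && req.lastLogIndex >= lastLogIndex);
        if (!upToDate) return false;
        votedFor = req.candidateId;
        return true;
    }
}
```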

1.2 Term

A term is a monotonically increasing number that identifies the period during which a particular leader holds office. Each term begins with an election and, once a leader is established, continues with normal operation:

Term duration = election time + normal operation time

1.3 Leader Election Process

Followers start with random election timeouts (150‑300 ms). If a follower does not receive a heartbeat before timeout, it increments its term, becomes a candidate, votes for itself, and sends RequestVote RPCs to other nodes.

Election results:

Success : candidate receives votes from a majority and becomes leader, then begins sending heartbeats.

Existing leader discovered : the candidate receives an AppendEntries heartbeat from a leader whose term is at least as large as its own, and steps down to follower.

Failure : no candidate obtains a majority; nodes wait a new randomized timeout and retry.
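This timeout‑and‑retry flow can be sketched as follows; the RPC transport is stubbed out and every name is hypothetical, so treat it as an outline of the flow rather than KRaft's implementation:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative election flow for a single node; persistence and RPC
// transport are omitted, and all names are hypothetical.
final class ElectionSketch {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    int currentTerm;
    final int nodeId;
    final int clusterSize;

    ElectionSketch(int nodeId, int clusterSize) {
        this.nodeId = nodeId;
        this.clusterSize = clusterSize;
    }

    // Randomized timeout in the 150-300 ms range described above, which
    // makes simultaneous candidacies (and split votes) unlikely.
    static long randomElectionTimeoutMs() {
        return ThreadLocalRandom.current().nextLong(150, 301);
    }

    // Called when no heartbeat arrived before the timeout expired.
    Role startElection() {
        currentTerm++;      // increment term
        int votes = 1;      // vote for self
        for (int peer = 0; peer < clusterSize; peer++) {
            if (peer != nodeId && requestVote(peer, currentTerm)) votes++;
        }
        // Majority wins; otherwise remain a candidate, wait a fresh
        // randomized timeout, and retry.
        return votes > clusterSize / 2 ? Role.LEADER : Role.CANDIDATE;
    }

    // Placeholder for a RequestVote RPC to a peer.
    boolean requestVote(int peer, int term) { return false; }
}
```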

1.4 Log Replication

After election, the leader accepts client requests, converts each request into a log entry, and replicates it to followers via AppendEntries. Once a majority of followers have stored the entry, the leader marks it committed, applies it to its state machine, and informs followers to apply the entry as well. Followers that fall behind retry until they catch up.
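The "committed once a majority has stored it" rule reduces to a small calculation over per‑follower progress. The sketch below is illustrative Java with hypothetical names, not KRaft's actual code; note that full Raft additionally requires the entry to belong to the leader's current term before committing:

```java
import java.util.Arrays;

// Illustrative commit rule: the leader advances its commit index once a
// majority of nodes (itself included) have stored an entry.
final class LeaderProgress {
    final long[] matchIndex;  // highest log index known to be stored on each node
    long commitIndex;

    LeaderProgress(int clusterSize) {
        this.matchIndex = new long[clusterSize];
    }

    // Called when a follower acknowledges an AppendEntries request.
    void onAppendEntriesAck(int node, long ackedIndex) {
        matchIndex[node] = Math.max(matchIndex[node], ackedIndex);
        // After sorting, the entry at position (n-1)/2 is stored on a
        // majority of the n nodes.
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted);
        long majorityIndex = sorted[(sorted.length - 1) / 2];
        if (majorityIndex > commitIndex) {
            commitIndex = majorityIndex;  // committed: apply to the state machine
        }
    }
}
```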

Raft state diagram

2. KRaft Mode in Kafka 4.0

KRaft replaces ZooKeeper by using Raft to manage Kafka metadata. Metadata is stored in a single‑partition internal topic __cluster_metadata and changes are appended as events. Snapshots are taken based on size or interval to speed up controller restarts.
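On a running KRaft node, the committed metadata events can be inspected with Kafka's own log‑dump tool; the log directory below is illustrative and exact flags can vary by version:

```bash
# Decode records in the __cluster_metadata log segment (path is illustrative).
bin/kafka-dump-log.sh --cluster-metadata-decoder \
  --files /tmp/kraft-combined-logs/__cluster_metadata-0/00000000000000000000.log
```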

Two node types are defined:

Controller : runs the Raft protocol, participates in the quorum (recommended 3 or 5 nodes), and handles metadata management and leader election.

Broker : serves the data plane, fetches metadata from the active controller via the MetadataFetch API.
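As a concrete, abridged illustration, the two roles map onto the `process.roles` setting in each node's properties file; node IDs, host names, and ports below are placeholders:

```properties
# controller.properties -- one member of a three-node quorum
process.roles=controller
node.id=1
controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093
listeners=CONTROLLER://host1:9093
controller.listener.names=CONTROLLER

# broker.properties -- data plane only
process.roles=broker
node.id=4
controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093
listeners=PLAINTEXT://host4:9092
controller.listener.names=CONTROLLER
```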

KRaft adapts Raft with three main variations:

Pull‑based model : brokers and inactive controllers pull metadata logs from the active controller instead of the leader pushing them.

Single‑partition log : all metadata resides in the __cluster_metadata partition, with the active controller as leader.

No ISR maintenance : consistency is achieved through Raft’s majority quorum rather than Kafka’s ISR mechanism.

Consistency and fault tolerance:

As long as a majority of controllers are alive, the metadata log remains consistent.

If the active controller fails, the remaining controllers elect a new leader; snapshots ensure rapid recovery without data loss.

Leader election diagram

2.1 Workflow

Cluster startup involves configuring each node’s role, generating a cluster ID, and starting controllers as followers with a default 2‑second timeout.
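In practice this follows Kafka's standard KRaft bootstrap sequence (the config path below is illustrative and varies by version):

```bash
# Generate a cluster ID, format each node's storage, then start the node.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties
bin/kafka-server-start.sh config/kraft/server.properties
```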

Leader election follows the Raft process described earlier.

Metadata log replication proceeds by the active controller appending changes, replicating them to followers, and committing after a majority acknowledges.

Broker synchronization:

Brokers periodically pull the latest metadata from the active controller.

They store it locally and apply only committed entries, ensuring consistency with the cluster.
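The pull‑and‑apply cycle can be pictured with a small sketch. `fetchCommitted` and `apply` are hypothetical stand‑ins here; Kafka's real MetadataFetch machinery is internal and differs from these names:

```java
import java.util.List;

// Hypothetical sketch of a broker's metadata pull loop; not Kafka's API.
interface MetadataSource {
    // Returns committed metadata records with offsets after the given one.
    List<byte[]> fetchCommitted(long afterOffset);
}

final class BrokerMetadataSync implements Runnable {
    private final MetadataSource activeController;
    private long appliedOffset = -1;  // offset of the last record applied locally

    BrokerMetadataSync(MetadataSource activeController) {
        this.activeController = activeController;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Only committed entries are fetched, so the broker never applies
            // metadata that the quorum could later roll back.
            for (byte[] record : activeController.fetchCommitted(appliedOffset)) {
                apply(record);
                appliedOffset++;  // assumes contiguous offsets, for simplicity
            }
            try {
                Thread.sleep(500);  // poll interval (illustrative)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void apply(byte[] record) {
        // Update the broker's local metadata image from the record.
    }
}
```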

Failure recovery:

If the active controller crashes, the remaining controllers elect a new leader.

If a broker restarts, it pulls missing log entries from the active controller to catch up.

KRaft workflow diagram

3. Conclusion

Kafka 4.0’s KRaft mode integrates the Raft consensus algorithm into Kafka’s core, eliminating the need for ZooKeeper. By combining Raft’s leader election and log replication with Kafka’s pull‑based metadata fetching and event‑sourcing design, KRaft simplifies architecture, improves scalability, and shortens recovery times, making it a pivotal step toward modern, self‑managed streaming platforms.
