
How Kafka Ensures High Availability with Leader‑Follower Replication

Kafka introduced a high‑availability mechanism in version 0.8 by replicating partitions across multiple brokers, designating a leader and followers, using an in‑sync replica (ISR) list to balance synchronous and asynchronous replication, and employing leader election strategies to maintain data integrity during failures.

Java High-Performance Architecture

In early versions, Kafka had no high‑availability mechanism: if a broker failed, all of its partitions became unavailable, and data held only on that broker could be lost.

The more brokers a cluster has, the more likely it is that at least one of them fails at any given time, so larger clusters need high availability all the more.

Starting with version 0.8, Kafka added partition replication. Each partition can have multiple replicas distributed across brokers, ensuring data safety even if a broker crashes.

With replication, Kafka designates one replica of each partition as the Leader and the others as Followers. Producers send messages only to the Leader, and Followers copy the messages from the Leader.
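The idea of spreading replicas across brokers can be sketched in a few lines. This is a simplified round‑robin placement, not Kafka's actual assignment code; the class `ReplicaPlacement` and its method `assign` are hypothetical names introduced for illustration. The sketch treats the first replica in each list as the Leader.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical replica placement; not Kafka's real assignment algorithm. */
final class ReplicaPlacement {

    /**
     * Returns, for each partition, the ordered list of broker ids that hold a replica.
     * In this sketch the first broker in each list is the Leader, the rest are Followers.
     */
    static List<List<Integer>> assign(int brokerCount, int partitionCount, int replicationFactor) {
        if (replicationFactor < 1 || replicationFactor > brokerCount) {
            throw new IllegalArgumentException("replication factor must be between 1 and the broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int partition = 0; partition < partitionCount; partition++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Consecutive brokers, wrapping around, so one broker never holds
                // two copies of the same partition.
                replicas.add((partition + r) % brokerCount);
            }
            assignment.add(replicas);
        }
        return assignment;
    }
}
```

With 3 brokers, 3 partitions and a replication factor of 2, `ReplicaPlacement.assign(3, 3, 2)` places the partitions on brokers (0, 1), (1, 2) and (2, 0), so losing any single broker still leaves one copy of every partition.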

Kafka’s replication model is a hybrid of synchronous and asynchronous approaches. Fully synchronous replication guarantees safety but hurts throughput, while fully asynchronous replication offers high performance but risks data loss.

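The Leader maintains an ISR (in‑sync replica) list: the replicas, itself included, that have kept up with its log. When the Leader receives a new message, the Followers in the ISR copy it; once all of them have it, the message counts as committed and the Leader acknowledges the Producer (assuming the Producer asked to wait for all in‑sync replicas). A Follower that falls too far behind is dropped from the ISR and can rejoin after it catches up.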
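The commit rule can be modelled as "a message is committed once every in‑sync Follower has copied it". Kafka calls this boundary the high watermark. The sketch below is a toy model under that assumption; `PartitionLeader` and all of its methods are illustrative names, not Kafka internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of one partition's Leader; names are illustrative, not Kafka internals. */
final class PartitionLeader {
    // Next offset the Leader will assign to an incoming message.
    private long leaderLogEndOffset = 0;
    // For each Follower: the offset up to which (exclusive) it has copied the log.
    private final Map<Integer, Long> followerLogEndOffsets = new HashMap<>();
    // Ids of the Followers currently considered in sync.
    private final Set<Integer> isr = new HashSet<>();

    void addFollower(int brokerId, boolean inSync) {
        followerLogEndOffsets.put(brokerId, 0L);
        if (inSync) {
            isr.add(brokerId);
        }
    }

    /** Producer append: returns the offset assigned to the new message. */
    long append() {
        return leaderLogEndOffset++;
    }

    /** A Follower reports how far it has copied the Leader's log. */
    void recordFetch(int brokerId, long logEndOffset) {
        followerLogEndOffsets.put(brokerId, Math.min(logEndOffset, leaderLogEndOffset));
    }

    /** Offsets below this value exist on the Leader and on every in-sync Follower. */
    long highWatermark() {
        long hw = leaderLogEndOffset;
        for (int id : isr) {
            hw = Math.min(hw, followerLogEndOffsets.get(id));
        }
        return hw;
    }

    boolean isCommitted(long offset) {
        return offset < highWatermark();
    }
}
```

A Follower outside the ISR does not hold back the high watermark in this model, which is why a slow replica does not stall Producers.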

Thus Kafka combines the benefits of both models: some Followers stay in sync, while others replicate asynchronously.
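On the Producer side, how long a send waits for replication is typically chosen with the `acks` setting, which the text above does not name: `all` waits for the in‑sync replicas, `1` waits for the Leader only, and `0` does not wait at all. The sketch below only builds a `java.util.Properties` object with the standard key names; `ProducerSettings` is a hypothetical helper, and passing the result to a real Kafka producer would also require the client library and serializer settings.

```java
import java.util.Properties;

/** Hypothetical helper that collects producer settings; it does not create a producer. */
final class ProducerSettings {

    static Properties build(String bootstrapServers, String acks) {
        if (!acks.equals("all") && !acks.equals("1") && !acks.equals("0")) {
            throw new IllegalArgumentException("acks must be \"all\", \"1\" or \"0\" in this sketch");
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // "all": acknowledge only after the in-sync replicas have the message.
        // "1":   acknowledge once the Leader has written it (Followers catch up later).
        // "0":   do not wait for any acknowledgement.
        props.setProperty("acks", acks);
        return props;
    }
}
```

For example, `ProducerSettings.build("localhost:9092", "all")` favors safety, while `"1"` trades some safety for lower latency. The address is a placeholder.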

If the Leader fails, a new Leader is elected, preferably from the ISR, because ISR members hold every committed message. If no ISR member is available, a non‑ISR replica may be chosen, at the risk of losing any messages that replica had not yet copied.

In the extreme case where all replicas fail, Kafka can either wait for an ISR member to recover (preserving the data, but with no bound on recovery time) or promote the first replica that comes back online, whether or not it was in the ISR (restoring availability quickly, but possibly with incomplete data). This behavior is configurable, so operators can choose between availability and consistency.
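The election rules above can be sketched as a single function. This is an illustration of the decision, not Kafka's controller code: `LeaderElection`, `elect`, and the `allowUnclean` flag are hypothetical names. In real deployments the corresponding switch is the `unclean.leader.election.enable` setting.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative leader election for one partition; not Kafka's controller logic. */
final class LeaderElection {

    /**
     * @param replicas     all replicas of the partition, in assignment order
     * @param isr          replicas that were in sync before the failure
     * @param alive        replicas whose brokers are currently up
     * @param allowUnclean whether a replica outside the ISR may become Leader
     * @return the new Leader, or empty if the partition must stay offline
     */
    static Optional<Integer> elect(List<Integer> replicas, Set<Integer> isr,
                                   Set<Integer> alive, boolean allowUnclean) {
        // Clean election: the first replica that is both alive and in sync.
        for (int id : replicas) {
            if (alive.contains(id) && isr.contains(id)) {
                return Optional.of(id);
            }
        }
        // Unclean election: accept any live replica, which may be missing messages.
        if (allowUnclean) {
            for (int id : replicas) {
                if (alive.contains(id)) {
                    return Optional.of(id);
                }
            }
        }
        // No candidate: wait until an ISR member recovers.
        return Optional.empty();
    }
}
```

With `allowUnclean` set to `false`, the function returns an empty result when only out‑of‑sync replicas are alive, which corresponds to waiting for an ISR member. Setting it to `true` picks whichever replica is up and accepts the possible data loss.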

