Backend Development

Understanding Apache Kafka Replication Mechanism and Its Design Principles

This article explains Apache Kafka's replication mechanism, covering the benefits of data redundancy, the roles of leader and follower replicas, the In‑Sync Replica (ISR) concept, unclean leader election, and how these design choices affect availability, consistency, and scalability in distributed systems.


Replication refers to keeping identical copies of data across multiple network-connected machines in a distributed system. In general it can provide three benefits: data redundancy, higher read scalability, and improved data locality.

In Apache Kafka, replication currently delivers only the first of these benefits—data redundancy for high availability and durability—and forgoes the read-scalability and locality advantages found in some other systems.

Kafka defines a replica as an append‑only log for a partition; each partition can have multiple replicas, with one elected as the leader and the rest as followers. Followers do not serve client requests; they asynchronously pull data from the leader and write it to their own logs.
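The leader/follower split described above can be illustrated with a small sketch. This is a hypothetical simplification for intuition only, not Kafka's actual broker code: all writes go to the leader replica's log, and followers pull whatever records they are missing.

```python
# Hypothetical sketch of a partition with one leader and several
# follower replicas, each an append-only log. Names (Replica,
# Partition, follower_fetch) are illustrative, not Kafka internals.

class Replica:
    """An append-only log of records for one partition."""
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    def append(self, record):
        self.log.append(record)

    @property
    def log_end_offset(self):
        return len(self.log)


class Partition:
    def __init__(self, replica_ids):
        self.replicas = {rid: Replica(rid) for rid in replica_ids}
        self.leader_id = replica_ids[0]  # one replica acts as leader

    @property
    def leader(self):
        return self.replicas[self.leader_id]

    def produce(self, record):
        # All client writes go to the leader only.
        self.leader.append(record)

    def follower_fetch(self, follower_id):
        # A follower pulls the records it is missing from the leader
        # and appends them to its own log.
        follower = self.replicas[follower_id]
        for record in self.leader.log[follower.log_end_offset:]:
            follower.append(record)


p = Partition(["b1", "b2", "b3"])
p.produce("m0")
p.produce("m1")
p.follower_fetch("b2")   # b2 catches up; b3 has not fetched yet
```

After the fetch, replica `b2` holds the same two records as the leader, while `b3` still lags until its next fetch—mirroring how followers converge on the leader's log asynchronously.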

Leader-based replication works as follows: the leader handles all read and write requests, while followers continuously fetch from it to stay in sync. If the leader fails, the controller—coordinating through ZooKeeper—elects a new leader from among the followers.
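The failover step can be sketched as follows. This is an assumed simplification: in real Kafka the controller broker performs the election, whereas here we simply promote the first surviving follower.

```python
# Illustrative failover sketch (not broker code): when the leader
# fails, one of the remaining followers is promoted, and clients
# resume reading and writing through the new leader.

class PartitionState:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)

    def on_leader_failure(self):
        # In real Kafka the controller picks the new leader; here we
        # just promote the first follower in line.
        if not self.followers:
            raise RuntimeError("no follower available for election")
        self.leader = self.followers.pop(0)
        return self.leader


state = PartitionState(leader="b1", followers=["b2", "b3"])
new_leader = state.on_leader_failure()   # "b2" takes over
```

Note that this sketch ignores which follower is best qualified to lead; the next sections on the ISR explain how Kafka constrains that choice.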

Because followers do not serve reads, Kafka cannot provide horizontal read scaling or data locality improvements, but this design simplifies achieving read‑your‑writes consistency and monotonic reads.

Kafka maintains an In-Sync Replicas (ISR) set for each partition, containing the leader and every follower whose replication lag stays within the configured replica.lag.time.max.ms (10 seconds by default in older releases; 30 seconds since Kafka 2.5). A follower that falls behind longer than this threshold is removed from the ISR, and may rejoin once it catches back up.
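The ISR membership rule can be expressed as a short sketch. This is an assumed simplification of the broker's logic—the function name and the follower timestamps are hypothetical—but it captures the idea: a follower stays in the ISR only if it was fully caught up with the leader within replica.lag.time.max.ms.

```python
# Illustrative sketch of ISR membership based on lag time.
# REPLICA_LAG_TIME_MAX_MS mirrors replica.lag.time.max.ms.

REPLICA_LAG_TIME_MAX_MS = 10_000

def current_isr(leader_id, followers, now_ms):
    """followers maps follower id -> timestamp (ms) at which that
    follower was last fully caught up with the leader."""
    isr = {leader_id}  # the leader is always in its own ISR
    for follower_id, last_caught_up_ms in followers.items():
        if now_ms - last_caught_up_ms <= REPLICA_LAG_TIME_MAX_MS:
            isr.add(follower_id)
    return isr


# b2 caught up 5 s ago (in sync); b3 caught up 20 s ago (dropped).
isr = current_isr("b1", {"b2": 95_000, "b3": 80_000}, now_ms=100_000)
```

With these numbers the ISR is {"b1", "b2"}: b3 has lagged for 20 seconds, beyond the 10-second threshold, so it is evicted until it catches up again.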

If the ISR becomes empty (for example, the leader crashes and no follower is in sync), Kafka can perform an unclean leader election, promoting an out-of-sync replica to leader. This restores availability at the risk of losing committed data, and is controlled by the unclean.leader.election.enable setting (false by default since Kafka 0.11).
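The decision controlled by unclean.leader.election.enable can be sketched as a single choice. This is a hedged illustration with simplified names, not the controller's actual code:

```python
# Sketch of the leader-election trade-off behind
# unclean.leader.election.enable (names simplified, hypothetical).

def elect_leader(live_replicas, isr, unclean_enabled):
    # Prefer a live in-sync replica: no committed data is lost.
    for replica in live_replicas:
        if replica in isr:
            return replica
    # No live ISR member: either stay offline, or (if unclean
    # election is enabled) promote an out-of-sync replica and
    # accept possible data loss.
    if unclean_enabled and live_replicas:
        return live_replicas[0]
    return None  # partition offline until an ISR member returns


# Only the out-of-sync b3 survives; the outcome depends on the flag.
safe = elect_leader(["b3"], isr={"b1"}, unclean_enabled=False)   # None
unclean = elect_leader(["b3"], isr={"b1"}, unclean_enabled=True)  # "b3"
```

With the flag off, the partition stays unavailable but loses no data; with it on, b3 becomes leader and any records it never replicated are gone—the availability-versus-safety trade-off the article describes.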

Overall, Kafka's replication design favors durability and straightforward consistency over read scalability: all client traffic flows through the leader, and the ISR mechanism lets operators tune the balance between data safety and availability.

Tags: Backend · distributed systems · Kafka · Replication · ISR · Leader-Follower
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
