Why Kafka Needs High Availability: Deep Dive into Replication and Leader Election
This article explains why Kafka introduced High Availability in version 0.8, covering the necessity of data replication and leader election, the internal replication and ACK mechanisms, Zookeeper metadata structures, broker failover procedures, and the command‑line tools that help manage and rebalance a Kafka cluster.
Why Kafka Needs High Availability
Before version 0.8 Kafka had no HA mechanism; a single broker failure made all partitions on that broker unavailable, and permanent loss of a broker could cause data loss. As clusters grow, the probability of multiple broker failures rises, so Kafka added HA to guarantee data durability and service continuity.
Why Replication Is Required
Without replication, a broker crash blocks both producers and consumers: producers either stop after a few retry attempts (synchronous mode) or silently drop messages (asynchronous mode). Replication ensures that at least one replica remains alive, keeping the system available even when individual brokers fail.
In synchronous producer mode, the producer retries message.send.max.retries (default 3) before throwing an exception.
In asynchronous mode, the exception is logged and the producer continues, leading to potential data loss.
Why Leader Election Is Required
When a partition has multiple replicas, only one replica (the Leader) handles all reads and writes; the others (Followers) replicate from the Leader. This design simplifies consistency, reduces the number of synchronization paths, and improves performance.
Kafka HA Design Overview
Data Replication
Kafka must answer four questions: how to propagate messages, how many replicas must acknowledge before an ACK is sent to the producer, how to handle a failed replica, and how to recover a replica that comes back online.
Propagating Messages
A producer first looks up the partition Leader in Zookeeper, sends the record to the Leader, and the Leader writes it to its local log. Each follower pulls the record, writes it to its own log, and acknowledges back to the Leader. When all in‑sync replicas (ISR) have ACKed, the Leader increments the high‑watermark (HW) and replies to the producer.
ACK Requirements
Kafka tracks liveness via Zookeeper sessions and ISR membership. A follower that falls behind more than replica.lag.max.messages (default 4000) or replica.lag.time.max.ms (default 10000) is removed from ISR. Only ISR members are considered for commit, providing a balance between durability and throughput.
Leader Election Algorithm
Kafka does not use classic majority‑vote; instead it relies on the ISR set. If a Leader fails, any ISR member can become the new Leader, allowing the system to tolerate up to f failed replicas while still guaranteeing that committed messages are not lost. Kafka’s algorithm resembles Microsoft’s PacificA and is more efficient for large‑scale log replication.
Broker Failover Process
Controller watches /brokers/ids in Zookeeper. When a broker disappears, the watch fires and the controller learns the current live broker list.
The controller builds a set set_p containing every partition that was hosted on the dead broker(s).
For each partition in set_p:
Read the current ISR from /brokers/topics/[topic]/partitions/[partition]/state.
Select a new Leader: if any ISR replica is alive, pick one; otherwise pick any surviving replica (potential data loss). If no replica survives, set Leader to -1.
Write the new Leader, ISR, leader_epoch and controller_epoch back to the same Zookeeper node.
Send a LeaderAndIsrRequest via RPC to the affected brokers (the controller can batch multiple requests).
Zookeeper Metadata Structures
Kafka stores cluster state in a hierarchy of Zookeeper znodes. Key paths include: /admin/preferred_replica_election – JSON schema describing the preferred replica election request. /admin/reassign_partitions – JSON schema for partition reassignment plans. /admin/delete_topics – JSON schema for topics pending deletion. /brokers/ids/[brokerId] – Information about live brokers (host, port, JMX port). /brokers/topics/[topic] – Mapping of partitions to their replica lists; the first replica is the preferred replica. /brokers/topics/[topic]/partitions/[partition]/state – Current leader, ISR, and epoch information. /controller – The current controller broker ID.
{
"fields": [
{"name": "version", "type": "int", "doc": "version id"},
{"name": "partitions", "type": {"type": "array", "items": {
"fields": [
{"name": "topic", "type": "string", "doc": "topic of the partition"},
{"name": "partition", "type": "int", "doc": "partition id"}
]
}},
"doc": "array of partitions for which preferred replica election should be triggered"
]
}Operational Tools
Kafka ships with several command‑line utilities to manage HA: $KAFKA_HOME/bin/kafka-topics.sh – Create, delete, describe topics and modify configuration parameters such as unclean.leader.election.enable, retention.ms, etc. $KAFKA_HOME/bin/kafka-replica-verification.sh – Verify that all replicas of selected topics are in sync. $KAFKA_HOME/bin/kafka-preferred-replica-election.sh – Trigger preferred‑replica leader election for partitions whose preferred replica is not the current leader. $KAFKA_HOME/bin/kafka-reassign-partitions.sh – Generate, execute, and verify partition reassignment plans (including changing replication factor). bin/kafka-run-class.sh kafka.tools.StateChangeLogMerger – Merge state‑change logs from all brokers to help diagnose leader‑election or failover issues.
These tools interact with the Zookeeper paths described above; they write JSON plans to /admin/* nodes, and the controller performs the actual state changes.
Balancing Leader Distribution
Kafka can automatically rebalance leaders when auto.leader.rebalance.enable is true. The controller periodically checks leader distribution (interval controlled by leader.imbalance.check.interval.seconds) and moves leaders to their preferred replicas if the imbalance exceeds leader.imbalance.per.broker.percentage. Manual rebalancing can also be performed with the preferred‑replica election tool.
Overall, Kafka’s HA design combines data replication, ISR‑based commit semantics, lightweight Zookeeper coordination, and a set of admin utilities to provide fault‑tolerant, high‑throughput messaging for large‑scale data pipelines.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
