How Kafka Elects Leaders and Distributes Partitions: A Deep Dive

This article explains Kafka's leader election process, partition assignment strategy, distribution policies, file layout, and the evolution of consumer offset storage, providing a comprehensive overview of how Kafka ensures reliable and efficient message handling in a distributed environment.

MaGe Linux Operations

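In a ZooKeeper-based cluster, Kafka first elects a Controller among the brokers: each broker tries to create the ephemeral /controller node in ZooKeeper, and the first one to succeed becomes the Controller. The Controller then registers a watch on the /brokers/ids node, so it is notified whenever a broker joins or fails.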
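In ZooKeeper-based Kafka, the Controller election is a create-if-absent race on an ephemeral /controller node: the first broker to create it wins, and when that broker's session ends the node disappears and the survivors race again. The sketch below is a toy model under stated assumptions: ZNodeStore and try_become_controller are hypothetical names, the store is an in-memory dictionary rather than a ZooKeeper client, and the real /controller node holds a small JSON payload rather than a bare broker id.

```python
class NodeExistsError(Exception):
    """Raised when creating a node that already exists."""


class ZNodeStore:
    """Hypothetical in-memory stand-in for ZooKeeper (not a real client API).

    It models only two properties the election relies on:
    create-if-absent is atomic, and ephemeral nodes vanish with their owner.
    """

    def __init__(self):
        self.nodes = {}  # path -> owning broker id

    def create_ephemeral(self, path, owner):
        # Only the first caller succeeds; later callers see the node exists.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = owner

    def session_expired(self, owner):
        # When a broker's session ends, all of its ephemeral nodes disappear.
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def try_become_controller(store, broker_id):
    """Attempt to claim /controller; return whichever broker holds it afterwards."""
    try:
        store.create_ephemeral("/controller", broker_id)
    except NodeExistsError:
        pass  # Another broker won the race; this one remains an ordinary broker.
    return store.nodes["/controller"]


# Brokers 2, 0 and 1 race at startup; broker 2 happens to get there first.
store = ZNodeStore()
print([try_become_controller(store, b) for b in (2, 0, 1)])  # [2, 2, 2]

# Broker 2 fails: /controller is deleted, and the survivors race again.
store.session_expired(2)
print([try_become_controller(store, b) for b in (1, 0)])  # [1, 1]
```

The same race that picks the first Controller also handles failover, which is why no separate voting protocol is needed in this model.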

Leader Election

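The Controller reads each partition's ISR (in-sync replica) list from /brokers/topics/[topic]/partitions/[partition]/state and, for a partition that has lost its Leader, picks as the new Leader the first replica in the partition's assigned replica list that is both alive and in the ISR. The ISR itself is maintained by the partition Leader: a follower that has not caught up with the Leader for longer than replica.lag.time.max.ms is removed from it. Before version 0.9.0, a second threshold, replica.lag.max.messages, could also remove a follower that fell too many messages behind; it was dropped because traffic bursts pushed healthy followers out of the ISR, causing frequent ISR churn under load.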
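The ISR rule and the default (clean) leader choice can be sketched as two small functions. This is an illustrative model, not Kafka's code: shrink_isr, elect_leader, and the last_caught_up_ms bookkeeping are hypothetical, and lag_time_max_ms simply plays the role of replica.lag.time.max.ms. It assumes the new Leader is the first replica in the partition's assigned replica list that is both alive and in the ISR.

```python
def shrink_isr(leader, isr, last_caught_up_ms, now_ms, lag_time_max_ms):
    """Return the ISR after dropping followers that have fallen too far behind.

    last_caught_up_ms: hypothetical map of replica id -> last time (ms) that the
    follower had fully caught up with the leader's log end offset.
    lag_time_max_ms plays the role of replica.lag.time.max.ms.
    """
    return [
        r for r in isr
        if r == leader or now_ms - last_caught_up_ms[r] <= lag_time_max_ms
    ]


def elect_leader(assigned_replicas, isr, live_brokers):
    """Pick the first assigned replica that is alive and in the ISR.

    Returns None when no such replica exists; electing an out-of-sync
    replica in that case ("unclean" election) is a separate policy that is
    disabled by default in recent Kafka versions.
    """
    for replica in assigned_replicas:
        if replica in isr and replica in live_brokers:
            return replica
    return None


# Partition assigned to brokers [1, 2, 3]; broker 1 (the old leader) has died.
print(elect_leader([1, 2, 3], isr=[1, 2, 3], live_brokers={2, 3}))  # 2

# Follower 2 last caught up 70 s ago, so with a 30 s limit it leaves the ISR.
print(shrink_isr(1, [1, 2, 3], {2: 30_000, 3: 95_000}, 100_000, 30_000))  # [1, 3]
```

Because only ISR members are eligible, a clean election never picks a replica that is missing committed messages.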

After choosing a Leader, the Controller writes the new Leader and ISR back to the partition's state node in ZooKeeper and sends a LeaderAndIsrRequest to the affected brokers, telling them which replica now leads. From then on, producer and consumer requests for that partition are handled by the Leader, while the Followers fetch (replicate) messages from it.
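What the Controller writes back after an election can be pictured as a small JSON record. The sketch assumes the field layout of the ZooKeeper partition state node (controller_epoch, leader, version, leader_epoch, isr); next_partition_state is a hypothetical helper, and the epoch values are made up for illustration. Incrementing leader_epoch on every leader change is what lets brokers tell current leadership information from stale information.

```python
import json


def next_partition_state(old_state, new_leader, new_isr, controller_epoch):
    """Build the partition state written back after a leader election.

    Field names mirror the .../partitions/[partition]/state node; the leader
    epoch is bumped on every leader change.
    """
    if new_leader not in new_isr:
        raise ValueError("the new leader must be a member of the ISR")
    return {
        "controller_epoch": controller_epoch,
        "leader": new_leader,
        "version": 1,
        "leader_epoch": old_state["leader_epoch"] + 1,
        "isr": list(new_isr),
    }


old = {"controller_epoch": 4, "leader": 1, "version": 1,
       "leader_epoch": 7, "isr": [1, 2, 3]}
new = next_partition_state(old, new_leader=2, new_isr=[2, 3], controller_epoch=4)
print(json.dumps(new))
# {"controller_epoch": 4, "leader": 2, "version": 1, "leader_epoch": 8, "isr": [2, 3]}
```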

Partition Assignment

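In the simplified scheme described here, the n brokers and the partitions are each sorted. Partition i is assigned to broker (i mod n) as its Leader, and replica j of partition i (with j = 0 being the Leader) is placed on broker ((i + j) mod n). Kafka's actual assignment also starts from a randomly chosen broker and shifts follower placement, but the round-robin idea is the same.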
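The modular placement rule translates directly into a few lines of code. This is a minimal sketch of the simplified scheme only (assign_replicas is a hypothetical helper): it omits the random starting broker, follower shift, and rack awareness that Kafka's real assignor applies.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Simplified placement: replica j of partition i goes to broker (i + j) mod n.

    Returns {partition: [broker ids]}; the first broker in each list is the
    preferred Leader.
    """
    brokers = sorted(brokers)
    n = len(brokers)
    if not 1 <= replication_factor <= n:
        raise ValueError("replication factor must be between 1 and the broker count")
    return {
        i: [brokers[(i + j) % n] for j in range(replication_factor)]
        for i in range(num_partitions)
    }


print(assign_replicas(4, [103, 101, 102], 2))
# {0: [101, 102], 1: [102, 103], 2: [103, 101], 3: [101, 102]}
```

Because consecutive partitions start on consecutive brokers, leadership (and therefore client traffic) is spread evenly, and no broker ever holds two replicas of the same partition.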

Partition Distribution Strategy

When a producer specifies a target partition, the record is written to that partition.

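If no partition is specified but a key is provided, the key is hashed (the Java client uses murmur2) and the hash modulo the number of partitions selects the partition, so records with the same key always land in the same partition as long as the partition count does not change.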

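If neither a partition nor a key is provided, older producers cycle through the partitions in a round-robin fashion. Since Kafka 2.4, the Java producer's default partitioner instead uses a "sticky" strategy: it fills a batch for one partition before moving on to another, which reduces the number of small requests.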
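The three rules can be sketched as a small class. This is illustrative only: SimplePartitioner is a hypothetical name, zlib.crc32 stands in for the murmur2 hash used by Kafka's Java client (so partition numbers will not match a real producer), and the keyless case models the classic round-robin behavior rather than the newer sticky partitioner.

```python
import itertools
import zlib


class SimplePartitioner:
    """Illustrative partition chooser following the three rules above.

    Assumption: zlib.crc32 stands in for murmur2, so results differ from Kafka.
    """

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = itertools.count()  # round-robin counter for keyless records

    def partition(self, key=None, partition=None):
        if partition is not None:
            return partition  # 1. an explicit partition wins
        if key is not None:
            return zlib.crc32(key) % self.num_partitions  # 2. hash the key bytes
        return next(self._next) % self.num_partitions  # 3. round-robin


p = SimplePartitioner(3)
print(p.partition(partition=1))               # 1
print([p.partition() for _ in range(4)])      # [0, 1, 2, 0]
same = p.partition(key=b"user-42") == p.partition(key=b"user-42")
print(same)                                   # True
```

The key rule is what gives Kafka per-key ordering: every record for user-42 goes to one partition, and order is guaranteed only within a partition.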

Partition Files

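Each partition corresponds to a directory on disk named <topic>-<partition> (for example, orders-0 for partition 0 of a topic named orders). The directory holds a series of log segments; each segment consists of a .log file, an .index file, and, since version 0.10.1, a .timeindex file, all sharing the same base name.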

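The .log file stores the messages themselves, while the .index file maps offsets to byte positions in the .log file (and the .timeindex file maps timestamps to offsets), enabling fast lookup. Segment files are named after the first (base) offset they contain, zero-padded to 20 digits. For example, the segment 00000000000000368796.index starts at offset 368796; if the next segment starts at 00000000000001105814, this segment covers offsets 368796 through 1105813.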
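Finding the segment that holds a given offset is a binary search over the sorted base offsets. A minimal sketch, assuming the hypothetical base offsets 0, 368796 and 1105814 for one partition; segment_file_name and find_segment are illustrative helpers, not Kafka APIs.

```python
import bisect


def segment_file_name(base_offset, suffix):
    """Segment files use the base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.{suffix}"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that holds target_offset.

    base_offsets must be sorted; each segment runs from its own base offset up
    to, but not including, the next segment's base offset.
    """
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is older than the first retained segment")
    return base_offsets[i]


segments = [0, 368796, 1105814]  # hypothetical base offsets in one partition
print(segment_file_name(find_segment(segments, 1105813), "index"))
# 00000000000000368796.index
```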

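Because offsets are ordered, finding a message takes three steps: a binary search over the segment base offsets locates the segment, a binary search over that segment's sparse index finds the closest indexed entry at or below the target offset, and a short sequential scan of the .log file from that byte position reaches the message. The index is sparse (one entry per few kilobytes of log data rather than one per message), so it stays small enough to memory-map.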
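The within-segment part of the lookup (sparse index, then sequential scan) can be modeled in a few lines. This sketch makes simplifying assumptions: find_position is a hypothetical helper, the index is a list of (relative_offset, byte_position) pairs like the entries in an .index file, and log_records is an in-memory stand-in for reading record headers from the .log file. In Kafka, index density is governed by index.interval.bytes.

```python
import bisect


def find_position(sparse_index, base_offset, target_offset, log_records):
    """Return the byte position of target_offset within one segment, or None.

    sparse_index: sorted (relative_offset, byte_position) pairs, as in .index.
    log_records: (offset, byte_position) pairs in log order, a hypothetical
    stand-in for reading record headers from the .log file.
    """
    relative = target_offset - base_offset
    keys = [rel for rel, _ in sparse_index]
    # Binary search: last index entry whose relative offset <= target.
    i = bisect.bisect_right(keys, relative) - 1
    start = sparse_index[i][1] if i >= 0 else 0
    # Sequential scan of the log, starting at the indexed byte position.
    for offset, position in log_records:
        if position < start:
            continue  # A real reader would seek straight to `start` instead.
        if offset == target_offset:
            return position
        if offset > target_offset:
            break
    return None


base = 368796
records = [(base + k, k * 100) for k in range(10)]  # 100-byte records
index = [(0, 0), (4, 400), (8, 800)]                 # one entry per 4 records
print(find_position(index, base, 368802, records))   # 600
```

Storing offsets relative to the segment's base offset keeps each index entry small, which is part of why a sparse index fits comfortably in memory.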

Consumer Offset Storage Evolution

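Older Kafka consumers (the original ZooKeeper-based consumer) stored their committed offsets in ZooKeeper and updated them periodically. ZooKeeper is not designed for this kind of frequent write traffic, which caused performance problems, and because offsets were committed only periodically, a consumer that crashed between commits would reprocess messages after restarting (duplicate consumption).

Starting with version 0.8.2, offsets can be stored in the internal __consumer_offsets topic inside the Kafka cluster, and the Java consumer introduced in 0.9 uses it by default. Offset commits are handled by a broker acting as the consumer group's coordinator, so consumers no longer need to talk to ZooKeeper to track their progress.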
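Each consumer group's offsets live in one partition of __consumer_offsets, chosen from the group id, and the broker leading that partition acts as the group's coordinator. The sketch below assumes the broker computes abs(groupId.hashCode()) modulo offsets.topic.num.partitions (default 50), with Integer.MIN_VALUE mapped to 0; java_string_hash and offsets_partition_for are hypothetical helpers, and java_string_hash only matches Java for characters in the Basic Multilingual Plane.

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() for strings of BMP characters."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def offsets_partition_for(group_id, num_partitions=50):
    """Partition of __consumer_offsets that stores this group's offsets.

    num_partitions mirrors offsets.topic.num.partitions (default 50).
    Integer.MIN_VALUE is mapped to 0, as we assume Kafka's Utils.abs does.
    """
    h = java_string_hash(group_id)
    positive = 0 if h == -(1 << 31) else abs(h)
    return positive % num_partitions


print(offsets_partition_for("hello"))  # 22, since "hello".hashCode() == 99162322
```

Because the mapping depends only on the group id, every broker agrees on which one coordinates a given group without any extra lookup service.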

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
