Why Does a Kafka Partition Lose Its Leader? A Deep Dive into Index Corruption and Recovery
This article examines a Kafka cluster failure where partition 34 could not elect a leader due to index file corruption, explains the underlying sanity‑check logic, reproduces the fault, and offers practical recovery steps and configuration recommendations to prevent data loss.
Background
On 2023‑02‑10 a Kafka 0.11 cluster reported "no leader" errors for partition 34 of topic A. The leader replica was on broker0, and once it went down no new leader could be elected: the other replica had already been removed from the ISR, and the broker parameter unclean.leader.election.enable defaults to false. Consequently the partition became unavailable and pending messages could not be consumed.
Log Analysis
Inspection of KafkaServer.log showed repeated warnings about corrupted index files. The root cause was traced to kafka.log.OffsetIndex#sanityCheck, which validates each log segment’s index during recovery.
Index File Structure
entries – number of sparse index entries (one per batch, not per message).
lastOffset – offset of the last entry in the index.
baseOffset – base offset encoded in the index file name.
The corruption check uses the condition _entries == 0 || _lastOffset > baseOffset. If the condition is false the index is considered damaged and will be rebuilt.
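The condition can be restated as a small predicate. This is only a sketch in Python, not the Scala code in kafka.log.OffsetIndex; the IndexSummary class and its field names are hypothetical stand-ins for the three values listed above.

```python
from dataclasses import dataclass


@dataclass
class IndexSummary:
    """Hypothetical container for the values the sanity check reads."""
    base_offset: int  # encoded in the index file name
    entries: int      # number of sparse index entries
    last_offset: int  # offset of the last entry in the index


def index_is_sane(idx: IndexSummary) -> bool:
    # Same shape as the quoted condition: an empty index passes,
    # otherwise the last offset must be strictly beyond the base offset.
    return idx.entries == 0 or idx.last_offset > idx.base_offset
```

An index that holds entries but whose last offset equals the base offset fails the predicate, which is the case that triggers a rebuild.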
Related Bugs
Apache JIRA issues KAFKA-1112, KAFKA-1554, KAFKA-4972, KAFKA-3955, KAFKA-1211, KAFKA-3919 describe similar index‑corruption problems in older Kafka versions.
Solution
Immediate actions: bring the failed broker back online, or delete the corrupted .log and .index files and restart the broker. Long‑term mitigation:
Set unclean.leader.election.enable=true to allow election from non‑ISR replicas; this restores availability but can discard messages that the newly elected replica never received.
Increase default.replication.factor to 3 for higher availability.
Set min.insync.replicas=2 to require at least two in‑sync replicas.
Configure producers with acks=1 (or appropriate durability settings); note that min.insync.replicas is enforced only for producers that use acks=all.
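To see what the replication settings buy, the arithmetic can be written down directly. The sketch below assumes producers use acks=all, in which case a partition keeps accepting writes while the ISR holds at least min.insync.replicas members; the function name is made up for illustration.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Number of replicas that may drop out before acks=all writes are rejected.

    Simplified model: every replica starts in the ISR, and a write is
    accepted while the ISR size stays >= min_insync_replicas.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas
```

With replication factor 3 and min.insync.replicas=2 the result is 1: one broker can fail without blocking writes. With replication factor 2 and min.insync.replicas=2 the result is 0, so any single failure stops acks=all producers.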
Source‑Code Trace
The failure originates in kafka.log.Log#loadSegmentFiles, which runs sanityCheck() on each segment's index and, when the check fails, calls kafka.log.LogSegment#recover to rebuild the index from the log. During the rebuild, if a batch’s offset is not greater than the previous lastOffset, kafka.log.OffsetIndex#append throws an exception and the broker aborts.
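The append rule described above can be modeled in a few lines. This is a simplified Python sketch, not Kafka's implementation: SparseOffsetIndex and InvalidOffsetError are hypothetical names, and the real index stores relative offsets in a memory-mapped file.

```python
class InvalidOffsetError(Exception):
    """Raised when a new index entry would not advance the offset."""


class SparseOffsetIndex:
    def __init__(self, base_offset):
        self.base_offset = base_offset
        self.entries = []  # list of (offset, file position) pairs

    @property
    def last_offset(self):
        # An empty index reports its base offset as the last offset.
        return self.entries[-1][0] if self.entries else self.base_offset

    def append(self, offset, position):
        # Offsets must be strictly increasing once the index holds an entry.
        if self.entries and offset <= self.last_offset:
            raise InvalidOffsetError(
                f"offset {offset} is not larger than last offset {self.last_offset}"
            )
        self.entries.append((offset, position))
```

Appending the same offset twice, or an offset lower than the last one, raises the error; during a rebuild that single failed append is enough to abort recovery of the segment.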
Inspecting Log Files
~/kafka_2.1x-0.11.x/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files {log_path}/secxxx-2/00000000000110325000.log > secxxx.log
~/kafka_2.1x-0.11.x/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files {log_path}/secxxx-2/00000000000110325000.index > secxxx-index.log
Comparison shows the last index entry has offset 110756715, position 182484660. The next batch (offset 110756804) attempts to append an index entry with the same base offset, causing the exception.
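Scanning a dumped index by eye is tedious, so a short script can flag entries whose offset fails to increase. The sketch assumes the index dump prints one entry per line in the form "offset: N position: M"; verify that against your own dump before relying on it. The sample text in the usage note is invented.

```python
import re

# Assumed line format of the index dump: "offset: <n> position: <m>"
ENTRY_RE = re.compile(r"offset:\s*(\d+)\s+position:\s*(\d+)")


def parse_index_dump(text):
    """Return the (offset, position) pairs found in the dump text."""
    return [(int(m.group(1)), int(m.group(2))) for m in ENTRY_RE.finditer(text)]


def find_non_increasing(entries):
    """Return (previous, current) pairs where the offset did not strictly increase."""
    return [
        (prev, cur)
        for prev, cur in zip(entries, entries[1:])
        if cur[0] <= prev[0]
    ]
```

For example, find_non_increasing(parse_index_dump("offset: 100 position: 0\noffset: 150 position: 4096\noffset: 150 position: 8192\n")) reports the last two entries as a conflicting pair. To check a real file, pass the contents of secxxx-index.log to parse_index_dump.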
Fault Reproduction
Start two brokers with unclean.leader.election.enable=false.
Create topic-1 with 1 partition and replication factor 2.
Produce messages.
Stop broker1 (leader remains on broker0).
Stop broker0 and delete its log directory.
Restart broker1 – the partition is unavailable because the leader is down.
Restart broker0 – its replica is empty, causing the follower to truncate logs and lose data.
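The last two steps can be walked through with a toy model. It is a deliberate simplification, not Kafka's controller or replica fetcher logic: elect_leader and truncate_follower are hypothetical helpers, and the model assumes the follower simply cuts its log back to the leader's log end offset.

```python
def elect_leader(replicas, isr, live, unclean_enabled):
    """Pick a leader from the assigned replicas, or None if none is eligible."""
    # Clean election: first assigned replica that is both in the ISR and alive.
    for broker in replicas:
        if broker in isr and broker in live:
            return broker
    # Unclean election: any live replica, only if explicitly allowed.
    if unclean_enabled:
        for broker in replicas:
            if broker in live:
                return broker
    return None


def truncate_follower(follower_log, leader_log_end_offset):
    """Drop every follower message at or beyond the leader's log end offset."""
    return follower_log[:leader_log_end_offset]


if __name__ == "__main__":
    replicas, isr = [0, 1], {0}  # broker1 left the ISR when it was stopped
    print(elect_leader(replicas, isr, live={1}, unclean_enabled=False))     # step 6
    print(elect_leader(replicas, isr, live={0, 1}, unclean_enabled=False))  # step 7
    print(truncate_follower(["m0", "m1", "m2"], leader_log_end_offset=0))
```

In this model broker1 alone cannot become leader because it is outside the ISR, and once broker0 returns with an empty log it wins the election and broker1 truncates everything it had.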
Manual Leader Reassignment
Run kafka-reassign-partitions.sh (or use Kafka‑Manager) to move the preferred leader to the surviving replica.
Edit the ZooKeeper node for the partition: set leader=2, increment leader_epoch, and update the ISR list.
Restart the previously failed broker so that the new leader’s lastOffset becomes the reference point.
In the test case this recovered approximately 46 502 messages, reducing total loss.
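The ZooKeeper edit in the second step amounts to rewriting a small JSON document. The sketch below assumes the partition state node stores the fields controller_epoch, leader, version, leader_epoch, and isr; the helper name and the sample values are illustrative, and the code only builds the new JSON string without writing anything to ZooKeeper.

```python
import json


def reassign_leader(state_json, new_leader, new_isr):
    """Return the partition state JSON with a new leader, bumped epoch, and new ISR."""
    state = json.loads(state_json)
    state["leader"] = new_leader
    state["leader_epoch"] += 1  # followers ignore state with a stale epoch
    state["isr"] = list(new_isr)
    return json.dumps(state, sort_keys=True)


if __name__ == "__main__":
    # Made-up example of a leaderless partition state.
    current = '{"controller_epoch": 1, "leader": -1, "version": 1, "leader_epoch": 7, "isr": [0]}'
    print(reassign_leader(current, new_leader=2, new_isr=[2]))
```

Back up the original node content before writing the result, since a wrong epoch or ISR list can leave the partition in a worse state than before.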
Conclusion
The root cause is an offset mismatch during index reconstruction in Kafka 0.11, a bug fixed in later releases. Upgrading to Kafka 2.x and configuring unclean.leader.election.enable, replication factor, and min.insync.replicas prevents similar failures.
dbaplus Community
