Kafka Storage Mechanism and Reliability Guarantees
This article explains Kafka's internal storage architecture—including topics, partitions, segments, .log and .index files—how data is read, and the various reliability mechanisms such as ISR/OSR, LEO/HW, producer acknowledgment levels, leader election strategies, and delivery semantics.
Kafka Storage Architecture
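Kafka organizes data by topic. Each topic is divided into partitions, and each partition is replicated across brokers. On disk, a partition replica is a directory named <topic>-<partition> (for example, orders-0). Within that directory the log is split into segments: a new segment is rolled once the active one reaches the configured size limit (log.segment.bytes) or age, so segments are bounded in size rather than strictly equal.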
Segment Structure
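Each segment consists of a .log file that holds the actual message bytes and a .index file that maps offsets to byte positions in the .log file for fast lookup. The index is sparse: it holds entries for only some of the messages, not every one. Both files share a name derived from the segment's base offset, that is, the offset of the first message in the segment, written as a 20-digit, zero-padded decimal string (wide enough for a 64-bit offset). The first segment is therefore named 00000000000000000000.log, and every later segment is named after the offset of its own first message.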
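As a small illustration of the naming scheme, the sketch below builds the directory and file names from a topic, a partition number, and the offset that names a segment. The helper names partition_dir and segment_files are hypothetical and are not part of Kafka.

```python
def partition_dir(topic, partition):
    """Directory name for one partition replica: <topic>-<partition>."""
    return f"{topic}-{partition}"


def segment_files(base_offset):
    """File names for a segment, zero-padded to 20 decimal digits."""
    name = f"{base_offset:020d}"
    return f"{name}.log", f"{name}.index"


if __name__ == "__main__":
    print(partition_dir("orders", 0))
    print(segment_files(0))
    print(segment_files(368769))
```

Twenty digits are enough for any non-negative 64-bit offset, and the fixed width makes the file names sort in offset order.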
Reading Data
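To read a specific offset, Kafka first finds the segment whose base offset is the largest one that does not exceed the target, which is a binary search over the segment names. It then binary-searches that segment's sparse .index file for the nearest entry at or below the target, which yields a byte position in the .log file, and scans forward from that position until it reaches the requested offset. Each record carries its own offset and length, so the scan can step from one record to the next.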
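The lookup can be sketched as two binary searches. This is a simplified model, not Kafka's implementation: it assumes the segment base offsets and the index entries are already loaded into sorted Python lists, and that index entries are (relative offset, byte position) pairs. The function names and the sample numbers are made up for illustration.

```python
from bisect import bisect_right


def find_segment(base_offsets, target):
    """Return the base offset of the segment that should contain `target`.

    `base_offsets` must be sorted ascending.
    """
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is below the first segment")
    return base_offsets[i]


def find_position(index_entries, base_offset, target):
    """Return the byte position in the .log file to start scanning from.

    `index_entries` is a sorted list of (relative_offset, position) pairs.
    The result belongs to the last entry at or below the target, or 0
    (start of file) when no entry qualifies.
    """
    relative = target - base_offset
    keys = [rel for rel, _ in index_entries]
    i = bisect_right(keys, relative) - 1
    return index_entries[i][1] if i >= 0 else 0


if __name__ == "__main__":
    segments = [0, 368769, 737337]
    index = [(1, 0), (3, 497), (6, 1407), (8, 1686)]
    base = find_segment(segments, 368776)
    print(base, find_position(index, base, 368776))
```

Because the index is sparse, the returned position is only a starting point; the reader still has to scan forward record by record to the exact offset.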
Reliability Guarantees
Replication Lists (AR, ISR, OSR)
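Each partition has an AR (Assigned Replicas) list, which is the union of ISR (In‑Sync Replicas) and OSR (Out‑of‑Sync Replicas). ISR members are the replicas, including the leader, that have kept up with the leader within replica.lag.time.max.ms; a replica that falls further behind is moved to OSR and rejoins ISR once it catches up. A record is considered committed only when every ISR member has replicated it.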
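One way to picture the split is the sketch below, which assumes membership is decided by how long ago each follower last caught up with the leader, compared against a limit such as the broker setting replica.lag.time.max.ms. The function split_replicas and its parameters are hypothetical; the real bookkeeping is done inside the broker.

```python
def split_replicas(assigned, leader, last_caught_up_ms, now_ms, max_lag_ms):
    """Partition the assigned replicas (AR) into ISR and OSR.

    `last_caught_up_ms` maps follower id -> time it last matched the leader.
    The leader is always treated as in sync.
    """
    isr, osr = [], []
    for replica in assigned:
        if replica == leader or now_ms - last_caught_up_ms[replica] <= max_lag_ms:
            isr.append(replica)
        else:
            osr.append(replica)
    return isr, osr


if __name__ == "__main__":
    isr, osr = split_replicas(
        assigned=[1, 2, 3],
        leader=1,
        last_caught_up_ms={2: 9_000, 3: 1_000},
        now_ms=10_000,
        max_lag_ms=5_000,
    )
    print("ISR:", isr, "OSR:", osr)
```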
Log End Offset (LEO) and High Watermark (HW)
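LEO is the offset of the next message to be appended to a replica's log, so every replica, leader or follower, has its own LEO. HW is the smallest LEO among the ISR members: every message below HW has been replicated to all ISR members, and only those messages are visible to consumers.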
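The relationship between the two markers fits in a few lines. The sketch assumes that LEO is the next offset to be written on each replica, so HW is the minimum LEO across the ISR and consumers may read offsets strictly below it. The function names are illustrative only.

```python
def high_watermark(leo_by_replica, isr):
    """HW is the smallest LEO among the in-sync replicas."""
    return min(leo_by_replica[replica] for replica in isr)


def visible_offsets(hw):
    """Offsets a consumer may read: everything strictly below HW."""
    return range(0, hw)


if __name__ == "__main__":
    leo = {1: 10, 2: 8, 3: 5}   # replica 3 has fallen out of the ISR
    hw = high_watermark(leo, isr={1, 2})
    print("HW:", hw, "last visible offset:", visible_offsets(hw)[-1])
```

Note that the lagging replica 3 does not hold HW back, because only ISR members count toward the minimum.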
HW Truncation Mechanism
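If a leader fails, the newly elected leader may be missing records that the old leader had appended but not yet replicated. To keep the replicas consistent, the remaining followers truncate their logs to HW and then fetch from the new leader. When the old leader recovers, it rejoins as a follower, likewise truncates its log to its HW, and then catches up. Kafka 0.11 and later refine this scheme by using leader epochs, rather than HW alone, to decide where to truncate.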
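The recovery path of the old leader can be walked through with plain lists standing in for logs. This is a toy model under stated assumptions: a log is a Python list indexed by offset, HW counts the committed records, and the helpers truncate_to_hw and catch_up are invented for the example.

```python
def truncate_to_hw(log, hw):
    """Drop every record at or above HW."""
    return log[:hw]


def catch_up(follower_log, leader_log):
    """Append whatever the leader has beyond the follower's end."""
    return follower_log + leader_log[len(follower_log):]


if __name__ == "__main__":
    # Old leader A wrote m3 but crashed before any follower fetched it.
    old_leader = ["m0", "m1", "m2", "m3"]
    old_leader_hw = 2

    # B takes over with what it had, then accepts a different record.
    new_leader = ["m0", "m1", "m2"] + ["m3-new"]

    # A comes back as a follower: truncate first, then fetch from B.
    recovered = catch_up(truncate_to_hw(old_leader, old_leader_hw), new_leader)
    print(recovered)
    print(recovered == new_leader)
```

Without the truncation step, A would keep its unreplicated m3 at offset 3 while B holds m3-new there, and the two logs would diverge.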
Producer Acknowledgment Levels
Kafka provides three acknowledgment settings through the producer's acks configuration (named request.required.acks in the legacy Scala producer):
0 : Producer does not wait for any acknowledgment (highest throughput, lowest reliability).
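1 : The leader acknowledges as soon as it has written the record to its own log; if it crashes before the followers replicate the record, the data is lost.
-1 (or all): The leader responds only after every ISR replica has the record. Combined with min.insync.replicas ≥ 2, an acknowledged write survives the loss of a single broker. Duplicates are still possible: if the leader fails after the record has been replicated but before the acknowledgment reaches the producer, the producer retries and the record is written again.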
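The three levels can be compared in a small model of the leader's decision. It is a sketch, not broker code: the function names are hypothetical, and it assumes that with acks=all the leader rejects a write when the ISR is smaller than min.insync.replicas, and otherwise waits for every ISR member.

```python
def write_accepted(acks, isr_size, min_insync_replicas):
    """With acks=all, a write is refused if the ISR has shrunk too far."""
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    return True


def replicas_awaited(acks, isr_size):
    """How many replicas must hold the record before the producer is answered."""
    if acks == "0":
        return 0          # fire and forget
    if acks == "1":
        return 1          # leader only
    return isr_size       # leader plus every in-sync follower


if __name__ == "__main__":
    for acks in ("0", "1", "all"):
        print(acks, write_accepted(acks, isr_size=1, min_insync_replicas=2),
              replicas_awaited(acks, isr_size=3))
```

The first column of output shows why min.insync.replicas only has an effect together with acks=all: at the other levels the write goes through even when the leader is the sole in-sync replica.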
Leader Election Strategies
The configuration unclean.leader.election.enable controls election behavior:
false : Only replicas that are in ISR may become the new leader, guaranteeing data consistency but reducing availability.
true : Any alive replica may be elected, improving availability at the risk of data inconsistency.
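The two settings can be contrasted with a short selection routine. This is an illustrative sketch, assuming candidates are tried in assignment order; elect_leader and its parameters are hypothetical names, not a Kafka API.

```python
def elect_leader(assigned, isr, alive, unclean_enabled):
    """Pick a new leader, or return None if the partition must stay offline."""
    # Preferred path: the first live replica that is still in the ISR.
    for replica in assigned:
        if replica in isr and replica in alive:
            return replica
    # Fallback, only when unclean election is allowed: any live replica,
    # even one that is missing committed records.
    if unclean_enabled:
        for replica in assigned:
            if replica in alive:
                return replica
    return None


if __name__ == "__main__":
    # Only replica 3 survives, and it was out of sync.
    print(elect_leader([1, 2, 3], isr={1, 2}, alive={3}, unclean_enabled=False))
    print(elect_leader([1, 2, 3], isr={1, 2}, alive={3}, unclean_enabled=True))
```

With the setting disabled the partition stays unavailable until an ISR member returns; with it enabled the partition comes back immediately, but any records the out-of-sync replica never received are gone.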
Delivery Semantics
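Kafka supports three delivery semantics: at‑most‑once (possible loss, no duplicates), at‑least‑once (no loss, possible duplicates), and exactly‑once. Exactly‑once can be built on top of at‑least‑once delivery by adding deduplication logic on the consuming side, typically keyed on unique identifiers such as GUIDs. Since version 0.11, Kafka also offers an idempotent producer (enable.idempotence) and transactions for this purpose.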
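Deduplication on unique identifiers can be sketched as follows. The example assumes every message carries an ID assigned by the producer and keeps the seen IDs in memory; a real consumer would need durable storage for them. The class DedupingHandler is hypothetical.

```python
import uuid


class DedupingHandler:
    """Applies each message at most once, keyed by its unique ID."""

    def __init__(self):
        self.seen_ids = set()
        self.applied = []

    def handle(self, message_id, payload):
        """Return True if the payload was applied, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.applied.append(payload)
        return True


if __name__ == "__main__":
    handler = DedupingHandler()
    msg_id = str(uuid.uuid4())
    print(handler.handle(msg_id, "debit 10"))   # first delivery
    print(handler.handle(msg_id, "debit 10"))   # redelivery after a retry
    print(handler.applied)
```

The delivery itself remains at‑least‑once; it is the handler that makes the effect of each message happen exactly once.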
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies