Understanding Kafka Replication: ISR, HW, LEO, and Acknowledgement Mechanisms
This article explains Kafka's replication process, covering producer acknowledgments, replica synchronization strategies, the roles of ISR and AR, the meanings of HW, LEO, LSO, LW, different ack levels, and how failures are handled to balance reliability and performance.
To ensure that data sent by a producer reliably reaches the intended topic, the partition that receives the data must send an acknowledgment (ack) back to the producer. If the producer receives the ack it proceeds with the next send; otherwise it retries.
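1. Replica Synchronization Strategy
There are two common ways to decide when a write is safe to acknowledge: wait until more than half of the replicas have synced, or wait until all of them have. Kafka adopts the second, fully synchronous approach (all replicas must finish syncing before the ack is sent). Tolerating n node failures then requires only n+1 replicas instead of the 2n+1 a majority scheme needs, which avoids excessive data redundancy, and the additional network latency is considered acceptable.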
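The replica-count argument can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `replicas_needed` and the strategy labels `"majority"` and `"all"` are made up for this example and are not Kafka settings.

```python
def replicas_needed(failures_tolerated: int, strategy: str) -> int:
    """Replicas required to survive `failures_tolerated` node failures."""
    if failures_tolerated < 0:
        raise ValueError("failures_tolerated must be non-negative")
    if strategy == "majority":
        # A majority must survive, so n failures need 2n + 1 replicas.
        return 2 * failures_tolerated + 1
    if strategy == "all":
        # One surviving fully synced replica is enough, so n + 1 replicas.
        return failures_tolerated + 1
    raise ValueError(f"unknown strategy: {strategy}")


for n in (1, 2, 3):
    print(f"n={n}: majority={replicas_needed(n, 'majority')}, "
          f"all={replicas_needed(n, 'all')}")
```

For large partitions the difference matters: tolerating two failures costs five copies of the data under a majority scheme but only three when every replica is synced.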
2. ISR and AR
With the second strategy alone, a single slow or failed follower would block every acknowledgment. To avoid this, the leader maintains a dynamic In‑Sync Replica set (ISR) containing the leader itself and the followers that are up to date. A follower that has not caught up with the leader within the threshold defined by replica.lag.time.max.ms is removed from the ISR and placed in the Out‑of‑Sync Replicas (OSR). The full set of replicas is called the Assigned Replicas (AR), so AR = ISR + OSR. Only the replicas in the ISR determine the High Watermark (HW).
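A minimal sketch of how the ISR/OSR split could be computed from the lag threshold. It is a simplified model rather than broker code: `FollowerState`, `split_isr`, and the field `last_caught_up_ms` are hypothetical names, and the only real Kafka setting referenced is `replica.lag.time.max.ms`, passed here as `lag_time_max_ms`.

```python
from dataclasses import dataclass


@dataclass
class FollowerState:
    broker_id: int
    # Assumed bookkeeping: last time this follower had fully caught up
    # with the leader's log end offset, in milliseconds.
    last_caught_up_ms: int


def split_isr(leader_id, followers, now_ms, lag_time_max_ms):
    """Return (isr, osr) as lists of broker ids; AR is their union."""
    isr = [leader_id]  # the leader is always in sync with itself
    osr = []
    for f in followers:
        if now_ms - f.last_caught_up_ms <= lag_time_max_ms:
            isr.append(f.broker_id)
        else:
            osr.append(f.broker_id)
    return isr, osr


followers = [FollowerState(1, last_caught_up_ms=99_000),
             FollowerState(2, last_caught_up_ms=60_000)]
isr, osr = split_isr(leader_id=0, followers=followers,
                     now_ms=100_000, lag_time_max_ms=30_000)
print("ISR:", isr, "OSR:", osr)
```

In this run broker 2 last caught up 40 seconds ago, which exceeds the 30-second threshold, so it lands in the OSR until it catches up again.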
3. Acknowledgement (acks) Levels
Kafka offers three reliability configurations:
acks=0: The producer does not wait for any broker acknowledgment. This gives the lowest latency but risks data loss on broker failure.
acks=1: The producer waits until the leader has written the record to its local log. If the leader fails before the followers replicate the record, the data may be lost.
acks=-1 (or all): The producer waits until all in‑sync replicas have written the record. If the leader fails after the followers have persisted the record but before the ack is sent, the producer's retry may create duplicate records.
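The three levels differ only in the condition that must hold before the producer treats a record as sent. The sketch below models that condition with plain offsets; `can_acknowledge` and its parameters are hypothetical names for illustration, not part of any Kafka client API.

```python
def can_acknowledge(acks, leader_leo, isr_follower_leos, offset):
    """True when a record at `offset` counts as acknowledged.

    A replica holds the record once its log end offset (LEO) is
    greater than the record's offset.
    """
    if acks == "0":
        return True  # the producer never waits for the broker
    leader_has_it = leader_leo > offset
    if acks == "1":
        return leader_has_it
    if acks in ("-1", "all"):
        return leader_has_it and all(leo > offset
                                     for leo in isr_follower_leos)
    raise ValueError(f"unsupported acks value: {acks}")


# Record at offset 4: the leader and one follower have it, one follower lags.
print(can_acknowledge("1", 5, [3, 5], offset=4))
print(can_acknowledge("all", 5, [3, 5], offset=4))
```

With one follower still behind, the same record is already acknowledged under acks=1 but not under acks=all, which is exactly the window in which a leader failure loses data under acks=1.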
4. Definitions of HW, LEO, LSO, LW
HW (High Watermark): The smallest Log End Offset among all in‑sync replicas; consumers can only read up to HW‑1.
LEO (Log End Offset): The offset of the next message to be written; each replica maintains its own LEO, and the minimum LEO across ISR equals HW.
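LSO (Last Stable Offset): Relevant to transactional messages. While a transaction is still unfinished, LSO equals the first unstable offset (the first offset of the earliest open transaction); when all transactions are completed, it equals HW.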
LW (Low Watermark): The smallest LogStartOffset among all replicas (AR).
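The four definitions above can be expressed as small functions over per-replica state. This is a sketch under simplifying assumptions: the `Replica` class and function names are hypothetical, and capping LSO at HW is an assumption of this model so that LSO never exceeds what is committed.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    log_start_offset: int
    leo: int        # offset of the next message to be written
    in_sync: bool   # True if the replica is currently in the ISR


def high_watermark(replicas):
    """HW: the smallest LEO among in-sync replicas."""
    return min(r.leo for r in replicas if r.in_sync)


def low_watermark(replicas):
    """LW: the smallest log start offset among all replicas (AR)."""
    return min(r.log_start_offset for r in replicas)


def last_stable_offset(hw, first_unstable_offset=None):
    """LSO: first unstable offset while a transaction is open, else HW."""
    if first_unstable_offset is None:
        return hw
    return min(first_unstable_offset, hw)


replicas = [Replica(0, log_start_offset=0, leo=10, in_sync=True),
            Replica(1, log_start_offset=2, leo=8, in_sync=True),
            Replica(2, log_start_offset=0, leo=5, in_sync=False)]
hw = high_watermark(replicas)
print("HW:", hw, "last readable offset:", hw - 1)
print("LW:", low_watermark(replicas))
print("LSO with open transaction at 6:", last_stable_offset(hw, 6))
```

Note that the out-of-sync replica with LEO 5 does not pull HW down, because only ISR members are considered.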
5. Failure Handling Details
When a follower fails, it is temporarily removed from ISR; upon recovery it truncates any log entries beyond its last known HW and resynchronizes from the leader until its LEO catches up, after which it rejoins ISR.
If the leader fails, a new leader is elected from the ISR, and the remaining followers truncate their logs above the new HW before synchronizing with the new leader. This keeps the replicas consistent with one another, but by itself it does not guarantee that no data is lost or duplicated.
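The truncate-then-resync step described in this section can be sketched with logs modelled as Python lists, where the list index stands for the offset. `recover_follower` is a hypothetical helper, and the model assumes that everything below the HW is identical on the follower and the current leader.

```python
def recover_follower(follower_log, last_known_hw, leader_log):
    """Drop entries at or above the HW, then copy the rest from the leader."""
    # Entries below the HW are committed and assumed identical everywhere.
    truncated = follower_log[:last_known_hw]
    # Fetch everything the leader has from the truncation point onward.
    return truncated + leader_log[len(truncated):]


# The follower holds an uncommitted entry at offset 3 from the old leader.
follower_log = ["m0", "m1", "m2", "m3-old"]
new_leader_log = ["m0", "m1", "m2", "m3-new", "m4"]

recovered = recover_follower(follower_log, last_known_hw=3,
                             leader_log=new_leader_log)
print(recovered)
```

The uncommitted entry `m3-old` is discarded, which is why the replicas end up consistent while a record that was never committed can still be lost.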
6. Relationship Between ISR, HW, and LEO
Consider a partition with three replicas (one leader, two followers) where the ISR contains all three and both LEO and HW are 3, so offsets 0‑2 are committed. The producer writes messages 3 and 4 to the leader, raising the leader's LEO to 5, and the followers pull these messages. If one follower catches up fully (LEO 5) while the other has fetched only message 3 (LEO 4), the partition's HW becomes the minimum LEO, 4, which limits consumer visibility to offsets 0‑3. Once all replicas have written both messages, HW reaches the LEO of 5 and consumers can read the newly committed messages. Because only ISR members are counted, this mechanism balances data reliability and performance, avoiding the extremes of waiting for every assigned replica or replicating purely asynchronously.
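The example can be traced step by step. This is a toy simulation, not broker logic: the replica names and the `trace` list are illustrative, and HW is simply recomputed as the minimum LEO across the three in-sync replicas after each step.

```python
def partition_hw(leos):
    """HW is the minimum LEO across the in-sync replicas."""
    return min(leos.values())


leos = {"leader": 3, "follower1": 3, "follower2": 3}
trace = [partition_hw(leos)]       # initial state

leos["leader"] = 5                 # producer appends messages 3 and 4
trace.append(partition_hw(leos))

leos["follower1"] = 5              # follower1 fetches both messages
leos["follower2"] = 4              # follower2 fetches only message 3
trace.append(partition_hw(leos))

leos["follower2"] = 5              # follower2 catches up
trace.append(partition_hw(leos))

for step, hw in enumerate(trace):
    print(f"step {step}: HW={hw}, consumers can read offsets below {hw}")
```

HW stays at 3 right after the leader's append, moves to 4 once the slower follower has message 3, and reaches 5 only when every ISR member holds both messages.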
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
