How Kafka Guarantees Zero Message Loss: Replication, ACKs, and Transactions
This article explains Kafka’s core mechanisms for preventing data loss: the replica architecture and leader‑follower roles, ACK policies and the in‑sync replica (ISR) set, log persistence and flush strategies, and the producer idempotence and transaction support that together enable reliable, exactly‑once delivery.
Replica Mechanism
Kafka replicates each partition across the number of brokers given by replication.factor. For every partition, one replica is elected leader and the remaining replicas act as followers. The leader handles all produce and fetch requests. When a producer sends a record, the leader appends it to its local log segment; followers continuously fetch new records from the leader, append them to their own logs, and acknowledge replication. Once the required acknowledgments are satisfied, the leader advances the high watermark and marks the record as committed. If the leader crashes, Kafka automatically promotes one of the in‑sync replicas (ISR) to be the new leader, preserving availability and preventing data loss.
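As a concrete sketch, a replicated topic can be created with the Java AdminClient; the broker address, the topic name orders, and the partition/replica counts below are illustrative assumptions, not values from this article:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, each replicated to 3 brokers (one leader + two followers)
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get(); // block until created
        }
    }
}
```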
ACK Strategy and In‑Sync Replicas (ISR)
Producers can control durability with the acks configuration:
acks=0 – the producer does not wait for any acknowledgment.
acks=1 – the producer waits only for the leader’s acknowledgment.
acks=all (or -1) – the producer waits until all replicas currently in the ISR have acknowledged the record.
The ISR list contains the leader plus those follower replicas that are fully caught up with it (i.e., they have fetched up to the leader’s log end offset within the last replica.lag.time.max.ms). Using acks=all together with a min.insync.replicas setting greater than 1 guarantees that a record is considered committed only after it has been written on at least that many replicas; if the ISR shrinks below the minimum, the broker rejects produce requests with NotEnoughReplicasException rather than accept writes it cannot make durable. This dramatically reduces the probability of losing a confirmed message.
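A minimal sketch of a durability‑oriented producer configuration, assuming a local broker and a hypothetical orders topic (min.insync.replicas=2 would be set on the topic or broker side, not here):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                    // wait for every in-sync replica
        props.put("retries", Integer.MAX_VALUE);     // retry transient failures
        props.put("delivery.timeout.ms", "120000");  // upper bound on send + retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With acks=all, the callback fires only after the ISR has acknowledged
            producer.send(new ProducerRecord<>("orders", "order-1", "created"),
                (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace(); // e.g. NotEnoughReplicasException
                    }
                });
        } // close() flushes any outstanding records
    }
}
```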
Persistence and Flush Strategy
Kafka stores records in immutable segment files on disk. Writes are first placed in the operating system page cache and later flushed to the underlying storage according to two configurable policies:
Time‑based flush – controlled by log.flush.interval.ms. When set, the broker forces an fsync for any record that has been in the page cache longer than this interval; it is unset by default, in which case Kafka leaves flushing to the operating system and relies on replication for durability.
Message‑count‑based flush – controlled by log.flush.interval.messages. When the number of messages written since the last flush exceeds this threshold, an fsync is performed; it defaults to Long.MAX_VALUE, i.e., effectively disabled.
The broker setting log.flush.scheduler.interval.ms determines how frequently the background flusher thread checks these conditions. Setting unclean.leader.election.enable=false ensures that only in‑sync replicas can become leader, trading some availability for consistency after a crash. Producers can additionally enable enable.idempotence=true or use transactions for exactly‑once semantics; the broker always appends a record to its log before acknowledging it, and these features ensure that retries of an unacknowledged send cannot duplicate that append (described in the next section).
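Topic‑level overrides exist for several of these settings (flush.ms, flush.messages, unclean.leader.election.enable). As an illustrative sketch, they can be applied with AdminClient; the topic name orders and the values are hypothetical, and aggressive fsync settings trade throughput for durability:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TightenTopicDurability {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            Collection<AlterConfigOp> ops = List.of(
                // topic-level equivalents of log.flush.interval.ms / .messages
                new AlterConfigOp(new ConfigEntry("flush.ms", "1000"),
                                  AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("flush.messages", "10000"),
                                  AlterConfigOp.OpType.SET),
                // never promote an out-of-sync replica for this topic
                new AlterConfigOp(new ConfigEntry("unclean.leader.election.enable", "false"),
                                  AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```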
Idempotence and Transactions
Kafka’s producer‑side idempotence assigns a monotonically increasing sequence number to each record, scoped to the producer ID and partition. The broker rejects any record whose sequence number it has already accepted, so retries triggered by timeouts or transient errors cannot produce duplicate writes.
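Turning idempotence on is a single producer property; a minimal sketch with the same hypothetical broker and topic as above:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class IdempotentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Turns on sequence numbering and broker-side deduplication;
        // this implicitly requires acks=all and retries > 0.
        props.put("enable.idempotence", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Safe to let the client retry: a resend with the same
            // (producer ID, partition, sequence) is discarded by the broker.
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
        }
    }
}
```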
Building on idempotence, the transactional API allows a group of writes to multiple partitions to be committed or aborted atomically. The workflow, sketched in code after the list below, is:
Initialize a transactional producer with transactional.id and call initTransactions().
Begin a transaction with beginTransaction().
Produce records to any number of partitions.
Commit the transaction with commitTransaction() or abort with abortTransaction().
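A minimal sketch of this workflow, including the optional step of committing consumer offsets in the same transaction; the transactional.id, topic names, partition, offset, and consumer group are all hypothetical:

```java
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import java.util.Map;
import java.util.Properties;

public class TransactionalWriter {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("transactional.id", "order-writer-1"); // must be stable across restarts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions(); // fences zombies from earlier incarnations

            producer.beginTransaction();
            try {
                // Writes to any number of partitions join the same transaction
                producer.send(new ProducerRecord<>("orders", "order-1", "created"));
                producer.send(new ProducerRecord<>("audit", "order-1", "logged"));

                // Optionally commit the input offsets atomically with the writes
                producer.sendOffsetsToTransaction(
                    Map.of(new TopicPartition("input", 0), new OffsetAndMetadata(42L)),
                    new ConsumerGroupMetadata("order-processor"));

                producer.commitTransaction(); // everything above becomes visible together
            } catch (Exception e) {
                producer.abortTransaction();  // none of the writes become visible
                throw e;
            }
        }
    }
}
```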
When a transaction commits, the broker writes a commit marker to every participating partition; only then do the records become visible to consumers configured with isolation.level=read_committed. Because consumer offset commits can be included in the same transaction (as in the sketch above), Kafka can provide end‑to‑end “exactly‑once” processing semantics, eliminating both lost and duplicated processing.
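On the consuming side, read_committed is a single property; a minimal sketch with a hypothetical group and topic:

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processor");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Skip records from open or aborted transactions entirely
        props.put("isolation.level", "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}
```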