When Does Kafka Lose Data? Proven Strategies to Prevent Message Loss
This article explains Kafka's message delivery semantics, identifies the exact scenarios where data can be lost in producer, broker, and consumer components, and provides concrete configuration and coding practices to ensure reliable, at‑least‑once or exactly‑once delivery in production environments.
Overview
Kafka is widely used as a core messaging backbone in internet‑scale systems. Because business logic depends on every message being processed, understanding when data loss can occur and how to prevent it is essential.
Message Delivery Semantics
at‑least‑once : a message is persisted but may be delivered multiple times.
at‑most‑once : a message may be lost but will never be duplicated.
exactly‑once : the message is persisted and delivered exactly once (requires idempotent producer and transactional writes).
Kafka defines a message as committed when the required number of in‑sync replicas (ISR) have written it to their logs and acknowledged the write.
Loss Scenarios
Producer Side
Producers batch records into RecordBatch objects and send them asynchronously. Loss occurs mainly when a record never reaches a broker.
Network failures : packet loss or connection drops prevent delivery.
Oversized messages : the producer refuses to send a record larger than max.request.size, and the broker rejects a batch larger than message.max.bytes.
The acks setting controls durability:
acks=0 : the producer does not wait for an acknowledgment and assumes success; any network issue results in loss.
acks=1 : only the leader's acknowledgment is required; loss can happen if the leader crashes before the followers in the ISR replicate the record.
acks=-1 or acks=all : all ISR replicas must acknowledge, providing the highest durability, though loss is still possible if the ISR shrinks to a single replica.
Since Kafka 0.11.0.0, enable.idempotence=true assigns each producer a producer ID and attaches sequence numbers to its batches, so the broker can discard duplicate writes caused by retries. Transactional writes (transactional.id) guarantee all‑or‑nothing semantics across partitions.
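The deduplication idea behind idempotent producers can be sketched with a toy model of a single partition. This is not Kafka's implementation: SequenceDedupLog and its methods are hypothetical names, and a real broker also tracks producer epochs and rejects sequence gaps, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of broker-side deduplication for one partition.
// SequenceDedupLog is a hypothetical name used for illustration only.
class SequenceDedupLog {
    private final List<String> log = new ArrayList<>();
    // Highest sequence number appended so far, per producer ID.
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    /** Appends the value unless this producer already wrote this sequence number. */
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // a retry of a record that is already in the log
        }
        log.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    List<String> records() {
        return log;
    }

    public static void main(String[] args) {
        SequenceDedupLog partition = new SequenceDedupLog();
        partition.append(7L, 0, "order-created");
        // The acknowledgment was lost, so the producer resends the same sequence number.
        boolean appendedAgain = partition.append(7L, 0, "order-created");
        System.out.println(appendedAgain + " " + partition.records());
    }
}
```

The retry is recognized by its (producer ID, sequence number) pair, so the log holds the record once even though it was sent twice.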
Broker Side
Brokers write incoming records to the OS page cache and flush to disk asynchronously (batch flushing). If a broker crashes before the cache is flushed and a lagging follower becomes leader, unflushed records are lost.
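By default Kafka does not fsync on every write (log.flush.interval.messages and log.flush.interval.ms can force more frequent flushing, at a significant throughput cost), so a single broker can lose data.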
Multi‑partition, multi‑replica topology mitigates loss, but power‑failure or crash before flush can still cause loss.
Consumer Side
Consumers pull records, process them, and then commit offsets.
Commit before processing (for example, an auto‑commit that fires while records are still being handled) : if the consumer crashes after the commit, the record is treated as consumed but was never processed (at‑most‑once, so the record is lost).
Manual commit after processing : if the consumer crashes before committing, the message will be re‑processed (duplicate possible, but no loss).
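The difference between the two commit orders can be seen in a small simulation that uses no Kafka client at all. CommitOrderSimulation and run are hypothetical names; the sketch assumes a single partition, one crash, and a restart that resumes from the last committed offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy simulation of a consumer that crashes once; all names are hypothetical.
class CommitOrderSimulation {

    /**
     * Consumes records from offset 0, crashing once while handling the record
     * at crashAt, then resumes from the committed offset.
     * Returns every record whose processing completed, in order.
     */
    static List<String> run(List<String> records, int crashAt, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0;        // next offset to read after a restart
        boolean crashed = false;  // the crash happens only once

        while (committed < records.size()) {
            int offset = committed;
            boolean crashNow = !crashed && offset == crashAt;
            if (commitBeforeProcessing) {
                committed = offset + 1;               // offset is committed first
                if (crashNow) { crashed = true; continue; } // never processed: lost
                processed.add(records.get(offset));
            } else {
                processed.add(records.get(offset));   // processing completes first
                if (crashNow) { crashed = true; continue; } // not committed: re-read
                committed = offset + 1;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(run(records, 1, true));   // [a, c]
        System.out.println(run(records, 1, false));  // [a, b, b, c]
    }
}
```

Committing first drops record "b", while processing first handles it twice, which is why the second order has to be paired with idempotent processing.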
Prevention Solutions
Producer Configuration
Replace fire‑and‑forget sends with the callback API to detect failures:
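Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback) {
    // Let any configured interceptors see (and possibly modify) the record first.
    ProducerRecord<K,V> intercepted = (interceptors == null) ? record : interceptors.onSend(record);
    // The callback receives the record metadata on success or the exception on failure.
    return doSend(intercepted, callback);
}
Key configuration parameters:
acks=all (or -1) : require all ISR replicas to acknowledge.
enable.idempotence=true : prevents duplicates caused by producer retries (per partition, within one producer session); cross‑partition exactly‑once additionally needs transactions.
retries=Integer.MAX_VALUE and retry.backoff.ms=300 : keep retrying on transient failures (in newer clients the total time is still bounded by delivery.timeout.ms).
max.in.flight.requests.per.connection=1 : preserves order when retrying without idempotence; with enable.idempotence=true, order is preserved for values up to 5.
transactional.id (optional) : enables transactional writes for atomicity across topics and partitions.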
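The producer settings above can be collected into a java.util.Properties object using plain string keys, as in this minimal sketch. ReliableProducerConfig and the localhost:9092 address are placeholders, and the String serializers are an assumption about the payload type. With the kafka-clients library on the classpath, the result would be passed to the KafkaProducer constructor.

```java
import java.util.Properties;

// Collects the durability-oriented producer settings discussed above.
// ReliableProducerConfig is a hypothetical helper, not part of Kafka.
class ReliableProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumption: keys and values are plain strings.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                    // wait for all in-sync replicas
        props.put("enable.idempotence", "true");     // broker drops retried duplicates
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        props.put("retry.backoff.ms", "300");
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address; replace with the real broker list.
        Properties props = build("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the settings in one place makes it easier to review them together, since acks, idempotence, and retries only give the intended guarantee in combination.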
Broker Configuration
unclean.leader.election.enable=false : prevents a lagging follower from being elected leader.
replication.factor>=3 : ensures at least two replicas survive a broker failure.
min.insync.replicas=2 and replication.factor = min.insync.replicas + 1 : a committed record must be stored on multiple alive replicas.
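The arithmetic behind replication.factor = min.insync.replicas + 1 can be made explicit with a small helper. ReplicationMath is a hypothetical name, and the sketch assumes producers use acks=all, so a write succeeds only while the ISR still has at least min.insync.replicas members.

```java
// Worked arithmetic for replication settings; ReplicationMath is a hypothetical helper.
class ReplicationMath {

    /**
     * Number of replicas of a partition that can be down while
     * acks=all writes are still accepted.
     */
    static int tolerableFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas must be between 1 and replication.factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Recommended pairing: one replica may fail, and every
        // acknowledged record still exists on at least two replicas.
        System.out.println(tolerableFailuresForWrites(3, 2)); // 1
        // Requiring all three replicas leaves no room for a failure.
        System.out.println(tolerableFailuresForWrites(3, 3)); // 0
    }
}
```

Setting min.insync.replicas equal to replication.factor maximizes copies per record but blocks writes as soon as one replica falls out of the ISR, which is why the "+ 1" margin is suggested.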
Consumer Configuration
Disable automatic offset commits and commit manually after successful processing:
props.put("enable.auto.commit", "false");
Implement idempotent business logic so that re‑processing a record does not cause incorrect state.
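One way to make processing idempotent is to remember which records have already been applied and skip redeliveries. The sketch below is an illustration under strong assumptions: IdempotentHandler is a hypothetical class, the record is identified by topic, partition, and offset, and the "seen" set lives in memory. A production system would need to store that marker durably, ideally in the same transaction as the business update.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Skips records that were already applied; IdempotentHandler is a hypothetical name.
class IdempotentHandler {
    private final Set<String> applied = new HashSet<>();
    private final Consumer<String> sideEffect;

    IdempotentHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    /** Returns true if the record was applied, false if it was a redelivery. */
    boolean handle(String topic, int partition, long offset, String value) {
        String id = topic + "-" + partition + "-" + offset;
        if (applied.contains(id)) {
            return false;           // already processed before a crash or rebalance
        }
        sideEffect.accept(value);   // run the business logic first
        applied.add(id);            // mark as done only after it succeeded
        return true;
    }

    public static void main(String[] args) {
        List<String> ledger = new ArrayList<>();
        IdempotentHandler handler = new IdempotentHandler(ledger::add);
        handler.handle("payments", 0, 42L, "charge-10");
        handler.handle("payments", 0, 42L, "charge-10"); // redelivered after a crash
        System.out.println(ledger); // [charge-10]
    }
}
```

Because the marker is written only after the side effect succeeds, a failure in between leads to a retry rather than a silently skipped record.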
Summary
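Kafka loses data mainly in two situations: a record is treated as sent without being committed according to the configured acks and ISR rules, or an offset is committed before the record has been processed. By using: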
Producer settings: acks=all, idempotence, generous retries, and callback handling.
Broker settings: high replication factor, min.insync.replicas > 1, and disabling unclean leader election.
Consumer settings: manual offset commits and idempotent processing.
operators can achieve near‑exactly‑once reliability in production Kafka deployments.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
