When Does Kafka Lose Data? Proven Strategies to Prevent Message Loss

This article explains Kafka's message delivery semantics, identifies the exact scenarios where data can be lost in producer, broker, and consumer components, and provides concrete configuration and coding practices to ensure reliable, at‑least‑once or exactly‑once delivery in production environments.

JavaEdge

Overview

Kafka is widely used as a core messaging backbone in internet‑scale systems. Because business logic depends on every message being processed, understanding when data loss can occur and how to prevent it is essential.

Message Delivery Semantics

at‑least‑once : a message is persisted but may be delivered multiple times.

at‑most‑once : a message may be lost but will never be duplicated.

exactly‑once : the message is persisted and delivered exactly once (requires idempotent producer and transactional writes).

Kafka considers a message committed once every replica in the partition's in‑sync replica set (ISR) has written it to its log. Durability guarantees apply only to committed messages.
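The committed boundary can be illustrated with a toy model. The sketch below is not Kafka source code; the class and method names are made up for illustration. It assumes each ISR member reports a log end offset (the offset of the next record it would write) and treats the smallest of those values as the high watermark.

```java
import java.util.Collection;

// Toy model (hypothetical names, not Kafka source code): a record is
// "committed" once every ISR member has it, i.e. its offset is below
// the smallest log end offset (LEO) reported by the ISR.
final class HighWatermarkSketch {

    // Returns the high watermark: the first offset that is NOT yet committed.
    static long highWatermark(Collection<Long> isrLogEndOffsets) {
        if (isrLogEndOffsets.isEmpty()) {
            throw new IllegalArgumentException("ISR must contain at least the leader");
        }
        long min = Long.MAX_VALUE;
        for (long leo : isrLogEndOffsets) {
            min = Math.min(min, leo);
        }
        return min;
    }

    // A record at the given offset is committed if all ISR members hold it.
    static boolean isCommitted(long offset, Collection<Long> isrLogEndOffsets) {
        return offset < highWatermark(isrLogEndOffsets);
    }
}
```

With a leader at log end offset 10 and followers at 8 and 9, the high watermark is 8: offsets 0 to 7 are committed, while offsets 8 and 9 exist only on some replicas and could still be lost.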

Loss Scenarios

Producer Side

Producers batch records into RecordBatch objects and send them asynchronously. Loss occurs mainly when a record never reaches a broker.

Network failures : packet loss or connection drops prevent delivery.

Oversized messages : the broker rejects record batches that exceed message.max.bytes, and the producer itself refuses to send a record larger than max.request.size.

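The acks setting controls how many acknowledgments the producer waits for:

acks=0 : the producer does not wait for any acknowledgment and assumes success; any network or broker failure results in silent loss.

acks=1 : only the leader's acknowledgment is required; an acknowledged record is lost if the leader crashes before the followers have replicated it.

acks=-1 or acks=all : every replica currently in the ISR must acknowledge. This is the most durable setting, but loss is still possible if the ISR has shrunk to the leader alone, which is why it should be paired with min.insync.replicas greater than 1.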

Since Kafka 0.11.0.0, enable.idempotence=true assigns a producer ID and sequence numbers, preventing duplicate writes on retries. Transactional writes (transactional.id) guarantee all‑or‑nothing semantics across partitions.
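The de-duplication idea behind idempotent producers can be sketched as follows. This is a simplified model for a single partition, not Kafka's implementation: the class name and methods are hypothetical, and it only shows how a retried batch with an already-seen sequence number can be acknowledged without being appended again. The real broker also rejects sequence gaps, which is omitted here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side de-duplication for one partition
// (hypothetical names, not Kafka source code).
final class SequenceDedupSketch {
    // Last sequence number appended for each producer ID.
    private final Map<Long, Integer> lastSequence = new HashMap<>();
    private int appended = 0;

    // Returns true if the batch was appended, false if it was
    // recognised as a duplicate caused by a retry.
    boolean tryAppend(long producerId, int sequence) {
        Integer last = lastSequence.get(producerId);
        if (last != null && sequence <= last) {
            return false; // already written: acknowledge without appending again
        }
        lastSequence.put(producerId, sequence);
        appended++;
        return true;
    }

    // Number of batches actually written to the log.
    int appendedCount() {
        return appended;
    }
}
```

If the producer sends sequence 0, times out waiting for the acknowledgment, and sends sequence 0 again, the second call returns false and the log still contains a single copy.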

Broker Side

Brokers write incoming records to the OS page cache and flush to disk asynchronously (batch flushing). If a broker crashes before the cache is flushed and a lagging follower becomes leader, unflushed records are lost.

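By default Kafka does not fsync every write before acknowledging it. Flushes can be forced more often with log.flush.interval.messages and log.flush.interval.ms, at a significant throughput cost, so in the usual configuration a single broker can lose whatever was still only in the page cache.

Multi‑partition, multi‑replica topology mitigates this because the same record sits in the page cache of several brokers, but a power failure or crash that hits all of those replicas before they flush can still cause loss.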

Consumer Side

Consumers pull records, process them, and then commit offsets.

Auto‑commit before processing : if the consumer crashes after committing, the message is considered consumed but was never processed (at‑most‑once loss).

Manual commit after processing : if the consumer crashes before committing, the message will be re‑processed (duplicate possible, but no loss).
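The difference between the two orderings can be made concrete with a small simulation. The sketch below does not use the Kafka client; it is a hypothetical model in which a consumer reads offsets 0 to n-1, crashes exactly once while handling one offset, and then restarts from the last committed position.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation (hypothetical, no Kafka dependency) of commit ordering.
final class CommitOrderSketch {

    // Returns the offsets that were actually processed, in order.
    // commitBeforeProcessing=true : commit, then crash before processing.
    // commitBeforeProcessing=false: process, then crash before committing.
    static List<Integer> processedOffsets(int n, int crashAt, boolean commitBeforeProcessing) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;        // position the consumer resumes from after a restart
        boolean crashed = false;  // the simulated crash happens only once
        int offset = committed;
        while (offset < n) {
            boolean crashNow = (offset == crashAt && !crashed);
            if (commitBeforeProcessing) {
                committed = offset + 1;
                if (crashNow) {
                    crashed = true;
                    offset = committed; // restart: the record is skipped for good
                    continue;
                }
                processed.add(offset);
            } else {
                processed.add(offset);
                if (crashNow) {
                    crashed = true;
                    offset = committed; // restart: the record is read again
                    continue;
                }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }
}
```

For three records and a crash at offset 1, committing first yields [0, 2] (offset 1 is lost), while committing after processing yields [0, 1, 1, 2] (offset 1 is duplicated but nothing is lost).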

Prevention Solutions

Producer Configuration

Replace fire‑and‑forget sends with the callback API to detect failures:

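// Excerpt from KafkaProducer: the callback is invoked once the send has succeeded or failed.
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // Interceptors, if configured, may modify the record before it is sent.
    ProducerRecord<K, V> intercepted = (interceptors == null) ? record : interceptors.onSend(record);
    return doSend(intercepted, callback);
}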
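What a callback might do with a failure is application-specific. The following is a dependency-free sketch of one possible policy, collecting the keys of records whose send failed so the caller can re-send them or raise an alert. SendResultTracker is a hypothetical class; in real code the logic would live in an implementation of org.apache.kafka.clients.producer.Callback, whose onCompletion(RecordMetadata metadata, Exception exception) method receives a non-null exception on failure, and the record key would be captured per send rather than passed as a parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Dependency-free sketch (hypothetical class). The metadata parameter is
// typed as Object here only so the example compiles without kafka-clients.
final class SendResultTracker<K> {
    private final List<K> failedKeys = Collections.synchronizedList(new ArrayList<>());

    // Mirrors the callback contract: exception is null on success,
    // non-null when the send ultimately failed.
    void onCompletion(K recordKey, Object metadata, Exception exception) {
        if (exception != null) {
            // Assumed policy: remember the key for a later re-send or alert.
            failedKeys.add(recordKey);
        }
    }

    // Snapshot of the keys whose sends failed so far.
    List<K> failedKeys() {
        synchronized (failedKeys) {
            return new ArrayList<>(failedKeys);
        }
    }
}
```

The list is synchronized because producer callbacks run on the producer's I/O thread, not on the thread that called send.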

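Key configuration parameters:

acks=all (or -1) : require every replica in the ISR to acknowledge the write.

enable.idempotence=true : lets the broker discard duplicates created by retries, so a retried send is written only once per partition. On its own this is not end‑to‑end exactly‑once delivery.

retries=Integer.MAX_VALUE and retry.backoff.ms=300 : keep retrying on transient failures. In Kafka 2.1 and later the total time spent is still bounded by delivery.timeout.ms.

max.in.flight.requests.per.connection=1 : preserves ordering across retries when idempotence is disabled. With idempotence enabled, ordering is preserved with up to 5 in‑flight requests.

transactional.id (optional) : enables transactional writes that are atomic across partitions and topics.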
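These settings could be collected in a Properties object along the following lines. This is a minimal sketch: the class name, the bootstrap address passed in, and the choice of String serializers are assumptions for illustration, and the resulting object would be handed to a KafkaProducer constructor in real code.

```java
import java.util.Properties;

// Minimal sketch of a durability-oriented producer configuration.
// ReliableProducerConfig is a hypothetical helper, not a Kafka class.
final class ReliableProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumed serializers; replace with whatever the payload requires.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for every in-sync replica to acknowledge the write.
        props.put("acks", "all");
        // Let the broker drop duplicates produced by retries.
        props.put("enable.idempotence", "true");
        // Keep retrying transient failures, pausing 300 ms between attempts.
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        props.put("retry.backoff.ms", "300");
        // Conservative ordering setting taken from the list above.
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }
}
```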

Broker Configuration

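unclean.leader.election.enable=false : prevents an out‑of‑sync follower from being elected leader.

replication.factor>=3 : ensures at least two replicas survive a single broker failure.

min.insync.replicas=2 with replication.factor = min.insync.replicas + 1 : when producers use acks=all, a committed record is stored on at least two replicas, and the partition stays writable while one replica is down. If the two values were equal, losing any single replica would block acks=all writes.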
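The relationship between the two values is simple arithmetic. The sketch below uses a hypothetical helper to compute how many replicas of a partition can be unavailable while acks=all writes are still accepted, assuming every remaining replica is in sync.

```java
// Hypothetical helper illustrating the replication arithmetic.
final class ReplicationMath {

    // Number of replicas of one partition that may be down while
    // acks=all writes are still accepted by the leader.
    static int tolerableFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas must be between 1 and the replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }
}
```

With replication.factor=3 and min.insync.replicas=2 the result is 1: one broker can fail without blocking producers. With both set to 3 the result is 0, so any single failure makes the partition reject acks=all writes.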

Consumer Configuration

Disable automatic offset commits and commit manually after successful processing:

props.put("enable.auto.commit", "false");

Implement idempotent business logic so that re‑processing a record does not cause incorrect state.
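One way to make processing idempotent is to remember the last offset applied for each partition and skip anything at or below it. The sketch below is a hypothetical in-memory version of that idea; in a real system the last applied offset would need to be stored atomically with the business state (for example in the same database transaction), which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of offset-based idempotent processing.
final class IdempotentApplier {
    // Partition number -> last offset whose effect has been applied.
    private final Map<Integer, Long> lastAppliedOffset = new HashMap<>();

    // Runs the effect only if this offset has not been applied before.
    // Returns true if the effect ran, false if the record was a re-delivery.
    boolean applyOnce(int partition, long offset, Runnable effect) {
        Long last = lastAppliedOffset.get(partition);
        if (last != null && offset <= last) {
            return false; // re-delivered after a crash or rebalance: skip
        }
        effect.run();
        lastAppliedOffset.put(partition, offset);
        return true;
    }
}
```

If a record at partition 0, offset 5 is delivered twice because the consumer crashed before committing, the second call returns false and the side effect runs only once.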

Summary

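Kafka's durability guarantee covers only committed messages: a record can be lost if it is never committed under the configured acks and ISR rules, or if a consumer commits its offset before processing it. By using: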

Producer settings: acks=all, idempotence, generous retries, and callback handling.

Broker settings: high replication factor, min.insync.replicas > 1, and disabling unclean leader election.

Consumer settings: manual offset commits and idempotent processing.

operators can achieve at‑least‑once delivery with effectively exactly‑once processing in production Kafka deployments.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

JavaEdge

First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
