Big Data 13 min read

Why Kafka Messages Get Lost and How to Prevent It

This article explains the three places where Kafka can lose messages—broker, producer, and consumer—detailing the underlying causes such as page cache flushing, acknowledgment settings, buffer handling, and offset commits, and provides practical configuration and design strategies to minimize data loss and improve reliability.

Programmer DD

Dec 17, 2020

Why Kafka Messages Get Lost and How to Prevent It

Kafka can lose messages in three components: Broker, Producer, and Consumer.

Broker

Message loss at the broker occurs because Kafka stores data asynchronously in the Linux page cache for performance. Data in the page cache is flushed to disk based on time, size thresholds, or explicit sync/fsync calls. If the system crashes before flushing, the data is lost.

The flushing triggers are:

Explicitly calling sync or fsync

Low available memory

Dirty data reaching a time threshold

Kafka does not provide a synchronous flush mechanism; RocketMQ implements it by blocking until the flush completes:

GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
service.putRequest(request);
boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout()); // flush

To reduce loss, adjust flush parameters such as interval and batch size, balancing performance and reliability.

The number of acknowledgments the producer requires the leader to have received before considering a request complete. Settings: acks=0 – No acknowledgment, highest throughput, no guarantee. acks=1 – Leader writes to its log and acknowledges without waiting for followers; loss can occur if the leader fails before replication. acks=all (or acks=-1 ) – Leader waits for all in‑sync replicas to acknowledge, providing the strongest durability guarantee.

Kafka also uses the min.insync.replicas setting to require a minimum number of in‑sync replicas; without it, the effective guarantee falls back to acks=1.

Producer

Producer loss happens when buffered messages are not persisted before a crash or when the buffer overflows. Kafka batches requests in memory to improve throughput, but if the process stops abruptly, the buffered data is lost.

Mitigation strategies include:

Switching from asynchronous to synchronous sends or limiting thread pool size to control production rate.

Increasing the buffer size (though this does not eliminate loss).

Writing messages to local storage (e.g., a database or file) before sending, adding an extra durable layer.

Consumer

Consumer loss can occur during message processing and offset committing. Automatic offset commits happen at fixed intervals, which may commit offsets before processing succeeds, leading to loss if processing fails.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        insertIntoDB(record); // may exceed commit interval
}

Manual offset commits ensure that offsets are only advanced after successful processing, providing at‑least‑once delivery but potentially causing duplicate consumption.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync();
        buffer.clear();
    }
}

Low‑level APIs allow explicit offset control for even finer handling.

In summary, Kafka’s reliability depends on configuring flush policies, acknowledgment levels, buffer sizes, and commit strategies; higher reliability typically reduces throughput.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Consumer Broker Message Loss acks

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.