Big Data 12 min read

Understanding Kafka Message Loss: Causes and Mitigation in Broker, Producer, and Consumer

This article explains why Kafka can lose messages at the broker, producer, and consumer levels, describes the underlying mechanisms such as page cache flushing and acknowledgment settings, and provides practical code examples and mitigation strategies to improve reliability.

Architecture Digest
Architecture Digest
Architecture Digest
Understanding Kafka Message Loss: Causes and Mitigation in Broker, Producer, and Consumer

Kafka may lose messages in three components—Broker, Producer, and Consumer—due to its design for high throughput and asynchronous disk writes.

Broker side: Messages are first written to the Linux page cache and later flushed to disk based on three triggers: explicit sync/fsync calls, low memory thresholds, or when dirty data exceeds a time limit. If the system crashes before flushing, data in the page cache is lost. Kafka does not provide a synchronous flush mechanism; instead, it relies on configurable flush intervals, which can be tuned to reduce loss risk.

GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
service.putRequest(request);
boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout()); // flush

Producer side: To improve performance, the producer batches requests in a local buffer and sends them asynchronously. Message durability depends on the acks setting:

acks=0 – no acknowledgment, highest throughput, highest loss risk.

acks=1 – leader acknowledges after writing to its local log; loss can occur if the leader fails before followers replicate.

acks=all (or -1) – leader waits for all in‑sync replicas to acknowledge, providing the strongest durability guarantee.

Loss can also happen if the producer process stops abruptly, the buffer overflows, or memory pressure forces message dropping. Mitigation approaches include switching to synchronous sends, limiting the producer thread pool, enlarging the buffer, or persisting messages to disk before sending.

Consumer side: Consumption involves receiving, processing, and committing offsets. With automatic offset commits, the consumer may commit an offset before processing succeeds, leading to lost messages if processing fails. Example configuration for auto‑commit:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true"); // enable auto commit
props.put("auto.commit.interval.ms", "1000"); // 1 s interval
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        insertIntoDB(record); // may take >1 s
    }
}

Switching to manual offset commits ensures at‑least‑once delivery but may cause duplicate processing. Example of manual commit with batch buffering:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false"); // disable auto commit
// ... other deserializer configs
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync(); // commit after processing
        buffer.clear();
    }
}

Manual commits guarantee that messages are not lost after processing, though they require careful handling of offsets and may involve more complex low‑level APIs.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaKafkaConsumerBrokerProducerACKMessage Loss
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.