Why Kafka Messages Get Lost and How to Prevent It

This article explains the three places where Kafka can lose messages—Broker, Producer, and Consumer—detailing the underlying mechanisms, the impact of flush and ack settings, and practical configuration and coding strategies to minimize data loss.

Code Ape Tech Column
Code Ape Tech Column
Code Ape Tech Column
Why Kafka Messages Get Lost and How to Prevent It

Kafka can lose messages at three stages: the broker, the producer, and the consumer. Understanding the mechanisms and the relevant configuration options allows you to build a more reliable pipeline.

Broker Message Loss

When a broker receives a record it first writes the data to the Linux page cache. The data is flushed to disk in batches to achieve high throughput. If the machine crashes before the flush, the records are lost.

Flush is triggered by any of the following conditions:

Explicit sync or fsync calls (controlled by log.flush.interval.messages or log.flush.interval.ms).

Low‑memory pressure on the host.

The amount of dirty pages reaching a time or size threshold (controlled by log.flush.scheduler.interval.ms).

Kafka does not provide a fully synchronous flush mode, so durability is improved by reducing the flush interval or batch size, at the cost of latency and throughput.

The producer‑side acknowledgment setting acks determines how many replicas must confirm a write before the producer considers it successful:

acks=0 – no acknowledgment; highest throughput, highest loss risk.

acks=1 – the leader acknowledges after writing to its own page cache; loss can occur if the leader crashes before followers replicate.

acks=all (or -1) – the leader waits for all in‑sync replicas (ISR) to acknowledge; provides the strongest durability guarantee.

When acks=all is used, the broker configuration min.insync.replicas defines the minimum number of ISR that must be alive for the request to succeed. If the ISR count falls below this value the request fails, preventing silent data loss.

Broker write and replication flow
Broker write and replication flow

Producer Message Loss

Producers batch records in a local memory buffer before sending them to the broker. If the application crashes, the JVM exits, or the buffer is discarded (e.g., due to buffer.memory overflow or an explicit policy), those records are lost.

Typical mitigation strategies:

Use synchronous sends: producer.send(record).get() blocks until the broker acknowledges according to the configured acks.

Enable idempotence ( enable.idempotence=true) which forces acks=all, sets retries to a high value and guarantees exactly‑once delivery on the producer side.

Increase buffer.memory, batch.size and linger.ms to reduce the frequency of flushes, but remember this does not eliminate loss.

Persist records to a durable local store (disk or a lightweight DB) and have a separate thread read from that store and forward to Kafka.

Consumer Message Loss

Consumers read records and then commit offsets. Two commit modes exist:

Automatic offset committing (controlled by enable.auto.commit=true and auto.commit.interval.ms).

Manual offset committing (application calls commitSync() or commitAsync() after processing).

If a consumer crashes after processing a record but before the offset is committed, the record may be re‑processed (at‑least‑once) or, if the offset was already committed, the record is considered consumed and could be lost from the application’s perspective.

Example of automatic commit configuration (Java):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true"); // enable auto commit
props.put("auto.commit.interval.ms", "1000"); // 1 s interval
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // Process the record. If processing throws, the record may be lost because the offset was already auto‑committed.
        insertIntoDB(record);
    }
}

Manual commit example (ensures at‑least‑once delivery):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false"); // disable auto commit
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer); // process batch
        consumer.commitSync(); // commit after successful processing
        buffer.clear();
    }
}

For fine‑grained control, the low‑level API allows committing offsets per partition:

try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
        for (TopicPartition partition : records.partitions()) {
            List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
            for (ConsumerRecord<String, String> record : partitionRecords) {
                System.out.println(record.offset() + ": " + record.value());
            }
            long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
            // Commit the offset of the last processed record for this partition
            consumer.commitSync(Collections.singletonMap(
                partition, new OffsetAndMetadata(lastOffset + 1)));
        }
    }
} finally {
    consumer.close();
}
Broker write and replication diagram
Broker write and replication diagram

Practical Recommendations

Set acks=all and configure min.insync.replicas (typically 2) to guarantee that at least one replica remains after a failure.

Reduce flush latency by tuning log.flush.interval.ms (e.g., 100 ms) or log.flush.interval.messages (e.g., 1 000 messages) for critical topics.

Enable producer idempotence ( enable.idempotence=true) and configure reasonable retries (e.g., 5–10) and retry.backoff.ms to handle transient broker failures.

Prefer synchronous sends or use a durable local buffer when loss is unacceptable.

Use manual offset commits for critical consumers and make processing logic idempotent to tolerate re‑processing.

By configuring the broker flush behavior, choosing appropriate producer acknowledgment and idempotence settings, and managing consumer offset commits deliberately, you can significantly reduce the risk of losing Kafka messages.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ConfigurationKafkaConsumerBrokerProducerMessage LossAck Settings
Code Ape Tech Column
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.