How to Guarantee 100% Message Delivery with Kafka: Strategies and Interview Insights

This article explains why messages can be lost in Kafka, analyzes the production, storage, and consumption stages, and provides concrete configurations, detection methods, coding practices, and transactional patterns to ensure end‑to‑end reliability, especially for interview scenarios.

IT Services Circle

Kafka Message Storage Mechanism

Kafka stores data in topics, each divided into partitions. Each partition has multiple replicas, with one leader handling reads/writes and followers synchronizing from the leader. Replicas are spread across brokers to avoid single‑point failures.

Kafka storage diagram

Potential Causes of Message Loss

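Production stage: Message creation and network transmission to the broker. A weak acks setting, an ignored send failure, or a network glitch can cause loss.

Storage stage: Broker persistence. Even with acks=all, an acknowledged message may still sit only in the page cache of each replica, so it can be lost if all in‑sync replicas fail (for example, in a power outage) before the data is flushed to disk.

Consumption stage: If the offset is committed before processing finishes, as can happen with asynchronous consumption, a crash drops the messages that were still in flight.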

Production Stage Loss and acks Settings

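The acks parameter controls how many acknowledgments the producer waits for before it treats a send as successful:

acks=0: fire‑and‑forget; the producer does not wait for any acknowledgment, so the loss risk is highest.

acks=1: waits for the leader only (the default before Kafka 3.0); a message can be lost if the leader fails before followers replicate it.

acks=all (or -1): waits for all in‑sync replicas; safest but slowest, and the default since Kafka 3.0.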

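// Synchronous send example: get() blocks until the broker acknowledges the record or the send fails
try {
    RecordMetadata metadata = producer.send(record).get();
    System.out.println("Sent to partition " + metadata.partition() + ", offset " + metadata.offset());
} catch (InterruptedException e) {
    // Restore the interrupt flag so the caller can react to it
    Thread.currentThread().interrupt();
} catch (Exception e) {
    // ExecutionException wraps broker-side failures; send() itself can also throw (e.g., serialization errors)
    System.err.println("Send failed, retry or log");
    e.printStackTrace();
}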

Storage Stage Loss and Flush Policies

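Kafka appends messages to the OS page cache first and, by default, leaves flushing to the operating system. These broker parameters force a flush:

log.flush.interval.messages      // flush a partition's log after N messages
log.flush.interval.ms            // flush after a message has been in memory for this long
log.flush.scheduler.interval.ms  // how often the flusher checks whether any log needs flushing

Lower values narrow the window of unflushed data but reduce throughput, because each forced flush is a synchronous disk write. Kafka's documentation recommends relying on replication for durability rather than on frequent forced flushes.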

Flush configuration
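To make the difference between "written" and "flushed" concrete, here is a minimal Java sketch. It is not Kafka's code: the class CountingFlushLog and its flushEveryN parameter are hypothetical, chosen only to mimic the idea behind log.flush.interval.messages. A write lands in the page cache, and only force() asks the OS to persist it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only log that forces data to disk every N messages.
class CountingFlushLog implements AutoCloseable {
    private final FileChannel channel;
    private final int flushEveryN;
    private int unflushed = 0;
    private long flushCount = 0;

    CountingFlushLog(Path file, int flushEveryN) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.flushEveryN = flushEveryN;
    }

    void append(String message) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // reaches the OS page cache, not necessarily the disk
        }
        if (++unflushed >= flushEveryN) {
            channel.force(true); // ask the OS to persist data and metadata
            unflushed = 0;
            flushCount++;
        }
    }

    long flushCount() { return flushCount; }

    @Override
    public void close() throws IOException { channel.close(); }
}

class CountingFlushLogDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-demo", ".log");
        try (CountingFlushLog log = new CountingFlushLog(file, 2)) {
            for (int i = 1; i <= 5; i++) {
                log.append("msg-" + i);
            }
            // With N = 2, messages 2 and 4 trigger a flush; message 5 stays unflushed.
            System.out.println("Forced flushes: " + log.flushCount());
        }
        Files.deleteIfExists(file);
    }
}
```

In this sketch the fifth message is exactly the kind of data that a power failure could take with it, which is the trade-off the flush parameters tune.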

Consumption Stage Loss and Mitigation

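Typical async consumption uses a dedicated consumer thread that puts messages into an in‑memory queue (e.g., ArrayBlockingQueue), while a worker thread pool processes them. If offsets are committed (for example, by auto‑commit) as soon as messages are handed to the queue and the application crashes before the workers finish, those messages are lost: after a restart the consumer resumes from the committed offset and never rereads them.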

Use a dedicated consumer thread to pull messages.

Use a separate worker pool to process them.

Consumer‑worker architecture
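One way to close this gap is to commit only after every worker has finished its share of the batch. The sketch below shows that ordering without depending on the Kafka client: the BatchSource interface is a hypothetical stand-in, with poll() playing the role of KafkaConsumer.poll and commit() the role of commitSync. With the real client you would also set enable.auto.commit=false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stand-in for the consumer: poll() ~ KafkaConsumer.poll, commit() ~ commitSync.
interface BatchSource {
    List<String> poll();
    void commit();
}

class CommitAfterProcessing {
    // Processes one batch on the worker pool and commits only if every message succeeded.
    static boolean processBatch(BatchSource source, ExecutorService workers,
                                Consumer<String> handler) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String msg : source.poll()) {
            tasks.add(() -> { handler.accept(msg); return null; });
        }
        // invokeAll blocks until all tasks have completed, normally or with an exception.
        for (Future<Void> result : workers.invokeAll(tasks)) {
            try {
                result.get();
            } catch (ExecutionException e) {
                // No commit: the offsets stay uncommitted, so these messages
                // are read again after a restart or rebalance.
                return false;
            }
        }
        source.commit();
        return true;
    }
}

class CommitAfterProcessingDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        BatchSource source = new BatchSource() {
            public List<String> poll() { return Arrays.asList("a", "b", "c"); }
            public void commit() { System.out.println("offsets committed"); }
        };
        boolean ok = CommitAfterProcessing.processBatch(source, workers,
                msg -> System.out.println("processed " + msg));
        System.out.println("batch ok: " + ok);
        workers.shutdown();
    }
}
```

Because an uncommitted batch may be processed again, this ordering gives at-least-once delivery, so handlers should tolerate duplicates. A live consumer that wants to retry a failed batch immediately would also need to seek back to the batch's first offset.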

Message Loss Detection

Inject a monotonically increasing sequence number per partition into each message. Consumers verify continuity; a gap indicates loss and triggers an alert.

// Example IDs
ProducerA-Partition0-1
ProducerA-Partition0-2
...
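The continuity check can be sketched in a few lines of Java. This is an illustration under assumptions, not a prescribed implementation: the SequenceGapDetector class is hypothetical, it assumes IDs shaped like the examples above (a stream name, a dash, then the sequence number), and it treats the first ID seen for a stream as the baseline so that a restarted consumer does not raise a false alarm.

```java
import java.util.HashMap;
import java.util.Map;

// Tracks the last sequence number seen per "producer-partition" stream and reports gaps.
class SequenceGapDetector {
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Returns how many messages are missing just before this one (0 means continuous).
    long observe(String messageId) {
        int cut = messageId.lastIndexOf('-');
        String stream = messageId.substring(0, cut);              // e.g. "ProducerA-Partition0"
        long seq = Long.parseLong(messageId.substring(cut + 1));  // e.g. 2

        Long previous = lastSeen.get(stream);
        if (previous == null) {
            lastSeen.put(stream, seq); // first sighting sets the baseline
            return 0;
        }
        if (seq <= previous) {
            return 0; // duplicate or redelivery, not a loss
        }
        lastSeen.put(stream, seq);
        return seq - previous - 1;
    }
}

class SequenceGapDetectorDemo {
    public static void main(String[] args) {
        SequenceGapDetector detector = new SequenceGapDetector();
        detector.observe("ProducerA-Partition0-1");
        detector.observe("ProducerA-Partition0-2");
        long missing = detector.observe("ProducerA-Partition0-5");
        if (missing > 0) {
            System.out.println("ALERT: " + missing + " message(s) missing"); // prints 2 missing
        }
    }
}
```

Keeping one counter per stream matters because Kafka only orders messages within a partition; a single global counter would report gaps that are just interleaving.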

End‑to‑End Guarantees

Combine producer safeguards, broker configurations, and consumer practices:

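Set acks=all and min.insync.replicas>1 (with a replication factor above min.insync.replicas, so that one broker can fail without blocking writes), and set unclean.leader.election.enable=false.

Check every send result, either synchronously or in a callback, and retry on failure.

In async consumption, disable auto‑commit, pull messages in batches, and commit offsets only after all workers have finished the batch.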

Parameter Configuration

acks=all                               // producer setting
min.insync.replicas=2                  // broker or topic setting
unclean.leader.election.enable=false   // broker or topic setting
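On the producer side, these settings are usually passed as a Properties object. The following is a minimal sketch: the class name ReliableProducerConfig, the localhost:9092 address, and the String serializers are placeholders, and enable.idempotence, retries, and delivery.timeout.ms are additions beyond the list above that are commonly paired with acks=all. The result would be handed to new KafkaProducer<>(props), which requires the kafka-clients library.

```java
import java.util.Properties;

class ReliableProducerConfig {
    // Builds producer settings only; topic and broker settings are configured elsewhere.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumption: keys and values are plain strings.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                 // wait for all in-sync replicas
        props.put("enable.idempotence", "true");  // keep retries from writing duplicates
        props.put("retries", String.valueOf(Integer.MAX_VALUE)); // retry until the timeout below
        props.put("delivery.timeout.ms", "120000"); // upper bound on one send, retries included
        return props;
    }
}

class ReliableProducerConfigDemo {
    public static void main(String[] args) {
        Properties props = ReliableProducerConfig.build("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Bounding retries by delivery.timeout.ms rather than by a small retry count lets the producer ride out a leader election while still failing the send eventually, at which point the application's own error handling takes over.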

Code Robustness

The synchronous send example appears above. For asynchronous sends, check the result in a callback:

producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        System.out.println("Async send failed");
        exception.printStackTrace();
        // retry or compensate
    } else {
        System.out.println("Async send succeeded to partition " + metadata.partition());
    }
});

Transactional Message Guarantees

Use a local message table within a DB transaction to bind business updates and message creation. After committing the transaction, a background task attempts to send pending messages, retrying on failure.

Begin DB transaction.

Execute business logic (e.g., create user).

Insert a “pending” record into the local message table.

Commit transaction.

Background job reads pending rows, sends to Kafka, updates status or deletes on success.

Local message table flow
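The five steps can be sketched with an in-memory stand-in for the database. Everything named here (OutboxRow, UserService, the PENDING and SENT statuses, the sender predicate) is hypothetical; a real version would write the user row and the message row inside one DB transaction and replace the predicate with a synchronous Kafka send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// One row of the hypothetical local message table.
class OutboxRow {
    final String payload;
    String status = "PENDING";
    OutboxRow(String payload) { this.payload = payload; }
}

// In-memory stand-in for the database tables.
class UserService {
    final List<String> users = new ArrayList<>();
    final List<OutboxRow> outbox = new ArrayList<>();

    // Steps 1-4: the business write and the pending message are stored together.
    synchronized void createUser(String name) {
        users.add(name);
        outbox.add(new OutboxRow("user-created:" + name));
    }

    // Step 5: try to send every pending row; failed rows stay PENDING for the next run.
    synchronized int relayPending(Predicate<String> sender) {
        int sent = 0;
        for (OutboxRow row : outbox) {
            if (!row.status.equals("PENDING")) {
                continue;
            }
            boolean ok;
            try {
                ok = sender.test(row.payload); // stand-in for a synchronous Kafka send
            } catch (RuntimeException e) {
                ok = false; // treat an exception as a failed send
            }
            if (ok) {
                row.status = "SENT";
                sent++;
            }
        }
        return sent;
    }
}

class LocalMessageTableDemo {
    public static void main(String[] args) {
        UserService service = new UserService();
        service.createUser("alice");
        service.createUser("bob");
        // First run: the send for bob fails, so only alice's message goes out.
        System.out.println("sent: " + service.relayPending(p -> !p.endsWith("bob")));
        // Second run: the retry picks up the row that is still pending.
        System.out.println("sent: " + service.relayPending(p -> true));
    }
}
```

If the job crashes after a send succeeds but before the status is updated, the row is sent again on the next run, so this pattern gives at-least-once delivery and consumers should deduplicate.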

Summary

Ensuring 100% message delivery requires a holistic approach: configure Kafka for strong durability, handle producer acknowledgments correctly, detect and compensate for storage and consumption losses, and optionally use transactional patterns or local message tables to bind business operations with message sending.
