How to Guarantee 100% Message Delivery with Kafka: Interview‑Ready Strategies

This article dissects Kafka’s storage architecture, identifies where messages can be lost during the production, storage, and consumption stages, and presents interview‑ready strategies (acks settings, flush tuning, consumer batch commits, loss detection via sequence numbers, and transactional messaging) for getting as close to 100% message durability as is practical.

Su San Talks Tech

1. Kafka's Message Storage Mechanism

Kafka stores data in topics, each divided into partitions. A topic is a logical concept, while a partition is the physical storage unit. For high availability, each partition can have multiple replicas: one leader, which by default handles all reads and writes, and followers that replicate the leader's data. Replicas are spread across different broker nodes so that a single node failure does not take the partition offline.
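To make the leader and follower layout concrete, here is a toy model of spreading replicas across brokers. It is only a sketch: the class name ReplicaPlacementSketch and the simple round-robin rule are assumptions for illustration, and Kafka's real assignment logic is more involved (it also accounts for racks, for example).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of replica placement; NOT Kafka's actual assignment algorithm. */
final class ReplicaPlacementSketch {

    /**
     * Returns, for each partition, the broker ids holding its replicas.
     * The first id in each list plays the role of the leader.
     */
    static List<List<Integer>> assign(int partitions, int replicationFactor, int brokers) {
        if (replicationFactor < 1 || replicationFactor > brokers) {
            throw new IllegalArgumentException("replication factor must be between 1 and the broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Consecutive brokers, wrapping around, so two replicas never share a broker.
                replicas.add((p + r) % brokers);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<List<Integer>> assignment = assign(3, 2, 3);
        for (int p = 0; p < assignment.size(); p++) {
            List<Integer> replicas = assignment.get(p);
            System.out.println("partition " + p + ": leader=broker" + replicas.get(0) + ", replicas=" + replicas);
        }
    }
}
```

With three brokers and a replication factor of 2, every partition keeps one copy on a second broker, so losing any single broker leaves at least one replica of each partition alive.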


2. Where Can Messages Be Lost?

A message can be lost at any of three critical stages: production, storage, or consumption.

Production stage: the message is created by business code and sent over the network to the MQ broker.

Storage stage: the broker receives the message and persists it.

Consumption stage: the consumer pulls the message from the broker and processes it locally.


2.1 Production Stage Loss

Kafka’s write reliability is controlled by the producer’s acks parameter, which has three levels:

acks = 0: fire‑and‑forget mode; the producer does not wait for any acknowledgment, offering the highest throughput but the greatest risk of loss.

acks = 1: the leader replica must acknowledge the write before the producer proceeds, balancing performance and reliability. This was the default before Kafka 3.0.

acks = all (or -1): all in‑sync replicas must acknowledge the write, providing the strongest durability guarantee at the cost of performance. This is the default since Kafka 3.0.
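As a rough illustration of where acks is set, the sketch below builds a producer configuration with plain java.util.Properties. The class name ProducerAcksConfig and the broker address localhost:9092 are assumptions; the property keys are the standard producer configuration names.

```java
import java.util.Properties;
import java.util.Set;

/** Builds producer settings for a chosen acks level (illustrative helper, not part of Kafka). */
final class ProducerAcksConfig {
    private static final Set<String> VALID_ACKS = Set.of("0", "1", "all", "-1");

    static Properties forAcks(String acks, String bootstrapServers) {
        if (!VALID_ACKS.contains(acks)) {
            throw new IllegalArgumentException("acks must be 0, 1, all, or -1");
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "0" = no acknowledgment, "1" = leader only, "all"/"-1" = every in-sync replica.
        props.setProperty("acks", acks);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address; replace it with a real broker list.
        Properties props = forAcks("all", "localhost:9092");
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

The resulting Properties object is what would be passed to the KafkaProducer constructor in an application that has the Kafka client on its classpath.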

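Does setting acks to all (-1) guarantee that no messages are lost? Not by itself: acknowledged data can still be lost on the broker side, as the storage stage shows.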

2.2 Storage Stage Loss

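Kafka first writes messages to the operating system’s page cache and relies on an asynchronous flush to get them onto disk. An acknowledgment therefore means the replicas have received the message, not that it has reached disk. Even with acks = all, if every in‑sync replica loses power before its data is flushed, the messages residing only in memory can be lost. The following broker settings control how often a flush is forced: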

log.flush.interval.messages       // flush after this many messages
log.flush.interval.ms             // flush after this many milliseconds
log.flush.scheduler.interval.ms   // periodic flush check

2.3 Consumption Stage Loss

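In asynchronous consumption, a dedicated consumer thread pulls messages into an in‑memory queue (e.g., ArrayBlockingQueue) and hands them to a worker thread pool. If offsets are committed as soon as messages are pulled (for example, by automatic commits) and the application crashes before the workers finish, the in‑flight messages are lost: after a restart the consumer resumes from the committed offset and never sees them again.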
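The loss window can be sketched with a few lines of arithmetic. The example assumes a committed offset means "next offset to read", as in Kafka, and that offsets start at 0; the class name EarlyCommitLossSketch and the numbers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Shows which offsets are lost when the commit runs ahead of processing and the app crashes. */
final class EarlyCommitLossSketch {

    /**
     * @param committedOffset next offset the consumer will read after a restart
     * @param processed       offsets the workers actually finished before the crash
     * @return offsets below the committed position that were never processed
     */
    static List<Long> lostOffsets(long committedOffset, Set<Long> processed) {
        List<Long> lost = new ArrayList<>();
        for (long offset = 0; offset < committedOffset; offset++) {
            if (!processed.contains(offset)) {
                lost.add(offset);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Offsets 0..4 were pulled and committed, but only 0, 1 and 3 finished before the crash.
        System.out.println("lost offsets: " + lostOffsets(5, Set.of(0L, 1L, 3L)));
    }
}
```

In this scenario offsets 2 and 4 sit below the committed position, so the restarted consumer skips them.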


2.4 Message Loss Detection

By leveraging the ordered nature of a single partition, producers can embed a monotonically increasing sequence number in each message. Consumers verify that sequence numbers are continuous; any gap indicates a lost message.

Partition‑level ordering ensures detection works per partition.

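When multiple producers write to the same partition, each should maintain its own sequence space and include a producer identifier in every message, so that consumers can check each sequence separately without cross‑producer coordination.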
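A consumer‑side check along these lines might look like the following sketch. It assumes each message carries a producer id and a sequence number that starts at 0 and increases by 1; the class name SequenceGapDetector is hypothetical, and one detector instance would be kept per partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Detects gaps in per-producer sequence numbers within one partition. */
final class SequenceGapDetector {
    // Next expected sequence number, keyed by producer id (each producer has its own sequence space).
    private final Map<String, Long> nextExpected = new HashMap<>();

    /** Records one message and returns the sequence numbers that were skipped (empty if none). */
    List<Long> observe(String producerId, long sequence) {
        long expected = nextExpected.getOrDefault(producerId, 0L); // assumption: sequences start at 0
        List<Long> missing = new ArrayList<>();
        for (long s = expected; s < sequence; s++) {
            missing.add(s);
        }
        if (sequence >= expected) {
            nextExpected.put(producerId, sequence + 1);
        }
        // A sequence below the expected value is a duplicate or redelivery, not a loss.
        return missing;
    }

    public static void main(String[] args) {
        SequenceGapDetector detector = new SequenceGapDetector();
        detector.observe("producer-A", 0);
        detector.observe("producer-A", 1);
        System.out.println("missing: " + detector.observe("producer-A", 4));
    }
}
```

Reporting the gap (for example, by raising an alert with the missing numbers) is left to the application; the detector only identifies which sequence numbers never arrived.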

3. How to Ensure No Message Loss

3.1 Parameter Configuration

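acks = all: require all in‑sync replicas to acknowledge each write.

min.insync.replicas = 2 (or higher): at least two in‑sync replicas must be available for an acks = all write to succeed. Keep it below the replication factor so that a single broker failure does not block writes.

unclean.leader.election.enable = false: prevent an out‑of‑sync replica from being elected leader, which could cause data loss.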
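How these settings interact can be sketched as simple arithmetic. The replication factor of 3 used below is an assumption (the text does not fix one), and the class name WriteAvailabilitySketch is hypothetical.

```java
/** Models when an acks=all write is accepted, given min.insync.replicas. */
final class WriteAvailabilitySketch {

    /** With acks=all, a write is accepted only while the in-sync replica set is large enough. */
    static boolean writeAccepted(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    /** Number of replicas that can drop out while acks=all writes still succeed. */
    static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("min.insync.replicas must be between 1 and the replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        int replicationFactor = 3; // assumed value for illustration
        int minInsync = 2;
        System.out.println("tolerable failures: " + tolerableFailures(replicationFactor, minInsync));
        System.out.println("write accepted with 1 in-sync replica: " + writeAccepted(1, minInsync));
    }
}
```

With a replication factor of 3 and min.insync.replicas = 2, one replica can fail without blocking writes; when only one in‑sync replica remains, acks = all writes are rejected rather than accepted with a single copy.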

3.2 Code Robustness

Synchronous send example (Java):

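// Kafka synchronous send example
// Requires imports: java.util.concurrent.ExecutionException, org.apache.kafka.clients.producer.RecordMetadata
try {
    // send() returns a Future; get() blocks until the result is available
    RecordMetadata metadata = producer.send(record).get();
    System.out.println("Message sent successfully, partition: " + metadata.partition() + ", offset: " + metadata.offset());
} catch (ExecutionException e) {
    // The broker rejected the write or the producer gave up; retry or log for later compensation
    System.err.println("Message send failed, retry or log: " + e.getCause());
} catch (InterruptedException e) {
    // Restore the interrupt flag so the caller can react to the interruption
    Thread.currentThread().interrupt();
    System.err.println("Interrupted while waiting for the send result");
} catch (RuntimeException e) {
    // send() itself can throw, for example on serialization errors
    System.err.println("Message send failed before reaching the broker: " + e);
}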

Asynchronous send example (Java):

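// The callback runs on the producer's I/O thread, so keep it short and non-blocking
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        // Send failed: log it and hand the record to a retry or compensation path
        System.err.println("Message send failed, handle it!");
        System.err.println(exception);
    } else {
        System.out.println("Async send success, partition: " + metadata.partition() + ", offset: " + metadata.offset());
    }
});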
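Both examples leave the "retry" part open. One possible shape for an application‑level retry is sketched below; the helper BoundedRetry and its parameters are hypothetical, and it only makes sense on top of the producer's own built‑in retries (the retries and delivery.timeout.ms settings), for failures that survive them.

```java
import java.util.concurrent.Callable;

/** Retries an action a bounded number of times with exponential backoff (illustrative helper). */
final class BoundedRetry {

    static <T> T call(Callable<T> action, int maxAttempts, long initialBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2; // double the wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed: surface the last error so the caller can log or alert
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Stand-in for a send that fails twice before succeeding.
        String result = call(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In the synchronous example, the wrapped action would be the blocking producer.send(record).get() call. Retried sends can produce duplicates, so consumers should tolerate them.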

3.3 Consumer Commit

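Use manual batch commits: disable automatic offset commits (enable.auto.commit = false), have the consumer pull a batch (e.g., 100 messages) and process it via a worker pool, and commit the offsets only after every message in the batch has been handled successfully. No message is then considered consumed before its processing completes. A crash mid‑batch causes the batch to be redelivered rather than lost, so processing should be idempotent.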
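Because workers finish out of order, the consumer needs to know the highest offset below which everything is done. The following is a minimal sketch of that bookkeeping for a single partition; the class name ContiguousOffsetTracker is hypothetical, and it follows Kafka's convention that the committed offset is the next offset to read.

```java
import java.util.TreeSet;

/** Tracks completed offsets of one partition and exposes the position that is safe to commit. */
final class ContiguousOffsetTracker {
    private final TreeSet<Long> completed = new TreeSet<>();
    private long nextToCommit; // every offset below this value has been processed

    ContiguousOffsetTracker(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    /** Called by a worker once a message has been processed successfully. */
    synchronized void markDone(long offset) {
        if (offset >= nextToCommit) {
            completed.add(offset);
        }
    }

    /** Advances over contiguous completed offsets and returns the offset that can be committed. */
    synchronized long committableOffset() {
        while (completed.remove(nextToCommit)) {
            nextToCommit++;
        }
        return nextToCommit;
    }

    public static void main(String[] args) {
        ContiguousOffsetTracker tracker = new ContiguousOffsetTracker(100);
        tracker.markDone(101);
        tracker.markDone(102);
        System.out.println("safe to commit: " + tracker.committableOffset()); // 100 is still in flight
        tracker.markDone(100);
        System.out.println("safe to commit: " + tracker.committableOffset());
    }
}
```

The polling thread would pass the returned value to the consumer's manual commit call once the batch (or a contiguous prefix of it) is done, so a slow message holds back the commit instead of being skipped.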


3.4 Transactional Messaging

Implement a local message table within the same database transaction as the business operation:

Begin a database transaction.

Execute the business operation (e.g., create a user).

Insert a record into a “local message” table with status “pending”.

Commit the transaction.

Immediately attempt to send the message to Kafka.

On success, update the record status to “sent” or delete it.

On failure, a background compensation task periodically retries pending messages and alerts after a retry threshold.
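The steps above can be sketched with an in‑memory stand‑in for the database table. Everything here is an assumption for illustration: the class LocalMessageTableSketch, the FAILED status used to mark rows that hit the retry threshold, and the Predicate that stands in for "send to Kafka and report whether it was acknowledged". In a real system the insert would share a database transaction with the business write.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** In-memory sketch of the local message table pattern. */
final class LocalMessageTableSketch {
    enum Status { PENDING, SENT, FAILED }

    private static final class Row {
        final String payload;
        Status status = Status.PENDING;
        int attempts = 0;
        Row(String payload) { this.payload = payload; }
    }

    private final Map<Long, Row> table = new LinkedHashMap<>(); // stands in for the database table
    private final Predicate<String> sender;                     // returns true when the broker acknowledged
    private final int maxAttempts;
    private long nextId = 1;

    LocalMessageTableSketch(Predicate<String> sender, int maxAttempts) {
        this.sender = sender;
        this.maxAttempts = maxAttempts;
    }

    /** Business operation plus "pending" insert; in a real system both share one transaction. */
    long saveWithBusinessOperation(String payload) {
        long id = nextId++;
        table.put(id, new Row(payload));
        return id;
    }

    /** Tries every pending row once; returns ids that reached the retry threshold and need an alert. */
    List<Long> retryPending() {
        List<Long> needsAlert = new ArrayList<>();
        for (Map.Entry<Long, Row> entry : table.entrySet()) {
            Row row = entry.getValue();
            if (row.status != Status.PENDING) {
                continue;
            }
            row.attempts++;
            if (sender.test(row.payload)) {
                row.status = Status.SENT;
            } else if (row.attempts >= maxAttempts) {
                row.status = Status.FAILED;
                needsAlert.add(entry.getKey());
            }
        }
        return needsAlert;
    }

    Status statusOf(long id) {
        return table.get(id).status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in sender that fails on the first call and succeeds afterwards.
        LocalMessageTableSketch outbox = new LocalMessageTableSketch(p -> ++calls[0] > 1, 3);
        long id = outbox.saveWithBusinessOperation("user-created:42");
        outbox.retryPending(); // immediate attempt after commit
        outbox.retryPending(); // background compensation run
        System.out.println("status: " + outbox.statusOf(id));
    }
}
```

If the application crashes after a successful send but before the status update, the row stays pending and is sent again, so this pattern gives at‑least‑once delivery and consumers should handle duplicates.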


4. Summary

Achieving near‑100% reliability requires a full‑stack approach: set acks=all on producers and handle send failures in code; set min.insync.replicas and disable unclean leader election on the brokers or topics; tune broker flush settings; commit consumer offsets only after a batch has been processed; and employ transactional patterns such as a local message table with retry mechanisms to bind business operations and message delivery.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.