How to Guarantee 100% Message Delivery with Kafka: Strategies and Interview Insights
This article explains why messages can be lost in Kafka, analyzes the production, storage, and consumption stages, and provides concrete configurations, detection methods, coding practices, and transactional patterns to ensure end‑to‑end reliability, especially for interview scenarios.
Kafka Message Storage Mechanism
Kafka stores data in topics, each divided into partitions. Each partition is replicated according to the topic's replication factor: one replica acts as the leader and by default serves all reads and writes, while followers replicate from it. Replicas are placed on different brokers to avoid a single point of failure.
Potential Causes of Message Loss
Production stage: Message creation and network transmission to the broker. Misconfigurations or network glitches can cause loss.
Storage stage: Broker persistence. A broker acknowledges a write once it is in the OS page cache, not on disk, so even with acks=all a message can be lost if every replica holding it loses power before flushing.
Consumption stage: If an offset is committed before the message is fully processed (for example, when messages are handed off asynchronously to workers), a crash drops that message.
Production Stage Loss and acks Settings
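The acks parameter controls reliability:
acks=0: fire‑and‑forget; the producer does not wait for any acknowledgment, so the loss risk is highest.
acks=1: waits for the leader's acknowledgment only (the default before Kafka 3.0); a message is lost if the leader fails before followers replicate it.
acks=all (or -1): waits for all in‑sync replicas (the default since Kafka 3.0); safest but slowest.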
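As a rough illustration of the producer side, the sketch below collects reliability-oriented settings in a java.util.Properties object. The keys are standard producer configuration names, but the helper class ReliableProducerConfig, the string serializers, and the specific values are assumptions chosen for illustration rather than settings prescribed by this article.

```java
import java.util.Properties;

// Hypothetical helper (not part of the Kafka client): builds producer settings
// that favor durability over latency.
class ReliableProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumed message types: plain strings for both key and value.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for all in-sync replicas before a send counts as successful.
        props.put("acks", "all");
        // Let the client retry transient failures instead of giving up at once.
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        // Upper bound on the total time spent sending and retrying (assumed value).
        props.put("delivery.timeout.ms", "120000");
        // Let the broker discard duplicates caused by those client retries.
        props.put("enable.idempotence", "true");
        return props;
    }
}
```

The resulting object can be passed to the KafkaProducer constructor that accepts a Properties argument.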
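import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.RecordMetadata;
// Synchronous send example: block until the broker acknowledges the record
try {
    RecordMetadata metadata = producer.send(record).get();
    System.out.println("Sent to partition " + metadata.partition() + ", offset " + metadata.offset());
} catch (InterruptedException e) {
    // Restore the interrupt flag so callers can react to it
    Thread.currentThread().interrupt();
} catch (ExecutionException e) {
    // The send failed after the client's own retries; retry or log here
    System.out.println("Send failed, retry or log");
    e.getCause().printStackTrace();
}
Storage Stage Loss and Flush Policies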
Kafka writes to the OS page cache first and, by default, leaves flushing to disk to the operating system. These broker parameters force earlier flushes:
log.flush.interval.messages // flush after N messages
log.flush.interval.ms // flush after time interval
log.flush.scheduler.interval.ms // how often the flusher checks whether a flush is due
Reducing these intervals improves durability but hurts performance.
Consumption Stage Loss and Mitigation
Typical async consumption uses a dedicated consumer thread that puts messages into an in‑memory queue (e.g., ArrayBlockingQueue), from which a worker thread pool processes them. If offsets are committed as soon as messages are queued (for example, by auto-commit) and the application crashes before the workers finish, the queued messages are lost.
Use a dedicated consumer thread to pull messages.
Use a separate worker pool to process them.
Message Loss Detection
Have each producer attach a monotonically increasing sequence number, scoped per partition, to every message. Consumers verify continuity; a gap indicates loss and triggers an alert.
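The continuity check can be sketched on the consumer side as follows. This is a minimal illustration, assuming message IDs of the form producer-partition-sequence; the class SequenceGapDetector is hypothetical, and the first message seen for a stream is simply taken as the baseline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-side checker; not part of the Kafka client API.
class SequenceGapDetector {
    // Highest sequence number seen for each "producer-partition" stream.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Records a message ID such as "ProducerA-Partition0-2" and returns the
     * sequence numbers skipped since the highest one seen for the same stream.
     */
    List<Long> observe(String messageId) {
        int cut = messageId.lastIndexOf('-');
        String stream = messageId.substring(0, cut);
        long seq = Long.parseLong(messageId.substring(cut + 1));

        List<Long> missing = new ArrayList<>();
        Long previous = lastSeen.get(stream);
        if (previous != null) {
            for (long s = previous + 1; s < seq; s++) {
                missing.add(s);
            }
        }
        // Keep the highest number so a redelivered message is not reported as a gap.
        if (previous == null || seq > previous) {
            lastSeen.put(stream, seq);
        }
        return missing;
    }
}
```

Observing sequence numbers 1, 2, and then 5 for the same stream returns 3 and 4 on the third call; a non-empty result is the point at which an alert would be raised.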
// Example IDs
ProducerA-Partition0-1
ProducerA-Partition0-2
...
End‑to‑End Guarantees
Combine producer safeguards, broker configurations, and consumer practices:
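Set acks=all, min.insync.replicas>1 (with a replication factor above that value, so one broker can fail without blocking writes), and unclean.leader.election.enable=false.
Handle send results synchronously or via callbacks, and retry on failure.
In async consumption, pull in batches and commit offsets only after all workers have finished the batch.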
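The last point can be sketched without the Kafka client by hiding the consumer behind a small interface. BatchSource and BatchCommitLoop are hypothetical names; with the real client, poll() and commit() would correspond to KafkaConsumer.poll(...) and KafkaConsumer.commitSync(), with enable.auto.commit set to false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical abstraction over the consumer, used to keep the sketch self-contained.
interface BatchSource {
    List<String> poll();
    void commit();
}

class BatchCommitLoop {
    /**
     * Pulls one batch, processes it on the worker pool, and commits only if
     * every worker finished without error. Returns true when the batch was committed.
     */
    static boolean processOneBatch(BatchSource source, ExecutorService workers,
                                   Consumer<String> handler) {
        List<String> batch = source.poll();
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String message : batch) {
            tasks.add(() -> {
                handler.accept(message);
                return null;
            });
        }
        try {
            // invokeAll blocks until every task has completed or failed.
            List<Future<Void>> results = workers.invokeAll(tasks);
            for (Future<Void> result : results) {
                result.get(); // rethrows a worker failure as ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // A worker failed: leave offsets uncommitted; the caller decides how to retry.
            return false;
        }
        source.commit();
        return true;
    }
}
```

Because the commit happens only after the whole batch succeeds, a crash mid-batch leads to reprocessing rather than loss, so handlers should tolerate seeing the same message twice.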
Parameter Configuration
acks=all
min.insync.replicas=2
unclean.leader.election.enable=false
Code Robustness
Synchronous send example (shown above) and asynchronous send with callback:
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        System.out.println("Async send failed");
        exception.printStackTrace();
        // retry or compensate
    } else {
        System.out.println("Async send succeeded to partition " + metadata.partition());
    }
});
Transactional Message Guarantees
Use a local message table within a DB transaction to bind business updates and message creation. After committing the transaction, a background task attempts to send pending messages, retrying on failure.
Begin DB transaction.
Execute business logic (e.g., create user).
Insert a “pending” record into the local message table.
Commit transaction.
Background job reads pending rows, sends to Kafka, updates status or deletes on success.
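The background job in the final step can be sketched as below. This is a simplified illustration, not a production design: the OutboxRelay class is hypothetical, an in-memory map stands in for the local message table, and the sender is any function that returns true once Kafka has confirmed the send.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-memory stand-in for the local message table; a real implementation would
// use a database table written in the same transaction as the business data.
class OutboxRelay {
    enum Status { PENDING, SENT }

    private final Map<Long, String> payloads = new LinkedHashMap<>();
    private final Map<Long, Status> statuses = new LinkedHashMap<>();
    private long nextId = 1;

    /** Records a pending message (assumed to run inside the business transaction). */
    synchronized long insertPending(String payload) {
        long id = nextId++;
        payloads.put(id, payload);
        statuses.put(id, Status.PENDING);
        return id;
    }

    /**
     * Tries to send every pending row once. Rows whose send is not confirmed
     * stay PENDING and are picked up again on the next run.
     * Returns the number of rows marked SENT in this run.
     */
    synchronized int relayOnce(Predicate<String> sender) {
        int sent = 0;
        for (Long id : new ArrayList<>(statuses.keySet())) {
            if (statuses.get(id) == Status.PENDING && sender.test(payloads.get(id))) {
                statuses.put(id, Status.SENT);
                sent++;
            }
        }
        return sent;
    }

    /** Payloads still waiting to be sent, in insertion order. */
    synchronized List<String> pendingPayloads() {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<Long, Status> entry : statuses.entrySet()) {
            if (entry.getValue() == Status.PENDING) {
                pending.add(payloads.get(entry.getKey()));
            }
        }
        return pending;
    }
}
```

Calling relayOnce on a schedule gives at-least-once delivery: if a send succeeds but the status update does not, the row is sent again on the next run, so consumers should be prepared to deduplicate.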
Summary
Ensuring 100% message delivery requires a holistic approach: configure Kafka for strong durability, handle producer acknowledgments correctly, detect and compensate for storage and consumption losses, and optionally use transactional patterns or local message tables to bind business operations with message sending.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
