Kafka Deep Dive: Core Concepts Every Architect Must Master to Prevent Outages
The article explains why merely “knowing how to use” Kafka is insufficient, detailing how offset commits, consumer acknowledgments, producer acks, and rebalance behavior affect reliability, and provides concrete code examples, risk scenarios, and configuration recommendations to prevent message loss and duplicate processing in production systems.
Incident illustration
A service restart caused downstream systems to consume historical messages again, rolling back order states and flooding the alerting system. The failure was not a Kafka outage but a misunderstanding of Kafka's mechanics.
Reliability factors
How offsets are committed
How consumers confirm consumption
What happens during a rebalance
How the producer’s acknowledgment (acks) setting influences data durability
Producer acknowledgment modes
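acks=0 // fire-and-forget: the producer does not wait for any broker response
acks=1 // the leader appends the record to its local log, then responds
acks=all // the leader responds only after every in-sync replica (ISR) has the record
Mode trade-offs: acks=0 gives the lowest latency but no delivery guarantee; acks=1 adds a round trip to the leader, but an acknowledged record is lost if the leader crashes before any follower has replicated it; acks=all gives the strongest durability at the cost of higher latency, because the leader waits for its in-sync followers before answering.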
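To make the trade-off concrete, here is a small model that reduces each mode to one number: how many replicas are guaranteed to hold a record at the moment the producer is told the write succeeded. The AcksMode enum and its methods are hypothetical names invented for this sketch, not Kafka client API; only the three acks values and the fact that acks=all waits for the current in-sync replica set come from Kafka itself.

```java
// Hypothetical model, not Kafka client code: for each acks setting, how many
// replicas are guaranteed to hold a record when the producer is told "success".
enum AcksMode {
    NONE("0"),    // fire-and-forget: no broker response
    LEADER("1"),  // leader's local log only
    ALL("all");   // every member of the current in-sync replica set (ISR)

    final String configValue;

    AcksMode(String configValue) {
        this.configValue = configValue;
    }

    /** Replicas guaranteed to hold the record at acknowledgment time. */
    int replicasGuaranteedAtAck(int inSyncReplicas) {
        switch (this) {
            case NONE:
                return 0;
            case LEADER:
                return 1;
            default:
                return inSyncReplicas;
        }
    }

    /** An acknowledged record is guaranteed to survive a leader crash only if a follower already had it. */
    boolean survivesLeaderCrash(int inSyncReplicas) {
        return replicasGuaranteedAtAck(inSyncReplicas) >= 2;
    }
}
```

AcksMode.ALL.survivesLeaderCrash(1) returns false: once the ISR has shrunk to the leader alone, acks=all protects no better than acks=1, which is why it is normally paired with min.insync.replicas of at least 2.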
Payment system example
ProducerRecord<String, String> record = new ProducerRecord<>("payment-topic", orderId, payload);
producer.send(record);
With acks=1, if the leader writes the record but crashes before any follower has replicated it, the follower elected as the new leader does not have the record. The producer has already received a success response, yet the message is lost.
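The payment failure can be replayed step by step in a few lines. The sketch below is a hypothetical in-memory model (the PaymentFailoverSim class and its methods are invented for illustration, and "order-42" in the usage is a made-up key): one partition, one leader, one follower, with replication reduced to copying into a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the payment scenario (not Kafka code): one partition,
// a leader and a single follower; replication is reduced to list copies.
final class PaymentFailoverSim {
    private final List<String> leaderLog = new ArrayList<>();
    private final List<String> followerLog = new ArrayList<>();

    /** acks=1: the leader appends and answers before the follower has fetched. */
    boolean sendWithAcksOne(String record) {
        leaderLog.add(record);
        return true; // the producer sees success here
    }

    /** acks=all (ISR = leader + follower): answer only after the follower has the record. */
    boolean sendWithAcksAll(String record) {
        leaderLog.add(record);
        followerLog.add(record);
        return true;
    }

    /** The leader crashes; the follower is elected and its log is all that remains. */
    List<String> crashLeaderAndElectFollower() {
        leaderLog.clear();
        return new ArrayList<>(followerLog);
    }
}
```

Calling sendWithAcksOne("order-42") returns true, yet crashLeaderAndElectFollower() returns an empty list: the producer saw success and the record is gone. Repeating the run with sendWithAcksAll leaves "order-42" on the new leader.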
Safer producer configuration
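acks=all
retries=3
enable.idempotence=true
This configuration makes the leader wait for all in-sync replicas before acknowledging, retries transient send failures automatically, and, through idempotence, lets the broker discard the duplicates that those retries would otherwise write to a partition. Enabling idempotence requires acks=all, retries greater than 0, and max.in.flight.requests.per.connection of 5 or less. Two further points: since Kafka 2.1 the client's default for retries is Integer.MAX_VALUE (bounded by delivery.timeout.ms), so an explicit retries=3 shrinks the retry budget rather than enabling retries; and acks=all should be paired with a topic setting of min.insync.replicas=2 or higher (for example with a replication factor of 3), otherwise a shrunken ISR lets the leader alone satisfy acks=all.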
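In client code, the settings above are usually supplied as java.util.Properties. The sketch below uses plain string property names rather than the ProducerConfig constants so it compiles without the Kafka client on the classpath; the SafeProducerConfig class, its method names, and the bootstrap address are hypothetical. The consistency check mirrors the documented idempotence preconditions (acks=all, retries above 0, at most 5 in-flight requests per connection).

```java
import java.util.Properties;

// Sketch: the safer producer settings as java.util.Properties, plus a check of the
// idempotence preconditions. Class and method names are hypothetical.
final class SafeProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers); // placeholder address
        props.setProperty("acks", "all");
        props.setProperty("retries", "3");
        props.setProperty("enable.idempotence", "true");
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    /** Missing values fall back to Kafka 3.x client defaults (acks=all, retries=MAX_VALUE, in-flight=5). */
    static boolean idempotenceSettingsConsistent(Properties props) {
        if (!"true".equals(props.getProperty("enable.idempotence"))) {
            return true; // idempotence not explicitly requested: nothing to check
        }
        String acks = props.getProperty("acks", "all");
        boolean acksAll = acks.equals("all") || acks.equals("-1"); // -1 is a synonym for all
        int retries = Integer.parseInt(props.getProperty("retries", String.valueOf(Integer.MAX_VALUE)));
        int inFlight = Integer.parseInt(props.getProperty("max.in.flight.requests.per.connection", "5"));
        return acksAll && retries > 0 && inFlight <= 5;
    }
}
```

The resulting Properties object can be handed to new KafkaProducer<>(props, new StringSerializer(), new StringSerializer()). Changing acks to 1 makes idempotenceSettingsConsistent return false; the real client rejects that combination with a ConfigException when idempotence is explicitly enabled.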
Offset management
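Kafka only tracks the offset that the application commits; the broker has no way of knowing whether a record was actually processed. With the default enable.auto.commit=true, the consumer commits its current position periodically (every auto.commit.interval.ms, 5000 ms by default) during poll(), independently of whether business processing has succeeded.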
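Switching to manual commits starts in the consumer configuration. A minimal sketch, again with plain string keys and a hypothetical helper, group id, and address; auto.offset.reset=earliest is an assumption chosen here so that a group with no committed offset starts from the oldest retained record instead of the default latest, which would skip records already in the topic.

```java
import java.util.Properties;

// Sketch: consumer settings for manual offset commits.
// Helper name, group id, and bootstrap address are hypothetical placeholders.
final class ManualCommitConsumerConfig {

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Turn off the periodic commit performed during poll()
        props.setProperty("enable.auto.commit", "false");
        // Where to start when the group has no committed offset yet (assumed choice)
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```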
Risk scenario with automatic commits
Consumer pulls messages
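Offset is auto-committed before processing finishes (for example, by a later poll() while a worker thread is still handling the batch)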
Business processing fails
Service restarts
Result: after the restart, consumption resumes from the committed offset, which is already past the failed message, so that message is never processed.
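The four steps above can be replayed with a toy model in which a committed offset is simply the point a restarted consumer resumes from. AutoCommitLossSim and its parameters are hypothetical, and "commit before processing" stands in for any auto-commit that lands while the batch is still being processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the risk scenario (not Kafka code). Offsets are plain ints;
// "committed" is where a restarted consumer resumes.
final class AutoCommitLossSim {

    /**
     * Polls one batch [0, batchSize), optionally commits it before processing (as an
     * auto-commit running ahead of the business logic would), fails while processing
     * offset failAt, then restarts from the committed offset and processes the rest.
     *
     * @return offsets that were never processed successfully
     */
    static List<Integer> neverProcessed(int logSize, int batchSize, int failAt,
                                        boolean commitBeforeProcessing) {
        boolean[] done = new boolean[logSize];
        int committed = 0;

        if (commitBeforeProcessing) {
            committed = batchSize;          // step 2: offset committed ahead of processing
        }
        for (int off = 0; off < batchSize; off++) {
            if (off == failAt) {
                break;                      // step 3: processing fails, service goes down
            }
            done[off] = true;
        }
        for (int off = committed; off < logSize; off++) {
            done[off] = true;               // step 4: restart resumes at the committed offset
        }

        List<Integer> missing = new ArrayList<>();
        for (int off = 0; off < logSize; off++) {
            if (!done[off]) {
                missing.add(off);
            }
        }
        return missing;
    }
}
```

With neverProcessed(10, 5, 2, true) the result is [2, 3, 4]. With false it is empty, because the restart begins again at offset 0 and re-processes offsets 0 and 1 instead: a duplicate, not a loss.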
Manual offset commit (production‑grade)
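import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

// requires enable.auto.commit=false
while (running) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    try {
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
        consumer.commitSync(); // commit only after the whole batch succeeded
    } catch (Exception e) {
        // do not commit; rewind each partition to the start of this batch so the next poll() retries it
        for (TopicPartition tp : records.partitions()) {
            consumer.seek(tp, records.records(tp).get(0).offset());
        }
    }
}
Rebalance handling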
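A rebalance is triggered when a consumer joins or leaves the group (including a crash, a missed session timeout, or exceeding max.poll.interval.ms between polls), or when the partition count of a subscribed topic changes. Before its partitions are taken away, a consumer should commit the offsets it has already processed, so that the next owner does not consume those records again.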
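import java.util.Collection;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

consumer.subscribe(List.of("order-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // runs inside poll() before the partitions are reassigned:
        // commit the positions of the records this consumer has already processed
        consumer.commitSync();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // optionally adjust offsets per business needs, e.g. with consumer.seek(...)
    }
});
Consumer semantics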
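At most once: commit the offset before processing; a crash in between loses the message.
At least once: process, then commit the offset; a crash in between causes the message to be processed again.
Exactly once: use the idempotent producer with transactions, committing the consumer offsets inside the same transaction and reading downstream with isolation.level=read_committed; this covers read-process-write pipelines that stay within Kafka, not side effects in external systems such as a database.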
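The difference between the first two guarantees comes down to whether the crash window sits after the commit or after the processing step. The hypothetical DeliverySemanticsSim below (not Kafka code) handles messages one at a time, crashes between the two steps for one chosen message, then restarts from the committed offset and counts how often each message was processed.

```java
// Hypothetical model (not Kafka code) of the two commit orders. Messages are handled
// one at a time; the consumer crashes between the two steps for message crashAt, then
// restarts from the committed offset and runs to the end without failing.
final class DeliverySemanticsSim {

    /** @return how many times each message was processed */
    static int[] run(int messages, int crashAt, boolean commitFirst) {
        int[] processedCount = new int[messages];
        int committed = 0;

        // First run: stops midway through message crashAt
        for (int off = 0; off < messages; off++) {
            if (commitFirst) {
                committed = off + 1;          // at most once: commit...
                if (off == crashAt) break;    // ...then crash before processing
                processedCount[off]++;
            } else {
                processedCount[off]++;        // at least once: process...
                if (off == crashAt) break;    // ...then crash before committing
                committed = off + 1;
            }
        }

        // Restart: resume from the last committed offset
        for (int off = committed; off < messages; off++) {
            processedCount[off]++;
        }
        return processedCount;
    }
}
```

run(5, 2, true) returns {1, 1, 0, 1, 1}: message 2 is lost. run(5, 2, false) returns {1, 1, 2, 1, 1}: message 2 is processed twice, which is the duplicate an idempotent consumer has to absorb.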
Recommended pattern
Combine “at least once” processing with an idempotent design.
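public void process(String orderId) {
    // Skip orders that were already handled, so a redelivered message has no further effect.
    // exists() followed by save() is not atomic: back it with a unique key on orderId
    // so that two concurrent deliveries cannot both pass the check and write twice.
    if (repository.exists(orderId)) {
        return;
    }
    repository.save(orderId);
}
Project layout example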
/opt/app/kafka-consumer/
├── bin/
├── config/
│ └── application.yml
├── logs/
└── lib/
Unified package name:
package com.icoderoad.kafka.consumer;
Illustrative diagrams
[Diagram: offset commit flow]
[Diagram: rebalance process]
Takeaway
Kafka’s difficulty lies in mastering offset commits, rebalance handling, producer acknowledgment settings, and idempotent design. Correctly handling these mechanisms under failure conditions yields true system resilience.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
LuTiao Programming
LuTiao Programming is a friendly community offering free programming lessons. We inspire learners to explore new ideas and technologies and quickly acquire job-ready skills.
