Understanding Kafka Idempotent Producer and How to Prevent Message Duplicates
This article explains why message duplication occurs in Kafka, describes the three delivery semantics, and provides practical solutions—including idempotent producers, transactions, and consumer-side idempotence—along with configuration tips and code examples to achieve exactly‑once delivery.
Message duplication is a common issue in distributed systems, especially when producers retry after failures or consumers re‑process uncommitted offsets after a crash.
The article first outlines three delivery semantics: at most once (possible loss, no duplicates), at least once (no loss, possible duplicates), and exactly once (no loss, no duplicates), using MQTT QoS examples.
To achieve exactly‑once semantics in Kafka, three approaches are presented:
Kafka Idempotent Producer : enable idempotence on the producer side. This requires setting enable.idempotence=true , acks=all , and max.in.flight.requests.per.connection<=5 . The article shows the necessary configuration snippet: Properties props = new Properties(); props.put("enable.idempotence", true); props.put("acks", "all"); props.put("max.in.flight.requests.per.connection", 5);
Kafka Transactions : use transactional producers to overcome the single‑partition limitation of idempotent producers. Example code includes initializing a transaction, sending records, and committing or aborting based on exceptions: producer.initTransactions(); producer.beginTransaction(); producer.send(new ProducerRecord<>("Topic", "Key", "Value")); producer.commitTransaction(); // on error: producer.abortTransaction();
Consumer‑Side Idempotence : store processed message IDs in a database table and use transactions to ensure duplicate inserts are rejected, effectively making the consumer idempotent.
The article also details how Kafka tracks producer state using a globally unique pid and per‑partition sequence numbers, and how the broker determines duplicates by comparing <pid, seqNum> against its cached entries.
Practical notes include:
If acks is set to 0 or 1 , idempotence cannot be guaranteed.
Setting max.in.flight.requests.per.connection greater than 5 will cause a configuration error.
Finally, the article provides a complete Java producer example that reads lines from a file and sends them to a Kafka topic with idempotent settings, and shows how to start the consumer from the command line.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.