Handling Duplicate Messages in Message Queues
Message queues can deliver duplicates under at‑least‑once semantics. To keep business logic such as orders and payments idempotent, you should combine producer‑side idempotence (e.g., Kafka’s enable.idempotence), broker‑side deduplication (e.g., Pulsar), and a consumer‑side guard that records unique message IDs in a database or Redis.
When using message queues, duplicate messages can corrupt business logic, especially in scenarios that require idempotence, such as orders, payments, and reconciliation.
This article discusses how to handle duplicate messages.
1. Three delivery semantics
The common semantics are:
At Least Once – messages are delivered at least once; duplicates may occur.
Exactly Once – messages are delivered exactly once without loss or duplication (hardest to achieve, usually requires transactional messaging).
At Most Once – messages are never duplicated but may be lost.
Different scenarios require different semantics; for example, logging can tolerate At Most Once, while payment requires Exactly Once.
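In practice, the difference between the first and third semantics often comes down to when the consumer acknowledges: ack before processing and a crash loses the message; process before acking and a crash causes redelivery. A minimal sketch in plain Java (the broker queue and the crash are simulated; no real message queue is involved):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AckOrderDemo {
    // Simulates a consumer that crashes once while processing message "B".
    // ackFirst=true  -> at-most-once:  "B" is acked, then lost.
    // ackFirst=false -> at-least-once: "B" stays queued and is redelivered.
    static List<String> consume(boolean ackFirst) {
        Queue<String> broker = new ArrayDeque<>(List.of("A", "B"));
        List<String> processed = new ArrayList<>();
        boolean crashed = false;
        while (!broker.isEmpty()) {
            String msg = broker.peek();
            if (ackFirst) broker.poll();                 // ack before processing
            if (msg.equals("B") && !crashed) {           // simulated crash
                crashed = true;
                continue;                                // work lost, loop retries
            }
            processed.add(msg);                          // business logic runs
            if (!ackFirst) broker.poll();                // ack after processing
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(consume(true));   // [A]    -> "B" lost
        System.out.println(consume(false));  // [A, B] -> "B" redelivered
    }
}
```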
2. Causes of duplicate messages
Duplicates can happen when the producer does not receive an ACK from the broker and retries, or when the consumer fails to acknowledge and the broker resends the same offset.
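The producer‑side retry path can be sketched without a real broker: the first send is persisted but its ACK is dropped in transit, so the producer sends again and the broker ends up storing the message twice. All names below are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class RetryDuplicateDemo {
    // "Broker": appends every message it receives; the ACK may be lost in transit.
    static boolean send(List<String> brokerLog, String msg, boolean ackLost) {
        brokerLog.add(msg);   // broker persists the message either way
        return !ackLost;      // false = producer never sees the ACK
    }

    public static void main(String[] args) {
        List<String> brokerLog = new ArrayList<>();
        boolean acked = send(brokerLog, "order-42", true);  // ACK lost in transit
        if (!acked) {
            send(brokerLog, "order-42", false);             // producer retries blindly
        }
        System.out.println(brokerLog);  // [order-42, order-42]
    }
}
```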
3. Producer‑side deduplication
Kafka supports an idempotent producer since version 0.11.0. Enabling it is as simple as setting enable.idempotence=true in the producer configuration.
Properties props = new Properties();
// ... other configs
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
The broker uses the combination of Producer ID (PID) and Sequence Number to identify each message uniquely. If a message with the same <PID, SequenceNumber> has already been persisted, the broker discards the retry.
4. Broker‑side deduplication (Pulsar)
Pulsar can enable deduplication with the brokerDeduplicationEnabled flag. The broker tracks the highest sequenceId per producer name and drops messages with a lower or equal sequenceId.
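The drop rule can be illustrated in a few lines of plain Java. This is an in‑memory stand‑in for the broker’s per‑producer state, not Pulsar code:

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceDedupDemo {
    // Highest sequenceId accepted so far, keyed by producer name.
    private final Map<String, Long> highestSeq = new HashMap<>();

    /** Returns true if the message is accepted, false if dropped as a duplicate. */
    public boolean accept(String producerName, long sequenceId) {
        long last = highestSeq.getOrDefault(producerName, -1L);
        if (sequenceId <= last) {
            return false;                        // lower or equal -> drop
        }
        highestSeq.put(producerName, sequenceId);
        return true;
    }

    public static void main(String[] args) {
        SequenceDedupDemo broker = new SequenceDedupDemo();
        System.out.println(broker.accept("p1", 0));  // true  (new)
        System.out.println(broker.accept("p1", 1));  // true  (new)
        System.out.println(broker.accept("p1", 1));  // false (retry, dropped)
        System.out.println(broker.accept("p2", 0));  // true  (different producer)
    }
}
```

Note that this scheme only catches retries of the most recent messages from the same producer; it does not replace application‑level deduplication.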
5. Consumer‑side deduplication
Because producer and broker deduplication work only at the topic/partition level, many applications need an additional consumer‑side guard. A common approach is to embed a globally unique ID in the message payload and store it in a database or Redis with a unique index or SETNX operation.
if (jedis.setnx(ID, "1") == 1) {
    // first delivery: process business logic, then return ACK
} else {
    // duplicate: return ACK without processing
}
Using a unique index in MySQL can also achieve deduplication, though it has limitations, such as the change buffer not working for unique indexes and the inability to deduplicate non‑insert operations.
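For a self‑contained illustration of the same guard, ConcurrentHashMap.putIfAbsent behaves like SETNX within a single process. This is only a stand‑in for Redis; in production, a shared store is what makes the check safe across multiple consumer instances:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConsumerDedupDemo {
    private final ConcurrentMap<String, Boolean> seen = new ConcurrentHashMap<>();

    /** Processes the message only the first time its ID is observed. */
    public boolean handle(String messageId) {
        if (seen.putIfAbsent(messageId, Boolean.TRUE) == null) {
            // first delivery: run business logic, then ACK
            return true;
        }
        // duplicate: skip the logic but still ACK so the broker stops redelivering
        return false;
    }

    public static void main(String[] args) {
        ConsumerDedupDemo consumer = new ConsumerDedupDemo();
        System.out.println(consumer.handle("msg-1")); // true  (processed)
        System.out.println(consumer.handle("msg-1")); // false (duplicate)
    }
}
```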
6. Summary
Message queues provide some built‑in deduplication mechanisms, but they are not foolproof. For scenarios sensitive to duplicate messages, implementing deduplication at the consumer side is recommended.