
Handling Duplicate Messages in Message Queues: Semantics, Producer and Broker Deduplication, and Consumer Strategies

Message queues can deliver duplicate messages, which affects business processes that require idempotency. This article explains the three delivery semantics (At Least Once, Exactly Once, At Most Once), the causes of duplication, and practical deduplication techniques on the producer side, the broker side (Kafka, Pulsar), and the consumer side, with code examples.


When using message queues, duplicate messages can impact business scenarios that require idempotency, such as order processing, payment, and reconciliation. This article discusses how to handle duplicate messages.

1. Three Delivery Semantics

Message queues provide three semantics:

At Least Once : Guarantees that a message is not lost and is consumed at least once, but duplicates may occur.

Exactly Once : Guarantees a message is consumed precisely once, without loss or duplication; this is the hardest to achieve and often requires transactional messaging.

At Most Once : Guarantees no duplicate consumption, but messages may be lost.

The appropriate semantic depends on the use case; for example, log collection can tolerate At Most Once, while payment systems require Exactly Once.
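As a rough sketch, these semantics map onto standard Kafka client settings. Here producerProps and consumerProps are assumed Properties objects; the configuration keys are standard Kafka names, but the actual guarantee also depends on when the application commits offsets relative to processing.

```java
// At Most Once: fire-and-forget producer, offsets committed automatically
// (possibly before the message is actually processed).
producerProps.put("acks", "0");                    // do not wait for broker ACK
consumerProps.put("enable.auto.commit", "true");

// At Least Once: wait for all in-sync replicas, commit offsets manually
// only after processing succeeds. Retries may produce duplicates.
producerProps.put("acks", "all");
consumerProps.put("enable.auto.commit", "false");

// Exactly Once (within Kafka): idempotent, transactional producer plus
// consumers that only read committed data. "my-tx-id" is an illustrative ID.
producerProps.put("enable.idempotence", "true");
producerProps.put("transactional.id", "my-tx-id");
consumerProps.put("isolation.level", "read_committed");
```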

2. Causes of Message Duplication

Duplication can happen on the producer side: the producer sends a message, the broker stores it successfully, but the ACK is lost on the way back, so the producer retries and the same message is stored twice. It can also happen on the consumer side: the consumer processes a message but crashes before acknowledging it (committing the offset), so the broker redelivers the message from the same offset.

3. Producer‑Side Deduplication

Some middleware, such as Kafka (starting from version 0.11.0), supports an idempotent producer. Enabling idempotence is as simple as setting a configuration flag:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
// ... other configurations (bootstrap servers, serializers, etc.) ...
// Enable the idempotent producer; with kafka-clients >= 3.0 this is the default.
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

Kafka achieves this by assigning each producer a unique Producer ID (PID) and a monotonically increasing Sequence Number. The broker uses the <PID, SequenceNumber> pair to identify a message uniquely; if a duplicate pair is detected, the broker discards the message. Note that this mechanism only works within a single partition and a single producer session; deduplication across partitions or sessions requires Kafka transactions.
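The check behind this can be sketched as a small in-memory model. This is a hypothetical simplification of the broker's real bookkeeping, which tracks sequence numbers per partition and persists them alongside the log:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of broker-side sequence-number deduplication: each
// producer ID (PID) maps to the highest sequence number already accepted.
// A message whose sequence number is not greater than the stored value is
// treated as a retry of an already-stored message and discarded.
public class SequenceDedup {
    private final Map<Long, Long> lastSeqByPid = new HashMap<>();

    /** Returns true if the message is new and should be stored. */
    public boolean accept(long pid, long sequenceNumber) {
        Long last = lastSeqByPid.get(pid);
        if (last != null && sequenceNumber <= last) {
            return false; // duplicate retry: discard
        }
        lastSeqByPid.put(pid, sequenceNumber);
        return true;
    }

    public static void main(String[] args) {
        SequenceDedup dedup = new SequenceDedup();
        System.out.println(dedup.accept(42L, 0L)); // true: first message
        System.out.println(dedup.accept(42L, 1L)); // true: next in sequence
        System.out.println(dedup.accept(42L, 1L)); // false: producer retry
    }
}
```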

4. Broker‑Side Deduplication (Pulsar)

Pulsar can enable deduplication via the brokerDeduplicationEnabled parameter (cluster-wide in broker.conf, or per namespace through the admin API). Producers attach a sequenceId to each message, and the broker tracks the highest sequenceId seen per ProducerName. When a message arrives, the broker compares its sequenceId with the stored highest value: if it is greater, the message is stored and the highest value is updated; otherwise, the message is dropped and the producer receives a -1:-1 response. Three edge cases must also be handled — a producer disconnecting, a producer crashing, and the producer and broker crashing together — and in each the broker recovers the highest sequence ID from its persisted state.
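As a sketch, deduplication can be turned on either cluster-wide in broker.conf or per namespace with pulsar-admin (assuming a standard Pulsar installation; the tenant and namespace names here are illustrative):

```shell
# Cluster-wide: set in broker.conf and restart the brokers
# brokerDeduplicationEnabled=true

# Per namespace, without a broker restart:
pulsar-admin namespaces set-deduplication my-tenant/my-namespace --enable
```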

5. Consumer‑Side Deduplication

Because producer- and broker-side deduplication only guard against producer retries within a single Topic/Partition — they do nothing about redelivery after a failed consumer acknowledgment — additional consumer-side deduplication is often needed. A common approach is to embed a globally unique ID in the message payload and, on consumption, record this ID in a database table with a unique index or in Redis using SETNX, so the message is processed only once.

// messageId is the globally unique business ID carried in the message payload
if (jedis.setnx(messageId, "1") == 1) {
    jedis.expire(messageId, 86400); // add a TTL so dedup keys do not accumulate forever
    // first delivery: process business logic, then acknowledge
} else {
    // duplicate detected: acknowledge without processing
}

Using a unique index in a relational database has drawbacks: in MySQL (InnoDB), unique indexes cannot use the Change Buffer, which hurts write performance, and the approach only guards insert operations, not updates or other effects. Redis therefore provides a lightweight alternative.
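The consumer-side pattern can also be shown without external dependencies. In this illustrative sketch a ConcurrentHashMap stands in for Redis SETNX or a unique-index table; in production the "seen" store must be external and durable, since an in-process map is lost on restart:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal idempotent-consumer sketch: putIfAbsent atomically records the
// message ID and tells us whether this is the first delivery, playing the
// role of SETNX or a unique-index insert.
public class IdempotentConsumer {
    private final Map<String, Boolean> seen = new ConcurrentHashMap<>();

    /** Runs businessLogic only if this message ID has not been seen before.
     *  Returns true if the work ran, false if the message was a duplicate. */
    public boolean consume(String messageId, Runnable businessLogic) {
        if (seen.putIfAbsent(messageId, Boolean.TRUE) == null) {
            businessLogic.run(); // first delivery: do the real work
            return true;
        }
        return false; // duplicate: acknowledge without reprocessing
    }
}
```

A real implementation must also decide what to do when processing fails after the ID has been recorded — for example, remove the ID so the redelivered message can be retried.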

6. Summary

Duplicate messages are a reality in many message‑queue systems. While mainstream queues offer some deduplication capabilities, they are not foolproof. For scenarios sensitive to duplication, it is best to implement deduplication at the consumer level, leveraging business‑level identifiers and appropriate storage mechanisms.

Tags: Kafka, Deduplication, Message Queue, Idempotence, Pulsar
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
