Ensuring Message Reliability and Handling Duplicates in Distributed Message Queues
The article explains how to detect and prevent message loss, guarantee reliable delivery, handle duplicate consumption, manage message backlog, and implement distributed transaction messages in systems like Kafka, RocketMQ, and RabbitMQ, providing practical code examples and best‑practice guidelines.
1. How to Ensure No Message Loss?
Message loss can be detected by leveraging the ordered nature of a message queue: the producer attaches a monotonically increasing sequence number to each message, and the consumer checks the continuity of these numbers. Gaps indicate lost messages, and the missing sequence number identifies the exact lost message.
Most queue clients support interceptors; a producer interceptor can inject the sequence number before sending, while a consumer interceptor can verify continuity on receipt.
Considerations in Distributed Systems
Kafka and RocketMQ guarantee ordering only per partition, not across the whole topic. Therefore, producers must specify partitions and check sequence continuity per partition.
If multiple producer instances exist, each should generate its own sequence numbers and include a producer identifier so the consumer can validate each stream separately.
Ideally, the number of consumer instances matches the number of partitions, allowing one‑to‑one mapping for sequence checks.
2. Ensuring Reliable Message Transmission
A message lifecycle consists of three stages:
1. Production Stage : The producer creates the message and sends it to the broker.
2. Storage Stage : The broker stores the message, optionally replicating it to other nodes.
3. Consumption Stage : The consumer pulls the message from the broker.
2.1 Production Stage
Reliability is achieved through a request‑acknowledgement mechanism. The broker returns a confirmation after persisting the message; the producer treats the send as successful only after receiving this response. Retries are performed automatically on timeout, and failures are reported via return values or exceptions.
Example with Kafka (synchronous send):
try {
producer.send(record).get();
System.out.println("Message sent successfully");
} catch (Exception e) {
System.out.println("Message send failed");
e.printStackTrace();
}Asynchronous send requires handling the callback:
producer.send(record, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (metadata != null) {
System.out.println("Message sent successfully");
} else {
System.out.println("Message send failed");
exception.printStackTrace();
}
}
});2.2 Storage Stage
Under normal operation, a healthy broker does not lose messages. However, broker crashes can cause loss unless durability is configured. For a single‑node broker, enable synchronous disk flush (e.g., flushDiskType=SYNCHRONOUS_FLUSH in RocketMQ). In a clustered broker, configure replication to at least two nodes before acknowledging the producer.
2.3 Consumption Stage
Consumers also use acknowledgements. After processing a message successfully, the consumer sends an ACK to the broker. If the broker does not receive the ACK, it will redeliver the same message on the next poll, ensuring no loss.
Spring Boot + RabbitMQ example:
@RabbitListener(bindings = @QueueBinding(
value = @Queue(value = "${spring.rabbitmq.listener.order.queue.name}", durable = "${spring.rabbitmq.listener.order.queue.durable}"),
exchange = @Exchange(value = "${spring.rabbitmq.listener.order.exchange.name}", durable = "${spring.rabbitmq.listener.order.exchange.durable}", type = "${spring.rabbitmq.listener.order.exchange.type}", ignoreDeclarationExceptions = "${spring.rabbitmq.listener.order.exchange.ignoreDeclarationExceptions}"),
key = "${spring.rabbitmq.listener.order.key}"))
public void onMessage(@Payload Order order, @Headers Map<String, Object> headers, Channel channel) throws Exception {
// business logic
System.out.println("Consumer: " + order);
// manual ACK
Long deliveryTag = (Long) headers.get(AmqpHeaders.DELIVERY_TAG);
channel.basicAck(deliveryTag, false);
}3. Summary of Production, Storage, and Consumption
1) Capture send errors and retry in the production stage. 2) Configure synchronous flush and replication to avoid loss in the storage stage. 3) Send ACK only after full business processing in the consumption stage.
2. How to Handle Duplicate Messages During Consumption?
1. Duplicate Messages Are Inevitable
MQTT defines three QoS levels, which map to most message‑queue guarantees:
At most once : No guarantee, messages may be lost.
At least once : No loss, but duplicates may appear.
Exactly once : No loss and no duplicates – the highest guarantee, rarely provided directly by queues.
2. Use Idempotency to Solve Duplicates
Make the consumer operation idempotent so that repeated executions have the same effect as a single execution. This turns an "at‑least‑once" guarantee into an effective "exactly‑once" behavior.
3. Common Idempotent Design Patterns
1) Database Unique Constraints : Create a transaction‑log table with a composite unique key (e.g., transaction‑id + account‑id). Inserting a duplicate record fails, preventing double updates. Redis SETNX can serve a similar purpose.
2) Pre‑condition Checks : Update only if a condition holds (e.g., current balance equals the expected balance). Version numbers can also be used: update only when the version matches, then increment the version.
3) Token / GUID Mechanism : Assign a globally unique ID to each message. Before processing, check a store to see if the ID has been consumed; if not, process and mark it as consumed. This requires atomic check‑and‑set, which can be complex in distributed environments.
3. Dealing With Message Backlog
Backlog occurs when any component cannot keep up with upstream traffic.
1) Optimize Performance to Prevent Backlog
Producer Optimizations : Adjust concurrency and batch size. A single‑threaded producer sending one message per 1 ms yields ~1 k messages/s; batching or parallelism scales throughput.
Consumer Optimizations : Ensure consumer throughput exceeds producer rate. Horizontal scaling (adding consumer instances) must be accompanied by increasing partition count to maintain a 1:1 mapping.
2) Reactive Measures When Backlog Appears
If production spikes (e.g., flash sale), quickly add consumer instances. If resources are limited, temporarily degrade non‑critical services to reduce inbound traffic. Also investigate repeated consumer failures that may cause a single message to block the pipeline.
4. Using Transactional Messages for Distributed Transactions
Transactional messages ensure atomicity between a local database operation and a message send.
Example: An order service creates an order record (local DB transaction) and sends a message to clear the shopping cart. Both steps must either both succeed or both fail.
1) What Is a Distributed Transaction?
ACID properties: Atomicity, Consistency, Isolation, Durability. Transactional messaging is suitable for scenarios where eventual consistency is acceptable (e.g., order creation and cart cleanup).
2) How Message Queues Implement Distributed Transactions
The producer sends a "half‑message" that is invisible to consumers until the local transaction commits. After the local DB commit, the producer tells the broker to commit or roll back the half‑message.
If the commit request fails due to network issues, the broker periodically performs a transaction‑status check ("transaction check") by invoking a user‑provided callback on the producer to determine whether to commit or roll back.
3) RocketMQ Transaction Implementation
RocketMQ adds a transaction‑check mechanism. The producer implements an interface that, given a message ID, queries the local DB to see if the corresponding order exists. The broker uses this result to finalize the half‑message.
Even if the original producer instance crashes, other instances can answer the check, preserving transaction integrity.
Overall flow:
Producer starts a transaction and sends a half‑message.
Local DB transaction creates the order.
Producer commits the transaction message if DB commit succeeded; otherwise rolls back.
If the broker does not receive the commit/rollback, it triggers a transaction check.
Summary
By combining sequence numbers, acknowledgements, idempotent consumer logic, performance tuning, and transactional messaging, distributed systems can achieve reliable, exactly‑once‑like processing even with "at‑least‑once" message‑queue guarantees.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
