How to Prevent Message Loss, Duplicates, and Backlog in Distributed Queues
This article explains practical techniques for detecting lost messages, ensuring reliable delivery across production, storage, and consumption stages, handling duplicate deliveries with idempotent designs, managing message backlogs through performance tuning, and using transactional messages to achieve distributed transaction consistency.
1. How to Ensure No Message Loss?
1.1 Detecting Message Loss
You can leverage the ordered nature of a message queue by attaching a monotonically increasing sequence number to each message on the producer side and checking the continuity of these numbers on the consumer side; any gap indicates a lost message, and the missing sequence number identifies which message was lost.
Most queue clients support interceptor mechanisms. By injecting the sequence number in a producer interceptor and validating continuity in a consumer interceptor, loss detection can be automated.
Key Points for Distributed Systems
Systems like Kafka and RocketMQ guarantee ordering only within a partition, not across the entire topic. Therefore, producers must specify partitions and check sequence continuity per partition. When multiple producer instances exist, each should generate its own sequence numbers and include a producer identifier so consumers can validate continuity per producer. Ideally, the number of consumer instances matches the number of partitions to simplify sequence checks.
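The per-producer, per-partition continuity check described above can be sketched in plain Java. This is a minimal illustration, not part of any queue client: the class name `SequenceGapDetector` and the `producerId#partition` key format are assumptions made for this example, and in practice the logic would be called from a consumer interceptor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: tracks the last sequence number per (producer, partition) and reports gaps. */
class SequenceGapDetector {
    // Key: producerId + "#" + partition -> last sequence number observed on that stream.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Records one consumed message and returns the sequence numbers that were skipped.
     * An empty list means the message arrived in order, was the first one seen,
     * or was a redelivery of an older message.
     */
    List<Long> observe(String producerId, int partition, long sequence) {
        String key = producerId + "#" + partition;
        Long previous = lastSeen.get(key);
        List<Long> missing = new ArrayList<>();
        if (previous == null) {
            // First message on this stream: there is nothing to compare against yet.
            lastSeen.put(key, sequence);
            return missing;
        }
        if (sequence <= previous) {
            // Redelivered message: not a loss, so the cursor stays where it is.
            return missing;
        }
        for (long s = previous + 1; s < sequence; s++) {
            missing.add(s); // every number between the two observations was never seen
        }
        lastSeen.put(key, sequence);
        return missing;
    }
}
```

Keeping one cursor per producer and partition is what makes the check valid when ordering is only guaranteed inside a partition. Note that the sketch cannot detect messages lost before the first one it sees on a stream.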
1.2 Ensure Reliable Delivery
A message’s journey can be divided into three stages: production, storage, and consumption. Each stage needs its own safeguard.
1.2.1 Production Stage
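During production, the queue client uses a request‑acknowledgment mechanism: the broker returns a confirmation after persisting the message, and the producer treats the send as successful only once that confirmation arrives. If the broker does not respond within the timeout, the client may retry and, if the retries also fail, raise an exception. Producer code must therefore check the outcome of every send. For a synchronous send (shown here with the Kafka Java client), catch the exception: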
<ol><li><code>try {</code></li><li><code>    // get() blocks until the broker acknowledges the record or the send fails</code></li><li><code>    producer.send(record).get();</code></li><li><code>    System.out.println("Message sent successfully");</code></li><li><code>} catch (Exception e) {</code></li><li><code>    // The message was not confirmed; log it and decide whether to resend</code></li><li><code>    System.err.println("Message send failed");</code></li><li><code>    System.err.println(e);</code></li><li><code>}</code></li></ol>
For asynchronous sends, the callback must verify the result:
<ol><li><code>producer.send(record, new Callback() {</code></li><li><code>    @Override</code></li><li><code>    public void onCompletion(RecordMetadata metadata, Exception exception) {</code></li><li><code>        // A null exception is the reliable success signal; metadata may be non-null even on failure</code></li><li><code>        if (exception == null) {</code></li><li><code>            System.out.println("Message sent successfully");</code></li><li><code>        } else {</code></li><li><code>            System.err.println("Message send failed");</code></li><li><code>            System.err.println(exception);</code></li><li><code>        }</code></li><li><code>    }</code></li><li><code>});</code></li></ol>
1.2.2 Storage Stage
Under normal conditions, a healthy broker does not lose messages. However, broker crashes or server failures can cause loss.
If high reliability is required, configure broker parameters to avoid loss on crash.
For a single‑node broker, enable synchronous disk flush (e.g., flushDiskType=SYNC_FLUSH in RocketMQ) so the broker acknowledges only after persisting to disk.
In a clustered broker, configure replication to at least two nodes before acknowledging to the producer, ensuring that a single node failure does not result in loss.
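As a concrete illustration of the clustered case, here is a minimal sketch of producer settings for Kafka, which the article names but does not configure. The helper class `ReliableProducerConfig` is hypothetical. The property keys are standard Kafka producer settings, and the String serializers are an assumption about the payload type.

```java
import java.util.Properties;

/** Hypothetical helper that assembles Kafka producer settings aimed at durability. */
class ReliableProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Wait until all in-sync replicas have the record before the send is acknowledged.
        props.put("acks", "all");
        // Assumption for this sketch: keys and values are plain strings.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

`acks=all` only guarantees two copies if the topic or broker also sets `min.insync.replicas=2` (with a replication factor above that); otherwise a single remaining in-sync replica is enough to acknowledge. In RocketMQ, the roughly equivalent broker-side setting is `brokerRole=SYNC_MASTER`.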
1.2.3 Consumption Stage
Consumers use a similar acknowledgment mechanism: after pulling a message, they execute business logic and only then send an acknowledgment. If the broker does not receive the acknowledgment, it will redeliver the same message on the next poll.
When writing consumer code, delay acknowledgment until all business processing completes.
Spring Boot + RabbitMQ example:
<ol><li><code>@RabbitListener(bindings = @QueueBinding(
value = @Queue(value = "${spring.rabbitmq.listener.order.queue.name}", durable = "${spring.rabbitmq.listener.order.queue.durable}"),
exchange = @Exchange(value = "${spring.rabbitmq.listener.order.exchange.name}", durable = "${spring.rabbitmq.listener.order.exchange.durable}", type = "${spring.rabbitmq.listener.order.exchange.type}", ignoreDeclarationExceptions = "${spring.rabbitmq.listener.order.exchange.ignoreDeclarationExceptions}"),
key = "${spring.rabbitmq.listener.order.key}"))</code></li><li><code>@RabbitHandler</code></li><li><code>public void onMessage(@Payload Order order, @Headers Map<String, Object> headers, Channel channel) throws Exception {</code></li><li><code>    // Business logic goes here and must finish before the acknowledgment is sent</code></li><li><code>    System.out.println("Consumer:" + order);</code></li><li><code>    // Manual ack: requires the listener container to run in manual acknowledge mode</code></li><li><code>    Long deliveryTag = (Long) headers.get(AmqpHeaders.DELIVERY_TAG);</code></li><li><code>    // false = acknowledge only this delivery, not all earlier ones on the channel</code></li><li><code>    channel.basicAck(deliveryTag, false);</code></li><li><code>}</code></li></ol>
2. How to Handle Duplicate Messages?
2.1 Duplicate messages are inevitable
MQTT defines three QoS levels: “At most once” (no guarantee, possible loss), “At least once” (no loss, possible duplicates), and “Exactly once” (no loss, no duplicates). Most mainstream queues (RocketMQ, RabbitMQ, Kafka) provide “At least once” semantics, so duplicates must be handled.
2.2 Use Idempotency
Idempotent operations produce the same effect regardless of how many times they are executed with the same parameters. Combining “At least once” delivery with idempotent consumption effectively achieves “Exactly once” semantics.
2.3 Common Idempotent Design Patterns
1. Database unique constraint – Create a table with a composite unique key (e.g., transaction ID + account ID). Inserting a duplicate record fails, preventing repeated updates.
2. Conditional update – Add a precondition to the update (e.g., only add 100 CNY if the current balance equals the expected value). Subsequent executions fail the precondition.
3. Record and check (token/GUID) – Assign a globally unique ID to each message; before processing, check a store to see if the ID has already been consumed.
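Patterns 2 and 3 above can be sketched with in-memory state. This is an illustration only: the class `IdempotentAccountConsumer` and its method names are made up for this example, and an in-memory set stands in for the database table or cache that a real consumer would use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical consumer that applies deposits safely under "at least once" delivery. */
class IdempotentAccountConsumer {
    // Stand-in for a persistent store of already-consumed message IDs.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final AtomicLong balance;

    IdempotentAccountConsumer(long initialBalance) {
        this.balance = new AtomicLong(initialBalance);
    }

    /** Record and check: the deposit is applied only the first time a message ID is seen. */
    boolean depositOnce(String messageId, long amount) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery: ignore it
        }
        balance.addAndGet(amount);
        return true;
    }

    /** Conditional update: the deposit is applied only if the balance still equals the expected value. */
    boolean depositIfBalanceIs(long expectedBalance, long amount) {
        return balance.compareAndSet(expectedBalance, expectedBalance + amount);
    }

    long balance() {
        return balance.get();
    }
}
```

Delivering the same message twice changes the balance once in both variants. In a real system, recording the ID and applying the business update should happen in one database transaction; otherwise a crash between the two steps could mark a message as consumed without applying it.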
3. How to Deal with Message Backlog?
3.1 Optimize Performance to Avoid Backlog
Producer side – Tune concurrency and batch size. A single‑threaded producer that sends one message at a time and waits 1 ms for each send to complete tops out at about 1,000 msg/s; larger batches or more parallel senders raise throughput.
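The arithmetic behind that figure can be written as a small estimate. This is an idealized sketch: it assumes send latency stays constant as batch size and thread count grow, which real brokers only approximate, and the class name `ProducerThroughput` is hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate for a producer that blocks on each send. */
class ProducerThroughput {
    /**
     * Each thread completes 1000 / sendLatencyMs sends per second,
     * and each send carries batchSize messages.
     */
    static double messagesPerSecond(double sendLatencyMs, int batchSize, int threads) {
        return (1000.0 / sendLatencyMs) * batchSize * threads;
    }
}
```

With 1 ms latency, one thread, and a batch size of 1, the estimate is 1,000 msg/s; with a batch size of 10 and 4 threads it is 40,000 msg/s under the same assumption.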
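Consumer side – Most bottlenecks appear here. Consumer throughput must exceed the production rate; otherwise the backlog keeps growing. Optimize the business logic first, then add consumer instances. When scaling out, increase the number of partitions (queues) as well: within a consumer group, each partition is consumed by at most one instance, so instances beyond the partition count sit idle.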
3.2 When Backlog Suddenly Increases
Identify whether the surge is due to faster production or slower consumption. Monitoring tools in most queues reveal the root cause. If production spikes (e.g., a flash sale), scale out consumers. If resources are insufficient, temporarily degrade non‑critical services to reduce load.
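A rough way to reason about a surge is to compare the two rates reported by monitoring. The sketch below is an assumption-laden estimate rather than a tool from any queue: the class `BacklogEstimator` is hypothetical, and it treats both rates as constant.

```java
/** Hypothetical helper for reasoning about backlog from monitored rates (messages per second). */
class BacklogEstimator {
    /** Net backlog growth per second; a positive value means the backlog is growing. */
    static double growthPerSecond(double produceRate, double consumeRate) {
        return produceRate - consumeRate;
    }

    /** Seconds needed to drain the backlog, or infinity if consumers are not faster than producers. */
    static double secondsToDrain(long backlog, double produceRate, double consumeRate) {
        double surplus = consumeRate - produceRate;
        if (surplus <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return backlog / surplus;
    }

    /** Smallest number of consumer instances whose combined rate strictly exceeds the produce rate. */
    static int consumersNeeded(double produceRate, double perConsumerRate) {
        return (int) Math.floor(produceRate / perConsumerRate) + 1;
    }
}
```

For example, a backlog of 600,000 messages with 5,000 msg/s produced and 6,000 msg/s consumed drains in about 600 seconds, and at 1,000 msg/s per consumer at least six instances are needed to stay ahead of 5,000 msg/s.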
Also check for repeated consumer failures that cause the same message to be redelivered many times, which can exacerbate backlog.
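One common way to stop a repeatedly failing message from inflating the backlog is to cap the number of attempts and park the message for later inspection. The article does not prescribe this technique, so treat the following as one possible sketch; `RetryLimitingConsumer` and its in-memory dead-letter list are hypothetical stand-ins for a real dead-letter queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical wrapper that parks a message after too many failed processing attempts. */
class RetryLimitingConsumer {
    private final int maxAttempts;
    private final Consumer<String> handler;
    private final Map<String, Integer> attempts = new HashMap<>();
    private final List<String> deadLetters = new ArrayList<>();

    RetryLimitingConsumer(int maxAttempts, Consumer<String> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    /**
     * Handles one delivery. Returns true if the message should be acknowledged
     * (processed or parked) and false if the broker should redeliver it.
     */
    boolean onMessage(String messageId, String body) {
        try {
            handler.accept(body);
            attempts.remove(messageId);
            return true;
        } catch (RuntimeException e) {
            int count = attempts.merge(messageId, 1, Integer::sum);
            if (count >= maxAttempts) {
                // Give up: park the message so it stops blocking the ones behind it.
                deadLetters.add(messageId);
                attempts.remove(messageId);
                return true;
            }
            return false;
        }
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

With `maxAttempts` set to 3, a message whose handler always throws is redelivered twice and parked on the third failure, after which the consumer moves on.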
4. Using Transactional Messages for Distributed Transactions
4.1 What Is a Distributed Transaction?
Distributed transactions aim to keep operations across multiple services atomic, consistent, isolated, and durable (ACID). In e‑commerce, order creation and subsequent cart cleanup must either both succeed or both fail.
4.2 How Message Queues Implement Distributed Transactions
In the order‑cart scenario, the order service starts a transaction, sends a “half‑message” (invisible to consumers), then executes the local DB transaction to create the order. If the DB transaction commits, the service commits the half‑message, making it visible to the cart service; otherwise, it rolls back the message.
If committing the message fails (e.g., network error), the broker will periodically query the producer for the local transaction status and commit or roll back accordingly.
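The half-message flow and the broker's status query can be simulated in memory to make the sequence concrete. Everything here is a hypothetical model, not the RocketMQ API: `HalfMessageBroker` and `OrderService` are invented for this sketch, and the status check is reduced to a yes/no answer, whereas a real broker may also receive an "unknown" result and ask again later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical in-memory broker that holds half-messages until they are committed or rolled back. */
class HalfMessageBroker {
    private final Map<String, String> halfMessages = new LinkedHashMap<>(); // txId -> body, invisible
    private final List<String> visible = new ArrayList<>();

    void sendHalf(String txId, String body) { halfMessages.put(txId, body); }

    void commit(String txId) {
        String body = halfMessages.remove(txId);
        if (body != null) {
            visible.add(body); // now deliverable to consumers
        }
    }

    void rollback(String txId) { halfMessages.remove(txId); }

    /** Status query: resolves every pending half-message by asking whether its local transaction committed. */
    void checkPending(Predicate<String> localTransactionCommitted) {
        for (String txId : new ArrayList<>(halfMessages.keySet())) {
            if (localTransactionCommitted.test(txId)) {
                commit(txId);
            } else {
                rollback(txId);
            }
        }
    }

    List<String> visibleMessages() { return visible; }
}

/** Hypothetical order service; a set of order IDs stands in for the order database. */
class OrderService {
    private final Set<String> orders = new HashSet<>();
    private final HalfMessageBroker broker;

    OrderService(HalfMessageBroker broker) { this.broker = broker; }

    /** commitReachesBroker = false simulates a network error on the final commit. */
    void createOrder(String orderId, boolean commitReachesBroker) {
        broker.sendHalf(orderId, "clear-cart:" + orderId); // step 1: half-message
        orders.add(orderId);                               // step 2: local transaction commits
        if (commitReachesBroker) {
            broker.commit(orderId);                        // step 3: make the message visible
        }
    }

    boolean orderExists(String orderId) { return orders.contains(orderId); }
}
```

If the commit for an order is lost, the message stays invisible until `checkPending(service::orderExists)` runs; the order exists, so the message is committed. A half-message whose order was never created is rolled back by the same check.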
4.3 RocketMQ Transaction Implementation
RocketMQ adds a transaction check mechanism. The producer implements a callback that, given a transaction ID, queries the order database to determine whether the order exists and returns success or failure. The broker uses this result to finalize the half‑message.
