How to Prevent Message Loss and Duplication in Kafka and RocketMQ
This article analyzes the root causes of message loss and duplication in modern message queues, explains how synchronous and asynchronous sending affect reliability, compares storage strategies like sync‑flush and broker clustering, and offers concrete idempotent handling techniques using database constraints and Redis.
Message loss
Message loss can occur at three points in a typical queue workflow: when the producer sends the message, while the broker stores it, and when the consumer acknowledges it.
Send loss
In synchronous sending the producer blocks until it receives an ACK from the broker. If the ACK is not received within the configured timeout the send is considered failed and must be retried. Asynchronous sending improves throughput but requires a callback to detect failures. Kafka’s asynchronous API is:
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception != null) {
logger.error("Send failed", exception);
}
if (metadata != null) {
logger.info("Send succeeded");
}
}
});RabbitMQ, RocketMQ and other mainstream brokers provide equivalent callback‑based APIs.
Message storage
Even after the producer receives an ACK, the broker may crash before persisting the record to durable storage, causing loss. Two mitigation techniques are commonly used.
Sync flush
Configure the broker to perform a synchronous disk flush so that the ACK is sent only after the message is safely written to disk. In RocketMQ this is achieved by setting flushDiskType=SYNC_FLUSH.
Broker cluster
Running a single broker creates a single point of failure. A cluster replicates each message to multiple nodes; the producer can be configured to wait for acknowledgments from two or more replicas before returning success. This guarantees that the loss of any single broker does not make the message unavailable.
Message consumption
Consumers must acknowledge a message only after the processing logic has completed. If the broker does not receive the ACK it will redeliver the message. Some systems pull a message, immediately ACK it, and process the payload asynchronously. To avoid loss in that pattern the consumer must persist the payload locally (e.g., insert into a database) before sending the ACK.
Message duplication
Duplication originates from two situations: (1) the producer retries because it never received an ACK, and (2) the broker retries because it never received the consumer’s ACK. Duplicate consumption can cause severe business errors such as double payments.
No mainstream queue eliminates duplication entirely; the consumer must implement idempotent handling.
Database unique‑key constraint
If the message is stored in a relational database, define the message identifier as a unique key. An INSERT that violates the unique constraint is rejected, preventing duplicate processing. When the message is not stored directly, any business‑level attribute that uniquely identifies the event can be used as a unique key in a dedicated table.
Save consumption record in Redis
Another common pattern is to record the message ID in Redis and perform a SETNX (set if not exists) before processing. The Java example below demonstrates this approach:
ValueOperations<String, String> ops = redisTemplate.opsForValue();
Boolean first = ops.setIfAbsent(messageId, messageId);
if (first) {
// business logic
} else {
logger.error("Message already consumed, skipping, id:{}", messageId);
}Important: If processing fails after the key has been set, the key must be deleted so that a later retry can succeed.
Summary of mitigation strategies
Prevent loss: Use synchronous sends or reliable asynchronous callbacks, enable synchronous disk flush, and deploy a broker cluster with quorum‑based acknowledgments.
Handle retries: Retries are necessary for loss protection but inevitably introduce the possibility of duplicate delivery.
Prevent duplication: Apply idempotent processing on the consumer side—either via database unique constraints or a Redis SETNX guard.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
