How to Ensure Data Consistency in Message Queues: 10 Hard‑Earned Lessons
This article explores why message queues can lose consistency, presents concrete solutions such as transactional two‑phase commits, persistence settings, replica configurations, unique IDs, idempotent designs, and dead‑letter queues, and shares ten practical lessons drawn from real‑world incidents.
Preface
Last month an e‑commerce system encountered a spooky bug: the user paid successfully but the order status never changed to "shipped". The root cause was that the order service's MQ message vanished, highlighting data‑consistency design in message queues as one of the three biggest nightmares of distributed systems.
1. Causes of Data Inconsistency
Four fatal reasons observed in Kafka, RabbitMQ and RocketMQ:
Producer tragedy : message reaches the broker but the power fails before it is flushed to disk.
Consumer tragedy : message is consumed successfully but the business logic fails.
Roulette : network jitter causes duplicate deliveries.
Data islands : database and message state become detached (e.g., order placed but coupon not issued).
These situations all lead to inconsistent data in MQ.
2. Solutions to Prevent Message Loss
2.1 Transactional Two‑Phase Commit
Using RocketMQ transactional messages, the workflow resembles a pre‑sale deposit pseudo‑code:
// Send transactional message core code
TransactionMQProducer producer = new TransactionMQProducer("group");
producer.setTransactionListener(new TransactionListener() {
// Execute local transaction (e.g., deduct inventory)
public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
return doBiz() ? LocalTransactionState.COMMIT : LocalTransactionState.ROLLBACK;
}
// Broker checks local transaction status
public LocalTransactionState checkLocalTransaction(MessageExt msg) {
return checkDB(msg.getTransactionId()) ? COMMIT : ROLLBACK;
}
});In real scenarios, remember to implement a compromise query (e.g., a transaction log table) inside checkLocalTransaction. A past incident showed that a network glitch prevented the broker from receiving the commit, causing endless retries.
2.2 Persistence Configuration
Key RabbitMQ settings that prevent loss:
Queue durability : durable=true – keeps queue metadata.
Message persistence : deliveryMode=2 – stores messages on disk.
Lazy Queue : x-queue-mode=lazy – writes directly to disk without loading into memory.
Confirm mechanism : publisher-confirm-type – producer receives acknowledgment of successful delivery.
Example code for RabbitMQ local storage + dead‑letter exchange protection:
channel.queueDeclare("order_queue", true, false, false, new HashMap<String, Object>(){{
put("x-dead-letter-exchange", "dlx_exchange"); // dead‑letter exchange
}});2.3 Replica Configuration
Replication settings for different MQs:
Kafka : acks=all and replica count ≥ 3.
RocketMQ : synchronous flush + master‑slave sync strategy.
Pulsar : BookKeeper multi‑replica storage.
When migrating a financial system to Kafka, we enabled the highest‑level configuration ( acks=all, min.insync.replicas=2, unclean.leader.election.enable=false) to guarantee safety, accepting a throughput reduction.
3. Solutions for Duplicate Consumption
3.1 Unique Business ID
Generate a global unique ID (e.g., Snowflake) and use Redis SETNX as a lock to guarantee idempotency:
// Snowflake generates a global unique ID
Snowflake snowflake = new Snowflake(datacenterId, machineId);
String bizId = "ORDER_" + snowflake.nextId();
// Deduplication logic (Redis atomic operation)
String key = "msg:" + bizId;
if (redis.setnx(key, "1")) {
redis.expire(key, 72 * 3600);
processMsg();
}During a promotion, Redis cluster jitter caused duplicate charges, which we later solved with a Bloom filter plus Redis double‑check.
3.2 Idempotent Design
Three patterns for different consistency requirements:
Strong consistency : SELECT FOR UPDATE to lock rows.
Eventual consistency : version number control (optimistic lock) with up to three retries.
Compensating transaction : design reverse operations (refund, inventory rollback) and persist operation logs.
Applying this three‑pronged approach reduced error rates in a user‑points system from 0.1% to 0.001%.
3.3 Dead‑Letter Queue
RabbitMQ manual ACK with retry limit and dead‑letter routing:
// Consumer sets manual ACK
channel.basicConsume(queue, false, deliverCallback, cancelCallback);
public void process(Message msg) {
try {
doBiz();
channel.basicAck(deliveryTag);
} catch (Exception e) {
if (retryCount < 3) {
channel.basicNack(deliveryTag, false, true); // retry
} else {
channel.basicNack(deliveryTag, false, false); // send to DLX
}
}
}This pattern rescued a social‑app push service by collecting all failed messages in a DLX and re‑processing them later.
4. System Architecture Design
4.1 Producer Side
For scenarios with low real‑time requirements, a local transaction table plus a scheduled compensation job works.
4.2 Consumer Side
Limit concurrent consumer threads to avoid message storms.
4.3 Ultimate Scheme
For high‑real‑time requirements, combine transactional messages with a local event table.
5. Ten Hard‑Earned Lessons
Always add a unique business ID to messages (don’t rely on MQ‑generated IDs).
Consumer logic must be idempotent (duplicate consumption is inevitable).
Database transaction and message sending are mutually exclusive (or use transactional messages).
Consumer thread count should not exceed partitions × 2 (Kafka lesson).
Dead‑letter queues need monitoring and alerts (otherwise support will chase you).
Test environments must simulate network jitter (chaos engineering).
Message payloads should carry version numbers (prevent silent incompatibility).
Never use a message queue as the primary business flow (it should be auxiliary).
Persist consumer offsets periodically (avoid message loss during rebalancing).
Business metrics must be linked with MQ monitoring (e.g., order volume vs. message volume).
Conclusion
Message queues are like the SWIFT network of financial systems: simple on the surface but full of hidden traps. Mastering configuration is not enough; you must design architectures that balance reliability and performance, using layered defenses to achieve dynamic consistency.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
