Operations 12 min read

How to Ensure Data Consistency in Message Queues: 10 Hard‑Earned Lessons

This article explores why message queues can lose consistency, presents concrete solutions such as transactional two‑phase commits, persistence settings, replica configurations, unique IDs, idempotent designs, and dead‑letter queues, and shares ten practical lessons drawn from real‑world incidents.

Su San Talks Tech
Su San Talks Tech
Su San Talks Tech
How to Ensure Data Consistency in Message Queues: 10 Hard‑Earned Lessons

Preface

Last month an e‑commerce system encountered a spooky bug: the user paid successfully but the order status never changed to "shipped". The root cause was that the order service's MQ message vanished, highlighting data‑consistency design in message queues as one of the three biggest nightmares of distributed systems.

1. Causes of Data Inconsistency

Four fatal reasons observed in Kafka, RabbitMQ and RocketMQ:

Producer tragedy : message reaches the broker but the power fails before it is flushed to disk.

Consumer tragedy : message is consumed successfully but the business logic fails.

Roulette : network jitter causes duplicate deliveries.

Data islands : database and message state become detached (e.g., order placed but coupon not issued).

These situations all lead to inconsistent data in MQ.

2. Solutions to Prevent Message Loss

2.1 Transactional Two‑Phase Commit

Using RocketMQ transactional messages, the workflow resembles a pre‑sale deposit pseudo‑code:

// Send transactional message core code
TransactionMQProducer producer = new TransactionMQProducer("group");
producer.setTransactionListener(new TransactionListener() {
    // Execute local transaction (e.g., deduct inventory)
    public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
        return doBiz() ? LocalTransactionState.COMMIT : LocalTransactionState.ROLLBACK;
    }
    // Broker checks local transaction status
    public LocalTransactionState checkLocalTransaction(MessageExt msg) {
        return checkDB(msg.getTransactionId()) ? COMMIT : ROLLBACK;
    }
});

In real scenarios, remember to implement a compromise query (e.g., a transaction log table) inside checkLocalTransaction. A past incident showed that a network glitch prevented the broker from receiving the commit, causing endless retries.

2.2 Persistence Configuration

Key RabbitMQ settings that prevent loss:

Queue durability : durable=true – keeps queue metadata.

Message persistence : deliveryMode=2 – stores messages on disk.

Lazy Queue : x-queue-mode=lazy – writes directly to disk without loading into memory.

Confirm mechanism : publisher-confirm-type – producer receives acknowledgment of successful delivery.

Example code for RabbitMQ local storage + dead‑letter exchange protection:

channel.queueDeclare("order_queue", true, false, false, new HashMap<String, Object>(){{
    put("x-dead-letter-exchange", "dlx_exchange"); // dead‑letter exchange
}});

2.3 Replica Configuration

Replication settings for different MQs:

Kafka : acks=all and replica count ≥ 3.

RocketMQ : synchronous flush + master‑slave sync strategy.

Pulsar : BookKeeper multi‑replica storage.

When migrating a financial system to Kafka, we enabled the highest‑level configuration ( acks=all, min.insync.replicas=2, unclean.leader.election.enable=false) to guarantee safety, accepting a throughput reduction.

3. Solutions for Duplicate Consumption

3.1 Unique Business ID

Generate a global unique ID (e.g., Snowflake) and use Redis SETNX as a lock to guarantee idempotency:

// Snowflake generates a global unique ID
Snowflake snowflake = new Snowflake(datacenterId, machineId);
String bizId = "ORDER_" + snowflake.nextId();

// Deduplication logic (Redis atomic operation)
String key = "msg:" + bizId;
if (redis.setnx(key, "1")) {
    redis.expire(key, 72 * 3600);
    processMsg();
}

During a promotion, Redis cluster jitter caused duplicate charges, which we later solved with a Bloom filter plus Redis double‑check.

3.2 Idempotent Design

Three patterns for different consistency requirements:

Strong consistency : SELECT FOR UPDATE to lock rows.

Eventual consistency : version number control (optimistic lock) with up to three retries.

Compensating transaction : design reverse operations (refund, inventory rollback) and persist operation logs.

Applying this three‑pronged approach reduced error rates in a user‑points system from 0.1% to 0.001%.

3.3 Dead‑Letter Queue

RabbitMQ manual ACK with retry limit and dead‑letter routing:

// Consumer sets manual ACK
channel.basicConsume(queue, false, deliverCallback, cancelCallback);

public void process(Message msg) {
    try {
        doBiz();
        channel.basicAck(deliveryTag);
    } catch (Exception e) {
        if (retryCount < 3) {
            channel.basicNack(deliveryTag, false, true); // retry
        } else {
            channel.basicNack(deliveryTag, false, false); // send to DLX
        }
    }
}

This pattern rescued a social‑app push service by collecting all failed messages in a DLX and re‑processing them later.

4. System Architecture Design

4.1 Producer Side

For scenarios with low real‑time requirements, a local transaction table plus a scheduled compensation job works.

Producer architecture diagram
Producer architecture diagram

4.2 Consumer Side

Limit concurrent consumer threads to avoid message storms.

Consumer architecture diagram
Consumer architecture diagram

4.3 Ultimate Scheme

For high‑real‑time requirements, combine transactional messages with a local event table.

Ultimate architecture diagram
Ultimate architecture diagram

5. Ten Hard‑Earned Lessons

Always add a unique business ID to messages (don’t rely on MQ‑generated IDs).

Consumer logic must be idempotent (duplicate consumption is inevitable).

Database transaction and message sending are mutually exclusive (or use transactional messages).

Consumer thread count should not exceed partitions × 2 (Kafka lesson).

Dead‑letter queues need monitoring and alerts (otherwise support will chase you).

Test environments must simulate network jitter (chaos engineering).

Message payloads should carry version numbers (prevent silent incompatibility).

Never use a message queue as the primary business flow (it should be auxiliary).

Persist consumer offsets periodically (avoid message loss during rebalancing).

Business metrics must be linked with MQ monitoring (e.g., order volume vs. message volume).

Conclusion

Message queues are like the SWIFT network of financial systems: simple on the surface but full of hidden traps. Mastering configuration is not enough; you must design architectures that balance reliability and performance, using layered defenses to achieve dynamic consistency.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

KafkaData ConsistencyRabbitMQTransactional Messaging
Su San Talks Tech
Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.