Why Does Message Backlog Occur in Kafka/RocketMQ and How to Fix It

The article explains how message backlog arises when producers outpace consumers in systems like Kafka or RocketMQ, outlines primary causes such as unexpected production spikes, broker failures, and consumer bottlenecks, and provides step‑by‑step mitigation strategies including capacity scaling, temporary queues, and optimization techniques for producers, brokers, and consumers.


1. How Message Backlog Happens

Message backlog in Kafka or RocketMQ occurs when messages are produced faster than they are successfully consumed, leaving a large volume of unconsumed messages accumulating on the broker.

It is like a conveyor belt: upstream keeps adding items while downstream cannot process them fast enough, so items pile up.

1.1 Production Volume Exceeds Expectations

Production volume can grow many times beyond expectations due to:

Traffic spikes from promotional events such as 618 and Double 11 (major Chinese shopping festivals), auctions, flash sales, and the like.

Program defects such as infinite loops, runaway batch requests, or memory leaks that cause traffic surges.


1.2 Receiving and Persistence Failures

The broker may fail to receive or persist messages due to service failures, network latency, or persistence errors.

1.3 Insufficient Broker Capacity

Insufficient CPU, memory, or I/O resources on broker machines limit message processing capability; upgrading or scaling the cluster can alleviate this.

1.4 Consumer Capability Decline

Common reasons include:

Consumption failures that trigger massive retries (see the dead-letter sketch after this list).

Consumer program faults such as deadlocks or I/O blocking.

Consumer resource bottlenecks; although modern message queues can handle tens of thousands of messages per second per node, proper capacity planning and scaling out can resolve such issues.
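
To make the retry failure mode concrete, below is a minimal sketch (not the article's original code) of a Kafka consumer that routes messages it cannot process to a dead-letter topic instead of retrying them in place, so one poison message cannot stall a partition. The topic names `orders` and `orders.dlq`, the group id, and the `process` method are hypothetical.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterConsumer {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "order-workers");           // hypothetical group
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> dlqProducer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders"));                // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        process(record.value());
                    } catch (Exception e) {
                        // Park the failing message on a dead-letter topic instead of
                        // retrying in place; the partition keeps moving.
                        dlqProducer.send(new ProducerRecord<>("orders.dlq", record.key(), record.value()));
                    }
                }
                consumer.commitSync();  // offsets advance past handled (or parked) records
            }
        }
    }

    private static void process(String payload) {
        // Business logic goes here; throwing marks the record as failed.
    }
}
```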

2. Solutions to Message Backlog

When backlog occurs, first identify the root cause, then apply temporary scaling to process accumulated messages.

1. Analyze the cause; recover any failed brokers or consumers first.

2. Pause the existing consumers.

3. Create a temporary queue with 10× the original number of partitions (a new topic with an increased partition count).

4. Write a simple forwarding program that evenly redistributes the backlogged messages into the temporary queue (see the sketch after this list).

5. Scale consumers 10×, and scale dependent services (cache, database, file services) accordingly.

6. Once the backlog is drained, restore the original consumption architecture to avoid wasting resources.
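
As a hedged illustration of steps 3 and 4, here is a minimal Kafka sketch that creates the temporary topic with 10× partitions via `AdminClient` and then forwards the backlog into it. The topic names (`orders`, `orders.tmp`), the partition and replication counts, and the group id are assumptions, not values from the article.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BacklogForwarder {

    public static void main(String[] args) throws Exception {
        String bootstrap = "localhost:9092";

        // Step 3: temporary topic with 10x the partitions (original assumed to have 3).
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", bootstrap);
        try (AdminClient admin = AdminClient.create(adminProps)) {
            admin.createTopics(List.of(new NewTopic("orders.tmp", 30, (short) 3))).all().get();
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", bootstrap);
        // Reuse the paused group's id so the forwarder resumes from its committed offsets.
        consumerProps.put("group.id", "order-workers");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", bootstrap);
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Step 4: drain the backlogged topic and fan messages out across the
        // temporary topic; the default partitioner spreads keys over all 30 partitions.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(2));
                if (records.isEmpty()) {
                    break;  // backlog drained
                }
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("orders.tmp", record.key(), record.value()));
                }
                consumer.commitSync();
            }
        }
    }
}
```

RocketMQ exposes comparable admin tooling; the underlying idea is the same either way: spread the backlog across many more partitions so the temporarily 10×-scaled consumer fleet can drain it in parallel.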

3. Three Major Root Causes and Responses

3.1 Production‑Side Optimization

Apply rate limiting (token bucket or leaky bucket) during traffic spikes; a sketch follows below.

Use delayed queues for requests without strict real‑time requirements.
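
As one possible shape of the token-bucket approach, the sketch below throttles a Kafka producer with Guava's `RateLimiter`; the 1,000 messages-per-second cap and the `orders` topic are illustrative choices, not values from the article.

```java
import java.util.Properties;

import com.google.common.util.concurrent.RateLimiter;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ThrottledProducer {

    // Token bucket: at most 1,000 permits (sends) per second.
    private static final RateLimiter LIMITER = RateLimiter.create(1000.0);

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                LIMITER.acquire();  // blocks until a token is available, smoothing spikes
                producer.send(new ProducerRecord<>("orders", Integer.toString(i), "payload-" + i));
            }
        }
    }
}
```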


3.2 MQ‑Side Cluster Optimization

Increase the number of brokers and distribute topics and partitions evenly across them (a partition-scaling sketch follows below).

Upgrade hardware: network bandwidth, SSDs, memory, CPU.
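
For the partition half of this advice, here is a small sketch using Kafka's `AdminClient` to raise a topic's partition count so load can spread across newly added brokers; the topic name and the target count of 12 are assumptions. Note that Kafka can only increase a partition count, and adding partitions does not move existing data; rebalancing replicas onto new brokers is a separate step done with Kafka's partition reassignment tooling.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

public class PartitionScaler {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Raise the hypothetical "orders" topic to 12 partitions;
            // new messages then hash across the larger partition set.
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(12))).all().get();
        }
    }
}
```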


3.3 Consumer‑Side Optimization

Reduce per‑message processing time by avoiding heavy computation and frequent DB access.

Use caching and asynchronous processing to improve consumption efficiency.

Adopt multithreaded consumption where applicable to increase throughput (see the sketch below).
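
A minimal sketch of the multithreaded pattern, assuming Kafka's Java client: a single polling thread hands records to a fixed worker pool. The topic, group id, and pool size are illustrative, and the offset-commit caveat in the comments is the main thing to solve properly in production.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-workers");   // hypothetical group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService pool = Executors.newFixedThreadPool(8);  // size to the bottleneck

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));               // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    pool.submit(() -> process(record.value())); // handled off the poll thread
                }
                // Caveat: this commit acknowledges records the pool may not have finished;
                // production code should track per-partition completion before committing.
                consumer.commitSync();
            }
        }
    }

    private static void process(String payload) {
        // Business logic; keep it light and cache-friendly per the advice above.
    }
}
```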

Conclusion

The article has outlined the main causes of message backlog and the basic remediation steps. While most messaging middleware is robust, improper business usage and inadequate capacity planning are common triggers, so continuously monitor changes in the services around the queue.

Tags: Performance, Message Queue, RocketMQ, Backlog