Why Did My RocketMQ Consumer Stall? Uncovering the Docker Host‑Mode ClientId Bug
A deep dive into a massive RocketMQ message backlog reveals that identical ClientIds generated under Docker host networking break consumer load balancing. The article walks through the root cause, the relevant client code, and a practical fix: customizing the ClientId.
Preface
MQ users sometimes face message accumulation; the author recently hit this issue and discovered an unexpected cause.
Body
Late at night an alert "MQ message accumulation [TOPIC: XXX]" arrived, showing over 300 million piled‑up messages even though both producer and consumer servers reported normal traffic, disk I/O, and network conditions.
The author first suspected a RocketMQ bug: the open-source version can handle billions of messages without performance loss, so raw capacity alone could not explain the backlog.
Investigation of the consumer side showed that three consumer instances shared the same ClientId, which is unusual.
Typical causes of a backlog include producer speed far exceeding consumer speed (for example, a producer traffic burst), or a consumer slowdown due to I/O blocking or a crash.
Both producer and consumer applications appeared healthy, yet the backlog kept growing.
Problem Analysis
The identical ClientId was traced to Docker containers running in host network mode. In this mode, every container sees the host's network interfaces, including the docker0 bridge with its default IP 172.17.0.1, so the address-selection logic resolves the same client IP for all containers.
ClientId generation occurs in ClientConfig.buildMQClientId():

```java
public String buildMQClientId() {
    StringBuilder sb = new StringBuilder();
    sb.append(this.getClientIP());
    sb.append("@");
    sb.append(this.getInstanceName());
    if (!UtilAll.isBlank(this.unitName)) {
        sb.append("@");
        sb.append(this.unitName);
    }
    return sb.toString();
}
```

The IP is obtained via RemotingUtil.getLocalAddress(), which returns the address of docker0 inside host-mode containers.
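To illustrate why docker0's address wins, here is a simplified approximation of the selection preference (my own sketch, not RocketMQ's exact code): walk candidate addresses in interface enumeration order and return the first non-loopback IPv4 address, falling back to localhost.

```java
import java.util.Arrays;
import java.util.List;

public class LocalAddressSketch {
    // Simplified approximation of RemotingUtil.getLocalAddress():
    // first non-loopback IPv4 candidate wins; localhost is the fallback.
    static String pickAddress(List<String> candidates) {
        for (String ip : candidates) {
            boolean loopback = ip.startsWith("127.");
            boolean ipv4 = ip.chars().filter(c -> c == '.').count() == 3;
            if (!loopback && ipv4) {
                return ip;
            }
        }
        return "127.0.0.1";
    }

    public static void main(String[] args) {
        // In a host-mode container every instance sees the host's interfaces,
        // so docker0's bridge address can be the first eligible candidate,
        // and every container resolves the identical client IP.
        List<String> hostInterfaces = Arrays.asList("127.0.0.1", "172.17.0.1", "10.2.3.4");
        System.out.println(pickAddress(hostInterfaces)); // prints 172.17.0.1
    }
}
```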
Additionally, the instance name defaults to the process PID. Because each container's application starts as the same PID inside its own isolated PID namespace (typically PID 1), the PID part is identical as well, resulting in completely duplicated ClientIds.
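Putting the two parts together, a minimal sketch (mirroring buildMQClientId without the optional unitName; the PID value "1" is illustrative) shows why two host-mode containers become indistinguishable:

```java
public class DuplicateClientIdSketch {
    // Mirrors ClientConfig.buildMQClientId() without the optional unitName part.
    static String buildMQClientId(String clientIP, String instanceName) {
        return clientIP + "@" + instanceName;
    }

    public static void main(String[] args) {
        // Both containers resolve docker0's IP, and both apps run as PID 1
        // inside their own PID namespace, so the inputs are identical.
        String containerA = buildMQClientId("172.17.0.1", "1");
        String containerB = buildMQClientId("172.17.0.1", "1");
        System.out.println(containerA.equals(containerB)); // prints true
    }
}
```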
Source Code Exploration
Key snippets:

```java
private String clientIP = RemotingUtil.getLocalAddress();

public static String getLocalAddress() {
    // iterate network interfaces, prefer non-loopback, non-private IPv4
    // fallback to localhost
}
```

The load-balancing logic in the consumer client uses the list of ClientIds (cidAll) to allocate message queues. When the list contains duplicate IDs, cidAll.indexOf(currentCID) resolves to the same index for every consumer, so each one computes an identical allocation: all consumers are assigned the first consumer's share of queues, and the remaining queues are left with no consumer at all.
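To make the failure concrete, here is a self-contained simulation (class and variable names are mine) that runs the same average-allocation arithmetic for three consumers sharing one ClientId:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateCidDemo {
    // Mirrors the average-allocation arithmetic; queues are modeled as ints.
    static List<Integer> allocate(List<Integer> mqAll, List<String> cidAll, String currentCID) {
        List<Integer> result = new ArrayList<>();
        int index = cidAll.indexOf(currentCID); // duplicates always resolve to the FIRST match
        int mod = mqAll.size() % cidAll.size();
        int averageSize = mqAll.size() <= cidAll.size() ? 1
                : ((mod > 0 && index < mod) ? mqAll.size() / cidAll.size() + 1
                                            : mqAll.size() / cidAll.size());
        int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
        int range = Math.min(averageSize, mqAll.size() - startIndex);
        for (int i = 0; i < range; i++) {
            result.add(mqAll.get((startIndex + i) % mqAll.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> queues = Arrays.asList(0, 1, 2, 3, 4, 5);
        // Three consumers, all carrying the identical ClientId.
        List<String> cids = Arrays.asList("172.17.0.1@1", "172.17.0.1@1", "172.17.0.1@1");
        for (String cid : cids) {
            System.out.println(cid + " -> " + allocate(queues, cids, cid)); // [0, 1] every time
        }
    }
}
```

Because indexOf resolves every duplicate to index 0, all three consumers pull from queues 0 and 1, while queues 2 through 5 are never consumed, which is exactly how the backlog builds.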
```java
int index = cidAll.indexOf(currentCID);
int mod = mqAll.size() % cidAll.size();
int averageSize = mqAll.size() <= cidAll.size() ? 1
        : ((mod > 0 && index < mod) ? mqAll.size() / cidAll.size() + 1
                                    : mqAll.size() / cidAll.size());
int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
int range = Math.min(averageSize, mqAll.size() - startIndex);
for (int i = 0; i < range; i++) {
    result.add(mqAll.get((startIndex + i) % mqAll.size()));
}
```

Solution
Fix the load-balancing error by giving each consumer a unique ClientId. Set the system property rocketmq.client.name (which the client reads as the instance name) to a custom value, e.g., PID plus a timestamp:

```java
@PostConstruct
public void init() {
    System.setProperty("rocketmq.client.name",
            String.valueOf(UtilAll.getPid()) + "@" + System.currentTimeMillis());
}
```

Once each consumer carried a unique ClientId, the backlog was worked off and consumption speed returned to normal.
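A quick standalone sanity check of the same idea (ProcessHandle stands in for UtilAll.getPid() so the sketch needs no RocketMQ dependency; the property must be set before the consumer is constructed, since ClientConfig reads it at construction time):

```java
public class UniqueClientNameDemo {
    public static void main(String[] args) {
        // Build a per-process instance name from PID plus startup timestamp,
        // so even containers that resolve the same client IP differ.
        String name = ProcessHandle.current().pid() + "@" + System.currentTimeMillis();
        // Must run before any RocketMQ producer/consumer is created.
        System.setProperty("rocketmq.client.name", name);
        System.out.println("rocketmq.client.name = " + System.getProperty("rocketmq.client.name"));
    }
}
```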
Summary
RocketMQ generates a consumer ClientId from the client IP and process ID.
In Docker host-mode, all containers resolve the same docker0 IP (172.17.0.1) and the same default PID-based instance name, leading to identical ClientIds.
Consumer load‑balancing is performed on the client side; duplicate ClientIds cause every consumer to receive the same message queue, resulting in severe backlog.
Setting a unique rocketmq.client.name (or otherwise customizing the ClientId) resolves the duplication and restores normal consumption.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.