
Investigation and Resolution of Partial Queue Consumption after RocketMQ Topic Expansion

This article details a real‑world RocketMQ case where expanding a topic's queue count caused two consumer groups to miss messages on one broker, explains the root cause of missing subscription metadata after cluster scaling, and outlines the manual steps taken to restore full consumption.

Big Data Technology Architecture

The message team received a report that after expanding a RocketMQ topic, some queues were not being consumed, leading to message backlog and impacting online services.

To avoid exposing production data, the issue was reproduced in a virtual machine environment.

Cluster status: The cluster consists of two brokers (broker-a and broker-b); broker-a's configuration is shown below:

brokerClusterName = DefaultCluster
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
brokerIP1=192.168.0.220
brokerIP2=192.168.0.220
namesrvAddr=192.168.0.221:9876;192.168.0.220:9876
storePathRootDir=/opt/application/rocketmq-all-4.5.2-bin-release/store
storePathCommitLog=/opt/application/rocketmq-all-4.5.2-bin-release/store/commitlog
autoCreateTopicEnable=false
autoCreateSubscriptionGroup=false

Because automatic topic and subscription group creation are disabled, any new resources must be provisioned manually.
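With auto-creation disabled, both topics and subscription groups have to be created through mqadmin before traffic can flow. A minimal sketch of the two provisioning commands (the consumer group name is illustrative, not from the incident):

```shell
# Create the topic on every broker in the cluster
# (required because autoCreateTopicEnable=false).
sh ./mqadmin updateTopic -n 192.168.0.220:9876 -c DefaultCluster \
  -t topic_dw_test_by_order_01 -r 4 -w 4

# Register the subscription group on the cluster as well
# (required because autoCreateSubscriptionGroup=false);
# without this entry a broker rejects the group's pull requests.
sh ./mqadmin updateSubGroup -n 192.168.0.220:9876 -c DefaultCluster \
  -g consumer_group_dw_test
```

Using `-c DefaultCluster` applies the change to every broker in the cluster, which is what keeps metadata consistent across broker-a and broker-b.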

Online queue expansion: Using the internal operations platform, the team executed the RocketMQ updateTopic command to increase the queue count from 4 to 8 on all brokers in the DefaultCluster:

sh ./mqadmin updateTopic -n 192.168.0.220:9876 -c DefaultCluster -t topic_dw_test_by_order_01 -r 8 -w 8

The command succeeded, and the console confirmed that each broker now hosts 8 queues for the topic.
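One way to verify the per-broker queue count from the command line is the topicRoute subcommand, which prints the route data the name server holds for the topic:

```shell
# Print the topic's route info; after the expansion each broker
# should report readQueueNums=8 and writeQueueNums=8.
sh ./mqadmin topicRoute -n 192.168.0.220:9876 -t topic_dw_test_by_order_01
```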

Message sending after expansion: Subsequent traffic showed that all 16 queues (8 per broker) were actively receiving messages, confirming that the online expansion did not require a restart of producers or consumers.

Problem emergence: Two of the five consumer groups subscribed to the topic reported that a subset of queues was not being consumed, causing downstream systems to miss processing.

Analysis: Inspection of the consumer status revealed that only one consumer process, attached to broker-a, was handling its 8 queues, while the corresponding queues on broker-b had no active consumers. Further investigation showed that the problematic consumer groups lacked subscription entries on broker-b because the cluster expansion had not copied the existing topics.json and subscriptionGroup.json metadata files to the new broker.
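Per-queue consumption state of this kind can be inspected with the consumerProgress subcommand; queues with no assigned client and a growing diff are the symptom described above (the group name is illustrative):

```shell
# Show each queue's broker offset, consumer offset, and lag (diff)
# for the group; in this incident the broker-b queues of the affected
# groups showed no consumer client and a steadily growing diff.
sh ./mqadmin consumerProgress -n 192.168.0.220:9876 -g consumer_group_dw_test
```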

Resolution: The operations engineer manually created the missing subscription groups on broker-b via the RocketMQ console. After the subscription entries were added, the previously idle consumer processes began pulling messages, and the backlog cleared.
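The same fix can be applied from the command line by targeting the broker that is missing the entry. A sketch, assuming broker-b listens on the default broker port 10911 (its address and the group name are illustrative):

```shell
# Create the missing subscription group only on broker-b by addressing
# the broker directly with -b instead of the whole cluster with -c.
sh ./mqadmin updateSubGroup -n 192.168.0.220:9876 \
  -b 192.168.0.221:10911 -g consumer_group_dw_test
```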

Root cause and lessons: The expansion added a new broker but failed to synchronize topic and subscription metadata; with autoCreateSubscriptionGroup set to false, the new broker could not serve consumers for its queues. The fix is to ensure that topics.json and subscriptionGroup.json are replicated to every broker during scaling, or to enable automatic creation where appropriate.
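A cheap way to catch this class of metadata drift right after scaling is to diff the metadata files across brokers. A minimal sketch, assuming SSH access to both hosts and the store path from the configuration above (host aliases are illustrative):

```shell
# Compare topic and subscription metadata between the two brokers;
# any diff output means the new broker is missing entries.
STORE=/opt/application/rocketmq-all-4.5.2-bin-release/store/config
for f in topics.json subscriptionGroup.json; do
  if diff <(ssh broker-a "cat $STORE/$f") <(ssh broker-b "cat $STORE/$f"); then
    echo "$f: in sync"
  else
    echo "$f: DIVERGED"
  fi
done
```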

Tags: operations, Message Queue, rocketmq, Cluster Scaling, Consumer Lag, Topic Expansion
Written by

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies
