Introducing MQCloud: A One‑Stop Service Platform for RocketMQ Monitoring, Management, and Operations
This article explains RocketMQ’s core features, the operational challenges of using its console, and how the MQCloud platform was designed to separate business and admin roles, provide comprehensive monitoring, automated deployment, security hardening, and a customized client, ultimately turning operational pain into a scalable, open‑source solution.
RocketMQ is a high‑availability, high‑performance distributed message queue that offers publish/subscribe, ordered, transactional, and delayed messages, and ensures reliability through synchronous disk flushing and synchronous master‑slave replication (dual write).
Operating RocketMQ with its built‑in console quickly becomes cumbersome as business demands grow, leading to issues such as lack of user‑focused views, complex manual cluster management, missing monitoring and alerting, and difficulty handling serialization, tracing, flow control, and other best‑practice requirements.
MQCloud was created to address these pain points. It provides a unified platform that combines client SDKs, monitoring/alerting, and cluster operations, separating business users from administrators so each can focus on relevant data.
Key capabilities include:
Role‑based views: producers see topic configuration and send status; consumers see consumption lag and failures; admins get deployment, monitoring, and approval tools.
Clear operations: actions are presented explicitly per role.
Safety and compliance: all cluster changes go through an approval workflow.
Multi‑dimensional statistics and alerts: per‑minute topic production/consumption flow, broker‑level metrics, producer latency percentiles, and machine resource usage.
Statistics are collected from RocketMQ’s BrokerStatsManager. For example, the following code shows how RocketMQ records topic production numbers:
    ConcurrentMap<String, StatsItem> statsItemTable; // statsKey <-> StatsItem

    public void addValue(final String statsKey, final int incValue, final int incTimes) {
        StatsItem statsItem = this.getAndCreateStatsItem(statsKey);
        statsItem.getValue().addAndGet(incValue);
        statsItem.getTimes().addAndGet(incTimes);
    }

    public StatsItem getAndCreateStatsItem(final String statsKey) {
        StatsItem statsItem = this.statsItemTable.get(statsKey);
        if (null == statsItem) {
            statsItem = new StatsItem(this.statsName, statsKey);
            this.statsItemTable.put(statsKey, statsItem);
        }
        return statsItem;
    }

Each StatsItem maintains value, times, and snapshot lists, sampling data at seconds, minutes, and hours:
    AtomicLong value; // e.g., message count or size
    AtomicLong times; // occurrence count
    LinkedList<CallSnapshot> csListMinute;
    LinkedList<CallSnapshot> csListHour;
    LinkedList<CallSnapshot> csListDay;

    public void samplingInSeconds() {
        synchronized (csListMinute) {
            csListMinute.add(new CallSnapshot(System.currentTimeMillis(), times.get(), value.get()));
            if (csListMinute.size() > 7) {
                csListMinute.removeFirst();
            }
        }
    }

    public void samplingInMinutes() { /* ... */ }
    public void samplingInHour() { /* ... */ }

MQCloud aggregates these metrics every minute, stores them, and uses them for real‑time dashboards and alerting. Producer latency percentiles are calculated using a fixed‑size segmented array, allowing fast, memory‑bounded percentile queries.
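To make the segmented‑array idea concrete, here is a minimal sketch of how such a percentile structure could look. It is not MQCloud's actual implementation: the class name SegmentedLatencyHistogram, the segment boundaries (1 ms slots up to 100 ms, 10 ms slots up to 1 s, 100 ms slots up to 10 s), and the overflow slot are all assumptions chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch of a fixed-size, segmented latency histogram. */
final class SegmentedLatencyHistogram {
    // Assumed layout: {startMs, endMs, stepMs}. Finer slots where latencies are common.
    private static final int[][] SEGMENTS = {
        {0, 100, 1}, {100, 1_000, 10}, {1_000, 10_000, 100},
    };
    private final int[] segmentOffsets = new int[SEGMENTS.length];
    private final AtomicLongArray counts;

    SegmentedLatencyHistogram() {
        int slots = 0;
        for (int i = 0; i < SEGMENTS.length; i++) {
            segmentOffsets[i] = slots;
            slots += (SEGMENTS[i][1] - SEGMENTS[i][0]) / SEGMENTS[i][2];
        }
        counts = new AtomicLongArray(slots + 1); // last slot collects overflow (>= 10 s)
    }

    /** Records one observation; negative inputs are clamped to 0. */
    void record(long latencyMs) {
        counts.incrementAndGet(indexOf(Math.max(0L, latencyMs)));
    }

    private int indexOf(long latencyMs) {
        for (int i = 0; i < SEGMENTS.length; i++) {
            if (latencyMs < SEGMENTS[i][1]) {
                return segmentOffsets[i] + (int) ((latencyMs - SEGMENTS[i][0]) / SEGMENTS[i][2]);
            }
        }
        return counts.length() - 1;
    }

    /** Lower edge (ms) of the latency range covered by a slot. */
    private long lowerEdgeOf(int index) {
        for (int i = SEGMENTS.length - 1; i > 0; i--) {
            if (index >= segmentOffsets[i]) {
                return SEGMENTS[i][0] + (long) (index - segmentOffsets[i]) * SEGMENTS[i][2];
            }
        }
        return SEGMENTS[0][0] + (long) index * SEGMENTS[0][2];
    }

    /** Returns the lower edge of the slot holding the p-th percentile, or -1 if empty. */
    long percentile(double p) {
        long total = 0;
        for (int i = 0; i < counts.length(); i++) {
            total += counts.get(i);
        }
        if (total == 0) {
            return -1;
        }
        long rank = Math.max(1L, (long) Math.ceil(p * total / 100.0));
        long seen = 0;
        for (int i = 0; i < counts.length(); i++) {
            seen += counts.get(i);
            if (seen >= rank) {
                return lowerEdgeOf(i);
            }
        }
        return lowerEdgeOf(counts.length() - 1);
    }
}
```

Memory stays constant (281 counters in this layout) no matter how many messages are sent, and a percentile query is a single pass over the array. The trade‑off is precision: a result can be off by up to one slot width, and anything beyond the last segment is only reported as "at least 10 s".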
Beyond statistics, MQCloud offers a customized client that adds multi‑cluster support, trace isolation, pluggable serialization (protobuf, JSON), flow control via token‑bucket and leaky‑bucket, Hystrix‑based isolation, and automatic monitoring hooks.
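As an illustration of the token‑bucket style of flow control mentioned above, the following is a minimal sketch rather than the MQCloud client's real code. The class name TokenBucket, its constructor parameters, and the injected nanosecond clock are assumptions made so the example is self‑contained and testable.

```java
import java.util.function.LongSupplier;

/** Hypothetical token-bucket limiter; names and parameters are illustrative only. */
final class TokenBucket {
    private final long capacity;           // maximum burst size
    private final double permitsPerSecond; // steady refill rate
    private final LongSupplier nanoClock;  // e.g. System::nanoTime in production
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double permitsPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.permitsPerSecond = permitsPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;            // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller should back off. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) / 1_000_000_000.0 * permitsPerSecond;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A client wrapper could call tryAcquire() before each send or consume and reject, delay, or degrade the call when it returns false. A token bucket permits short bursts up to its capacity, whereas a leaky bucket smooths traffic to a constant outflow, which is one plausible reason for offering both.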
Automation is provided for broker deployment, machine lifecycle management, and resource collection (using nmon via SSH). Security is hardened by enabling ACL‑style admin checks and validating broker sync packets, as illustrated below:
    if ((this.byteBufferRead.position() - this.processPostion) >= 8) {
        int pos = this.byteBufferRead.position() - (this.byteBufferRead.position() % 8);
        long readOffset = this.byteBufferRead.getLong(pos - 8);
        this.processPostion = pos;
        HAConnection.this.slaveAckOffset = readOffset;
        if (HAConnection.this.slaveRequestOffset < 0) {
            HAConnection.this.slaveRequestOffset = readOffset;
            log.info("slave[" + HAConnection.this.clientAddr + "] request offset " + readOffset);
        }
        HAConnection.this.haService.notifyTransferSome(HAConnection.this.slaveAckOffset);
    }

MQCloud now runs on over 50 servers, manages more than 5 clusters, 700 topics, and processes over 400 million messages (≈400 GB) daily.
After maturing the product, the team open‑sourced the code, abstracting internal modules and releasing regular versions with new features, bug fixes, and documentation, continuing to evolve the platform for the community.
Sohu Tech Products