
RocketMQ’s Cloud‑Native Operator: 30% Faster Filtering and POP Consumption

This article details how Alibaba Cloud transformed RocketMQ with a Kubernetes‑based operator, optimized message filtering by indexing MessageType for up to 30% CPU reduction, and introduced a POP consumption model that eliminates rebalance delays, achieving stable performance during the 2020 Double‑11 peak.

Alibaba Cloud Native

Background

RocketMQ has supported Alibaba Group's Double‑11 shopping festival for seven consecutive years with zero failures, handling a peak transaction rate of 583,000 messages per second in 2020. However, the existing deployment relied on a custom middleware platform that required manual operational steps, made scaling painful, and lacked true cloud‑native automation.

Cloud‑Native Transformation

The team built a Kubernetes‑based operator to manage RocketMQ clusters. By defining a custom CRD that abstracts the broker model, the operator handles pod creation, configuration, scaling, migration, and metadata synchronization, removing the need for manual IaaS‑level operations. This shift also eliminated the traditional master‑slave deployment pattern, allowing all brokers to run as identical stateless pods that can self‑heal.
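The broker abstraction described above can be pictured as a custom resource. The following is a hedged sketch only: the API group, `kind`, and field names are illustrative assumptions, not the actual CRD schema shipped by the operator.

```yaml
# Hypothetical RocketMQ broker custom resource (all names are illustrative).
apiVersion: rocketmq.example.com/v1alpha1
kind: Broker
metadata:
  name: trade-cluster
spec:
  size: 3                       # desired broker pods; the operator reconciles to match
  image: rocketmq-broker:4.8.0  # broker container image
  storage: 500Gi                # per-pod message store
```

With a resource like this, scaling becomes a declarative edit to `spec.size`; the operator reconciles pods, configuration, and metadata toward the desired state instead of requiring manual IaaS operations.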

Performance Optimization of Message Filtering

During large‑scale promotions, the transaction message filtering logic became a major CPU cost because thousands of subscription expressions (mostly MessageType == xxx) were evaluated using Aviator scripts, which ultimately called String.compareTo(). To accelerate this, the team indexed the MessageType field:

Extract MessageType from each Aviator expression by hooking into the recursive‑descent parser.

Store the extracted expressions in a HashMap<MessageType, List<Expression>> so that a single hash lookup filters out the majority of non‑matching rules.

Two cases were handled:

If messageType == '200-trade-paid-done', the expression reduces to the remaining conditions (e.g., buyerId==123456).

If messageType != '200-trade-paid-done', the expression short‑circuits to false.

Complex logical combinations (e.g., multiple OR branches) were also supported by preserving the “not‑equal” path.
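The indexing idea above can be sketched in Java. This is a simplified illustration, not RocketMQ's actual implementation: the class name, the representation of a subscription as a `messageType` key plus residual field/value predicates, and the method names are all assumptions. The point is that one hash lookup prunes every expression whose `messageType` condition would short‑circuit to false, so only the matching bucket's residual conditions (e.g., `buyerId == 123456`) are evaluated.

```java
import java.util.*;

// Hypothetical sketch of MessageType-indexed filtering. Each subscription of the
// form "messageType == 'X' && <rest>" is bucketed under X, so a single hash
// lookup replaces a linear scan over thousands of full expression evaluations.
public class MessageTypeIndex {
    // messageType value -> residual field/value predicates for that type
    private final Map<String, List<Map<String, String>>> index = new HashMap<>();

    // Register "messageType == 'type' && field == value" under its type key.
    public void subscribe(String type, String field, String value) {
        index.computeIfAbsent(type, k -> new ArrayList<>())
             .add(Map.of(field, value));
    }

    // A message matches only if a bucket exists for its type (expressions for
    // all other types short-circuit to false in O(1)) and at least one
    // residual predicate in that bucket holds.
    public boolean matches(Map<String, String> msgProps) {
        List<Map<String, String>> rules = index.get(msgProps.get("messageType"));
        if (rules == null) return false; // non-matching types pruned by the hash lookup
        for (Map<String, String> rule : rules) {
            boolean ok = true;
            for (Map.Entry<String, String> e : rule.entrySet()) {
                if (!e.getValue().equals(msgProps.get(e.getKey()))) {
                    ok = false;
                    break;
                }
            }
            if (ok) return true;
        }
        return false;
    }
}
```

A real engine must also handle expressions with OR branches and negated `messageType` conditions, as the article notes; those keep a fallback evaluation path rather than being indexed.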

New POP Consumption Model

The traditional Pull model suffered from consumer hang‑ups: a stalled client retained its queue assignment, causing message backlog. POP consumption replaces rebalance with a request‑based approach where each client issues POP requests to all brokers, and brokers distribute messages based on an internal algorithm. If a client hangs, other clients continue to consume its pending messages.

POP workflow:

Broker locks the target queue and reads messages from the store.

Writes a CK (checkpoint) message recording the POP position.

Commits the offset and releases the lock.

CK messages enable retries: if a client does not acknowledge within a timeout, the broker re‑processes the CK entry and moves the message to a retry queue.
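The lock/CK/retry workflow above can be sketched as follows. This is an illustrative model only: the class and method names (`PopBroker`, `reviveExpired`), the 30‑second invisible time, and the in‑memory maps are assumptions for clarity, not RocketMQ's actual broker internals, where CK entries are themselves persisted as messages.

```java
import java.util.*;

// Illustrative sketch of POP consumption with checkpoint (CK) based retry.
// Names and data structures are assumptions, not RocketMQ internals.
public class PopBroker {
    static final long INVISIBLE_MS = 30_000;      // ack timeout before retry (assumed)

    public record Popped(long ck, String body) {} // message plus its CK handle

    private final Deque<String> queue = new ArrayDeque<>();      // main queue
    private final Deque<String> retryQueue = new ArrayDeque<>(); // retry queue
    private final Map<Long, String> inflight = new HashMap<>();  // CK -> body
    private final Map<Long, Long> ckPopTime = new HashMap<>();   // CK -> pop time
    private long nextCk = 0;

    public synchronized void put(String body) { queue.addLast(body); }

    // POP: lock the queue (synchronized here), read a message (retry queue
    // first), write a CK entry recording the pop position and time, release.
    public synchronized Optional<Popped> pop(long now) {
        String body = retryQueue.pollFirst();
        if (body == null) body = queue.pollFirst();
        if (body == null) return Optional.empty();
        long ck = nextCk++;
        inflight.put(ck, body);
        ckPopTime.put(ck, now);
        return Optional.of(new Popped(ck, body));
    }

    // An ACK within the timeout commits the message; its CK is discarded.
    public synchronized void ack(long ck) {
        inflight.remove(ck);
        ckPopTime.remove(ck);
    }

    // Periodic revive: any CK not acked within INVISIBLE_MS moves its message
    // to the retry queue, so another (non-hung) client can consume it.
    public synchronized void reviveExpired(long now) {
        Iterator<Map.Entry<Long, Long>> it = ckPopTime.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (now - e.getValue() >= INVISIBLE_MS) {
                retryQueue.addLast(inflight.remove(e.getKey()));
                it.remove();
            }
        }
    }
}
```

Because ownership of a message lasts only until its CK expires, a hung consumer delays at most one invisible-time window instead of blocking its queues until the next rebalance.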

Results

After deploying the operator and the POP model, the Double‑11 promotion showed stable send‑RT metrics. The MessageType indexing reduced CPU usage by up to 32% for complex subscription expressions, significantly lowering the cost of the transaction clusters. POP consumption eliminated rebalance‑induced latency and prevented message pile‑up caused by hung consumers.

Conclusion

The cloud‑native operator and POP consumption together modernized RocketMQ’s architecture, achieving zero‑failure operation, improved performance, and simplified operations on Kubernetes.

Tags: Cloud Native, Performance Optimization, Kubernetes, RocketMQ, Message Filtering, POP Consumption
Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
