How RocketMQ Becomes Service‑Mesh Ready: Challenges, Solutions, and Future Work
This article explains how RocketMQ’s network filter was merged into the CNCF Envoy project, details the message flow and stateful networking challenges within a service mesh, describes on‑demand CDS handling, and outlines future enhancements such as Pull support and broker‑side proxying.
Beginning at the end of 2019, the Apache RocketMQ network filter went through a four‑month code review and was then merged into the official CNCF Envoy project, making RocketMQ the second middleware after Dubbo to join the Service Mesh ecosystem.
Message Flow in a Service Mesh
Pilot obtains the routing information for a Topic and distributes it to the data plane via xDS. Envoy proxies all SDK network requests to the Broker/Nameserver. The flow includes:
Pilot retrieves Topic routing and pushes it to Envoy.
When sending, Envoy identifies the request code, selects the appropriate CDS, uses subset load‑balancing to choose a writable Broker, and forwards the message.
When consuming, Envoy similarly identifies the request, selects a readable Broker, records metadata, and matches ACK requests to the correct Broker.
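The send and consume paths above can be sketched as a small dispatcher. This is a minimal illustration, not Envoy's actual filter code: the `Endpoint` fields, the request-kind strings, and the `ack_handle` key are hypothetical names chosen for the example, and a plain round-robin stands in for subset load balancing.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Endpoint:
    """One broker endpoint with the metadata used for subset selection (hypothetical fields)."""
    address: str
    writable: bool
    readable: bool


@dataclass
class TopicProxy:
    """Toy stand-in for the sidecar's per-topic routing state."""
    endpoints: list
    _rr: count = field(default_factory=count)
    _pending_acks: dict = field(default_factory=dict)

    def _pick(self, predicate):
        # Subset selection: keep only endpoints whose metadata matches,
        # then rotate through them round-robin.
        subset = [e for e in self.endpoints if predicate(e)]
        if not subset:
            raise LookupError("no endpoint matches the requested subset")
        return subset[next(self._rr) % len(subset)]

    def route(self, kind, ack_handle=None):
        if kind == "send":
            return self._pick(lambda e: e.writable)
        if kind == "pop":
            broker = self._pick(lambda e: e.readable)
            # Remember which broker served this message so the ACK can follow it.
            self._pending_acks[ack_handle] = broker
            return broker
        if kind == "ack":
            # The ACK must reach the broker that delivered the message.
            return self._pending_acks.pop(ack_handle)
        raise ValueError(f"unsupported request kind: {kind}")
```

The point of the sketch is the asymmetry: sends and pops can go to any broker in the matching subset, while an ACK has to be pinned to the broker recorded at pop time.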
Challenges of Mesh‑ifying RocketMQ
1. Stateful Network Model
RocketMQ’s network model is stateful, relying on IP‑bound connections and consumer‑side load balancing. This creates two main problems:
The SDK cannot guarantee that send and consume requests target the same Broker because the IP/BrokerId information is stripped in the Mesh, breaking ordered‑message guarantees for partitioned topics.
Consumer load balancing is based on Queue ownership within a ConsumerGroup, which cannot be expressed by the current data‑plane load‑balancer, preventing fine‑grained Queue‑level routing.
The introduction of the Pop consumption interface, which allows multiple consumers to read the same Queue, enables the use of Envoy’s native load‑balancing strategies.
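To see why queue ownership is hard for a per-request load balancer, it may help to place a simplified client-side rebalance next to Pop-style consumption. The sketch below is illustrative only: `assign_queues` is a hypothetical even-split allocation, not RocketMQ's actual allocation code, and `pop_targets` uses round-robin as one example of a native policy.

```python
from itertools import cycle


def assign_queues(queues, consumers):
    """Client-side rebalance: each queue is owned by exactly one consumer.

    Every consumer must see the same sorted membership list to compute
    the same answer, which is state a per-request proxy does not hold.
    """
    consumers = sorted(consumers)
    owned = {c: [] for c in consumers}
    for i, q in enumerate(sorted(queues)):
        owned[consumers[i % len(consumers)]].append(q)
    return owned


def pop_targets(brokers, n_requests):
    """Pop-style consumption: no ownership, so any request may go to any broker.

    A stateless round-robin (or any other native policy) is enough.
    """
    rotation = cycle(brokers)
    return [next(rotation) for _ in range(n_requests)]
```

With ownership, the result depends on group membership; with Pop, each request can be balanced independently.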
2. Massive Topic Routing Information
Nameservers store gigabytes of Topic routing data. In a Mesh this data is abstracted as CDS resources, forcing the control plane to push the entire set, which strains stability. Early Envoy versions performed full pushes; later delta xDS reduced traffic but still required full CDS in memory.
To mitigate this, an on‑demand CDS mechanism was added. Envoy now actively requests specific CDS resources via delta gRPC, allowing sidecars to fetch only the needed routing data. However, the original design mistakenly bound CDS names directly to Topic names, ignoring RDS and causing mismatches with community expectations.
route_config:
  name: default_route
  routes:
  - match:
      topic:
        exact: mesh
      headers:
      - name: code
        exact_match: 105
    route:
      cluster: foo-v145-acme-tau-beta-lambda
The snippet shows a request for the topic "mesh" being routed to the CDS named "foo-v145-acme-tau-beta-lambda"; only the topic name is known beforehand.
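The lookup described above (topic known up front, cluster name resolved through the route, cluster fetched only when needed) might look like the following sketch. The `fetch_cluster` callback is a hypothetical placeholder for a delta xDS request for a single CDS resource; nothing here is Envoy's real API.

```python
class OnDemandClusters:
    """Resolve topic -> cluster name via routes, then fetch the cluster lazily."""

    def __init__(self, routes, fetch_cluster):
        # routes: list of (topic, request_code, cluster_name) tuples,
        # mirroring the match/route pairs in the route configuration.
        self._routes = routes
        # fetch_cluster: hypothetical callback standing in for a delta xDS
        # subscription to a single CDS resource.
        self._fetch_cluster = fetch_cluster
        self._cache = {}

    def cluster_name(self, topic, request_code):
        for route_topic, route_code, name in self._routes:
            if route_topic == topic and route_code == request_code:
                return name
        return None

    def resolve(self, topic, request_code):
        name = self.cluster_name(topic, request_code)
        if name is None:
            raise LookupError(f"no route for topic {topic!r}")
        if name not in self._cache:
            # First use of this cluster: ask the control plane for just this resource.
            self._cache[name] = self._fetch_cluster(name)
        return self._cache[name]
```

Going through the route first keeps the cluster name decoupled from the topic name, so the sidecar only holds the clusters its own traffic has actually touched.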
What Mesh Brings to RocketMQ
Service Mesh provides transparent service discovery, load balancing, and traffic monitoring, reducing the responsibilities of both callers and providers. The current RocketMQ filter aggregates routing into TopicRouteData for SDK compatibility, but an ideal Mesh‑native SDK would be slimmer: it would drop client‑side rebalancing and service discovery, and could leave even future features such as message compression or schema validation to the mesh.
Future Work
Support Pull requests: Envoy would translate Pull into Pop to keep user experience unchanged.
Support global ordered messages: Ensure ACK handling prevents disorder when a consumer goes offline.
Broker‑side proxy: Extend proxying and scheduling to the Broker side.
Community Journey
The initial PR for the RocketMQ filter was over 8,000 lines, reviewed by community members such as @天千, and required strict CI compliance (≥97% test coverage, Bazel static linking, extensive formatting). The rigorous process uncovered many issues and improved code quality, demonstrating the challenges of contributing high‑availability middleware to a large open‑source project.
Alibaba Cloud Native
