
How RocketMQ Harnesses Prometheus for Full‑Stack Observability

This article explains how RocketMQ integrates with Prometheus and Grafana to provide comprehensive metrics, tracing, and logging, detailing the exporter architecture, deployment choices, span topology, dashboard examples, and ARMS‑based alerting for cloud‑native message‑queue observability.

Alibaba Cloud Native

RocketMQ, Alibaba's high‑performance messaging platform, is a flagship cloud‑native product with a mature observability solution built on Prometheus. This article outlines the three pillars of observability (Metrics, Tracing, and Logging) and shows how RocketMQ implements each.

Metrics

RocketMQ ships a ready‑to‑use Prometheus exporter and Grafana dashboards that expose message volume, backlog, latency at each processing stage, and other key indicators. The exporter periodically pulls data from the MQ cluster via MQAdminExt, normalises it, and exposes it on an HTTP endpoint for Prometheus to scrape.
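The pull, normalise, and expose cycle described above can be sketched with the Python standard library alone. This is a minimal illustration, not the RocketMQ exporter itself: `fetch_cluster_stats` is a hypothetical stand-in for the data pulled via MQAdminExt, and the metric name, label names, and values are made up for the example. Only the output layout (HELP and TYPE lines followed by labelled samples) follows the Prometheus text exposition format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    # The text exposition format escapes backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render (name, help, labels, value) gauge samples as Prometheus text."""
    families = {}
    for name, help_text, labels, value in samples:
        # Group series by metric name so each family is emitted contiguously.
        families.setdefault(name, (help_text, []))[1].append((labels, value))
    lines = []
    for name, (help_text, series) in families.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for labels, value in series:
            pairs = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            suffix = "{" + pairs + "}" if pairs else ""
            lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


def fetch_cluster_stats():
    # Hypothetical stand-in for the periodic pull from the MQ cluster.
    return [
        ("demo_mq_backlog_messages", "Messages not yet consumed.",
         {"topic": "orders", "group": "billing"}, 42),
    ]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_cluster_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    # Blocks forever; Prometheus would scrape http://127.0.0.1:<port>/metrics.
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint on an arbitrary example port. A production exporter would cache the pulled data between scrapes instead of querying the cluster on every request.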

Tracing

OpenTelemetry tracing is integrated on both the client and server sides. Clients embed an OpenTelemetry exporter that sends spans in batches to a proxy (C‑Broker). The proxy acts as a collector, merging the client‑side spans with its own. Users can configure custom collectors, use the commercial hosted store, or run an open‑source backend. A redesigned span topology models the message lifecycle (Prod, Recv, Await, Proc, Ack/Nack) and conforms to the OpenTelemetry specification.
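To make the lifecycle spans concrete, here is a small sketch that models them as a tree sharing one trace ID. The five span names come from the text. The parent/child ordering (a simple linear chain) is an assumption made for illustration, since the summary does not spell out how the spans are linked, and the `Span` class is a hypothetical stand-in rather than the OpenTelemetry SDK type.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

_span_ids = itertools.count(1)


@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[int] = None
    span_id: int = field(default_factory=lambda: next(_span_ids))
    children: List["Span"] = field(default_factory=list)

    def child(self, name):
        # A child inherits the trace ID and records its parent's span ID.
        span = Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)
        self.children.append(span)
        return span


def build_lifecycle(trace_id, success=True):
    """Build Prod -> Recv -> Await -> Proc -> Ack/Nack (assumed linear chain)."""
    prod = Span("Prod", trace_id)   # producer sends the message
    recv = prod.child("Recv")       # consumer receives it
    wait = recv.child("Await")      # message waits for a processing slot
    proc = wait.child("Proc")       # business logic handles it
    proc.child("Ack" if success else "Nack")
    return prod


def walk(span, depth=0):
    # Depth-first traversal yielding (depth, span) pairs.
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    for depth, span in walk(build_lifecycle("trace-1")):
        print("  " * depth + span.name)
```

Because every span carries the same trace ID, a backend can reassemble one message's full path even when the producer and consumer run in different processes.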

Logging

Standardised client‑side logging simplifies issue localisation by providing consistent log formats across producers, brokers, and consumers.

Exporter Deployment Choices

Two deployment modes are discussed: embedding the Prometheus client directly in the application (low overhead, no extra components) versus running a separate exporter process (decoupled, easier for third‑party code). The recommendation is to embed the client when you control the code, otherwise use the exporter.

High‑Cardinality Mitigation

Because RocketMQ metrics can include many labels (tenant, instance, topic, consumer group, etc.), the article advises limiting label explosion to avoid excessive series count, storage cost, and query performance degradation. Specific optimisations were applied to the native Prometheus client to control memory usage.
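The cost of label explosion follows from simple arithmetic: the worst‑case number of series for one metric is the product of the distinct values of each label. The sketch below computes that product and shows one generic mitigation, capping the distinct values tracked per label and folding the overflow into a catch‑all value. This is a common technique chosen for illustration, not the specific optimisation applied to the native Prometheus client, which the text does not detail. `LabelLimiter` and the `"other"` bucket are hypothetical names.

```python
from math import prod


def worst_case_series(label_cardinalities):
    """Upper bound on series for one metric: product of per-label value counts."""
    return prod(label_cardinalities.values())


class LabelLimiter:
    """Track at most max_values distinct values per label; fold the rest."""

    def __init__(self, max_values, overflow="other"):
        self.max_values = max_values
        self.overflow = overflow
        self._seen = {}

    def clamp(self, label, value):
        seen = self._seen.setdefault(label, set())
        if value in seen:
            return value
        if len(seen) < self.max_values:
            seen.add(value)
            return value
        # New values beyond the cap share one series instead of creating more.
        return self.overflow


if __name__ == "__main__":
    # 10 tenants x 100 topics x 20 consumer groups
    print(worst_case_series({"tenant": 10, "topic": 100, "group": 20}))
```

With the example numbers, a single metric can fan out to 20,000 series, which is why each additional label has to justify itself.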

Multi‑Tenant Monitoring

In production, each tenant’s RocketMQ resources are isolated. Deploying a dedicated exporter per tenant would be impractical, so RocketMQ adopts a shared exporter approach that tags metrics with tenant identifiers, enabling per‑tenant monitoring without proliferating exporter instances.
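A rough sketch of the shared‑exporter idea: one collection loop iterates over all tenants and stamps each sample with a tenant label, so per‑tenant views become a label filter rather than a separate process. The function names, the `tenant` label key, and the shape of the data returned by `fetch_backlog` are assumptions for illustration only.

```python
def collect_for_tenants(tenant_ids, fetch_backlog):
    """Collect samples for every tenant in one pass, tagging each with its tenant.

    fetch_backlog is a caller-supplied (hypothetical) function that returns
    a {topic: backlog} mapping for a single tenant.
    """
    samples = []
    for tenant_id in tenant_ids:
        for topic, backlog in sorted(fetch_backlog(tenant_id).items()):
            samples.append(({"tenant": tenant_id, "topic": topic}, backlog))
    return samples


def for_tenant(samples, tenant_id):
    # Per-tenant monitoring is then just a filter on the tenant label.
    return [sample for sample in samples if sample[0]["tenant"] == tenant_id]


if __name__ == "__main__":
    fake = {"t1": {"orders": 3}, "t2": {"orders": 7, "audit": 0}}
    all_samples = collect_for_tenants(["t1", "t2"], lambda t: fake[t])
    print(for_tenant(all_samples, "t2"))
```

The trade‑off is that the tenant label multiplies series count, so this approach goes hand in hand with the cardinality limits discussed earlier.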

Full‑Link Tracing

The tracing flow consists of:

Client‑side OpenTelemetry exporter sending spans to the proxy.

Proxy acting as a collector for both client and its own spans.

Optional storage back‑ends (custom collector, commercial hosted store, or self‑hosted).

Span topology aligned with the message lifecycle.

Accurate Metrics

Server‑side aggregation of tracing data produces OpenMetrics‑compatible metrics that integrate seamlessly with Prometheus and Grafana.
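One way to picture this aggregation is to fold span durations into a cumulative histogram, which is the shape Prometheus and OpenMetrics use for latency distributions. The sketch below assumes durations in milliseconds and uses made‑up bucket bounds. It shows the general technique only and is not a description of RocketMQ's server‑side implementation.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds.
BUCKETS_MS = (5, 10, 50, 100, 500)


def aggregate_durations(durations_ms, bounds=BUCKETS_MS):
    """Fold span durations into cumulative 'le' buckets plus count and sum."""
    counts = [0] * (len(bounds) + 1)  # final slot is the +Inf bucket
    for duration in durations_ms:
        # bisect_left makes each bound inclusive: 5 ms lands in le="5".
        counts[bisect_left(bounds, duration)] += 1
    cumulative = []
    running = 0
    for count in counts:
        running += count
        cumulative.append(running)
    labels = [str(bound) for bound in bounds] + ["+Inf"]
    return {
        "buckets": dict(zip(labels, cumulative)),
        "count": running,
        "sum": sum(durations_ms),
    }


if __name__ == "__main__":
    print(aggregate_durations([3, 5, 7, 120, 900]))
```

Because the buckets are cumulative, the `+Inf` bucket always equals the total count, and a backend can estimate quantiles from the bucket boundaries without storing individual spans.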

Grafana Dashboards

The provided dashboards cover the overview, topic‑level send rates, consumer group performance, and more. The RocketMQ team refines them continuously, and they offer richer and more precise data than the open‑source equivalents.

ARMS Integration

RocketMQ’s tracing data is stored in Alibaba Cloud Log Service, then transformed into Prometheus‑compatible metrics via an ETL pipeline. ARMS creates a dedicated Prometheus instance per cloud user, delivering isolated storage, multi‑tenant dashboards, and alarm capabilities. The ARMS console integrates Grafana views and alarm rules, allowing one‑click activation of monitoring for any RocketMQ instance.

Message Backlog Diagnosis

The article explains how to interpret backlog metrics such as Ready messages and Queue time, identify root causes (consumer failures or upstream overload), and set appropriate alerts on send health, consumption latency, and related logs or traces.
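The diagnosis logic can be approximated as a small decision function. The sketch below is a heuristic under stated assumptions: the field names, the thresholds, and the order of checks are all hypothetical and would need tuning against real workloads. It only encodes the two root causes named above (consumer failures and upstream overload).

```python
from dataclasses import dataclass


@dataclass
class BacklogSnapshot:
    ready_messages: int           # messages ready but not yet consumed
    queue_time_s: float           # age of the oldest ready message, in seconds
    send_rate: float              # messages per second produced
    consume_rate: float           # messages per second consumed successfully
    consume_failure_ratio: float  # failed consumptions / total attempts


def diagnose(snapshot, max_ready=10_000, max_queue_time_s=60.0,
             max_failure_ratio=0.05):
    """Classify a backlog snapshot; all thresholds are illustrative defaults."""
    backlog_ok = (snapshot.ready_messages <= max_ready
                  and snapshot.queue_time_s <= max_queue_time_s)
    if backlog_ok:
        return "healthy"
    if snapshot.consume_failure_ratio > max_failure_ratio:
        # Consumers are erroring, so messages keep coming back.
        return "consumer_failures"
    if snapshot.send_rate > snapshot.consume_rate:
        # Producers outpace consumers: scale consumers or throttle upstream.
        return "upstream_overload"
    return "investigate"


if __name__ == "__main__":
    print(diagnose(BacklogSnapshot(50_000, 300.0, 900.0, 400.0, 0.01)))
```

In practice each outcome would map to a different alert and a different follow‑up, such as checking consumer logs and traces for the failure case.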

Alerting and Incident Response

ARMS provides end‑to‑end alert configuration, scheduling, and handling workflows, plus intelligent noise reduction and multi‑channel notifications (e.g., DingTalk). Alerts can be linked to trace IDs and logs for rapid root‑cause analysis.

Overall, the integration demonstrates how RocketMQ leverages Prometheus, OpenTelemetry, and Alibaba Cloud ARMS to deliver a comprehensive, cloud‑native observability stack for messaging workloads.
