How Dubbo 3 Implements Cloud‑Native Observability with Metrics, Tracing, and Prometheus

This article explains the fundamentals and current progress of Dubbo 3’s observability features. It focuses on the Metrics module: its data model, how metrics are collected, how they are aggregated locally with sliding windows and TDigest, and how they are exported to Prometheus. Code snippets and architectural details illustrate each step for cloud-native microservice monitoring.

Alibaba Cloud Native

Sep 17, 2022

Background

Observability entered the IT field around 2018 and has gradually replaced traditional monitoring, which reports only overall system availability. As cloud-native technologies evolve and enterprises move from monoliths to distributed microservices, fine-grained analysis and correlation become essential. This calls for an approach that is developer-centric, proactive, and high-resolution.

Dubbo 3 Observability Roadmap

Dubbo 3’s cloud‑migration plan treats observability as a mandatory capability. Scenarios such as inter‑cluster load balancing, Kubernetes auto‑scaling, and instance health modeling all depend on robust observability. The current effort focuses on building the Metrics module.

APM Overview

Application Performance Management (APM) manages and monitors software performance and availability, serving as a key service‑governance tool. An APM system typically consists of three subsystems: Metrics, Tracing, and Logging.

Metrics Structure and Types

A metric consists of four parts: name, labels/tags (dimensions for filtering or aggregation), timestamp, and value. Additionally, each metric has a type that determines its monitoring scenario and visualization.

Gauge: A value that can go up or down (e.g., CPU load, active threads, memory usage). It records instantaneous values.

Counter: A monotonically increasing value, such as the total request count. Derived values like request rate (QPS) are obtained by taking the difference between two readings and dividing by the elapsed time.

Summary: Aggregated statistics such as averages and quantiles, commonly used for response latency.

Histogram: Values sorted into range buckets to produce a bar-style distribution, also useful for latency analysis.
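The four-part structure and the Counter-to-QPS derivation can be sketched in a few lines of Java. This is only an illustration: `MetricSample`, its fields, and `ratePerSecond` are hypothetical names, not classes from the Dubbo code base.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the four-part metric model; not Dubbo's actual classes. */
final class MetricSample {
    enum Type { GAUGE, COUNTER, SUMMARY, HISTOGRAM }

    final String name;                  // metric name
    final Map<String, String> labels;   // dimensions used for filtering or aggregation
    final long timestampMillis;         // when the value was observed
    final double value;                 // the observed value
    final Type type;                    // decides how the value should be interpreted

    MetricSample(String name, Map<String, String> labels,
                 long timestampMillis, double value, Type type) {
        this.name = name;
        this.labels = Collections.unmodifiableMap(new TreeMap<>(labels));
        this.timestampMillis = timestampMillis;
        this.value = value;
        this.type = type;
    }

    /** Per-second rate (e.g., QPS) between two readings of the same counter. */
    static double ratePerSecond(MetricSample earlier, MetricSample later) {
        if (earlier.type != Type.COUNTER || later.type != Type.COUNTER) {
            throw new IllegalArgumentException("rate is only defined for counters");
        }
        long elapsedMillis = later.timestampMillis - earlier.timestampMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        return (later.value - earlier.value) * 1000.0 / elapsedMillis;
    }
}
```

For example, a counter that reads 100 at time 0 and 600 ten seconds later yields a rate of 50 requests per second.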

Metrics Collection

The goal of collection is to snapshot microservice runtime state and supply raw data for further analysis. In Dubbo, collection points are inserted via an SPI‑based Filter on the provider side. The following code snippet shows part of the collection logic (simplified for illustration):

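// Simplified sketch of the provider-side metrics filter (illustrative, not the full Dubbo source)
import org.apache.dubbo.rpc.Filter;
import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.Result;
import org.apache.dubbo.rpc.RpcException;

public class MetricsFilter implements Filter {
    @Override
    public Result invoke(Invoker<?> invoker, Invocation inv) throws RpcException {
        // Build the key from interfaceName, methodName, group, and version,
        // then update the corresponding MetricEntry.
        // Finally, pass the call on to the next invoker in the chain.
        return invoker.invoke(inv);
    }
}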

Collected data is stored in a ConcurrentHashMap, whose fine-grained locking keeps concurrent updates thread-safe. Each entry’s key combines four dimensions (interface, method, group, version), which become a set of labels when the metric is exported.
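A minimal sketch of this storage scheme follows, assuming a composite key class and one `LongAdder` per key. `MethodKey`, `RequestCounters`, and the label names are hypothetical and are not taken from Dubbo's source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical composite key built from the four dimensions. */
final class MethodKey {
    final String interfaceName, methodName, group, version;

    MethodKey(String interfaceName, String methodName, String group, String version) {
        this.interfaceName = interfaceName;
        this.methodName = methodName;
        this.group = group;
        this.version = version;
    }

    /** On export, the key's fields become the metric's labels (label names are illustrative). */
    Map<String, String> toLabels() {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("interface", interfaceName);
        labels.put("method", methodName);
        labels.put("group", group);
        labels.put("version", version);
        return labels;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MethodKey)) return false;
        MethodKey k = (MethodKey) o;
        return interfaceName.equals(k.interfaceName) && methodName.equals(k.methodName)
                && Objects.equals(group, k.group) && Objects.equals(version, k.version);
    }

    @Override public int hashCode() {
        return Objects.hash(interfaceName, methodName, group, version);
    }
}

/** Thread-safe request counters, one per key. */
final class RequestCounters {
    private final ConcurrentHashMap<MethodKey, LongAdder> totals = new ConcurrentHashMap<>();

    void increment(MethodKey key) {
        // computeIfAbsent creates the counter atomically on first use
        totals.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long total(MethodKey key) {
        LongAdder adder = totals.get(key);
        return adder == null ? 0L : adder.sum();
    }
}
```

`LongAdder` is chosen here because it tolerates heavy write contention better than a single atomic counter; whether Dubbo makes the same choice is not stated in this article.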

A list of MetricsListener instances implements a producer-consumer pattern: the default collector notifies each listener whenever a new value is available, which lets additional collectors perform local aggregation.
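The notification flow might look roughly like the sketch below. `SampleListener` is a hypothetical stand-in for MetricsListener with a made-up method signature, and the synchronous callback is a simplification chosen to keep the example short.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for MetricsListener; the real interface may differ. */
interface SampleListener {
    void onSample(String metricKey, long responseTimeMillis);
}

/** Producer side: records a value, then tells every registered listener about it. */
final class DefaultCollector {
    private final List<SampleListener> listeners = new CopyOnWriteArrayList<>();
    private final AtomicLong totalRequests = new AtomicLong();

    void addListener(SampleListener listener) {
        listeners.add(listener);
    }

    void record(String metricKey, long responseTimeMillis) {
        totalRequests.incrementAndGet();            // the collector's own bookkeeping
        for (SampleListener listener : listeners) { // fan the new value out to consumers
            listener.onSample(metricKey, responseTimeMillis);
        }
    }

    long totalRequests() {
        return totalRequests.get();
    }
}

/** Consumer side: an additional collector that aggregates locally (here, max latency). */
final class MaxResponseTimeAggregator implements SampleListener {
    private final AtomicLong max = new AtomicLong();

    @Override
    public void onSample(String metricKey, long responseTimeMillis) {
        max.accumulateAndGet(responseTimeMillis, Math::max);
    }

    long max() {
        return max.get();
    }
}
```

Registering an aggregator with `addListener` is all it takes to plug in a new aggregation without touching the default collector.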

Local Aggregation – Sliding Window and TDigest

Dubbo performs local aggregation with a sliding-window mechanism and uses the TDigest algorithm for quantile calculations. The sliding window consists of a configurable number of buckets (e.g., 6), each covering a fixed time span (e.g., 2 minutes). Every incoming sample is written to all buckets. At each interval the window advances: the oldest bucket is cleared for reuse and the next bucket becomes current. Reads go to the current bucket, which has been accumulating the longest and therefore provides an aggregation over the most recent window.
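The rotation logic is easier to follow in code. The sketch below keeps a running average rather than a quantile digest, and `SlidingWindowAverage`, its constructor parameters, and the injectable clock are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Sliding window in which every sample is written to all buckets; a sketch only. */
final class SlidingWindowAverage {
    private final long[] counts;
    private final double[] sums;
    private final long bucketSpanMillis;
    private final LongSupplier clock;   // injectable so the behavior can be tested
    private int current = 0;            // index of the bucket that serves reads
    private long lastRotateMillis;

    SlidingWindowAverage(int bucketCount, long bucketSpanMillis, LongSupplier clock) {
        this.counts = new long[bucketCount];
        this.sums = new double[bucketCount];
        this.bucketSpanMillis = bucketSpanMillis;
        this.clock = clock;
        this.lastRotateMillis = clock.getAsLong();
    }

    synchronized void add(double value) {
        rotate();
        for (int i = 0; i < counts.length; i++) {   // write the sample to every bucket
            counts[i]++;
            sums[i] += value;
        }
    }

    /** Average over the current bucket, or 0.0 if it holds no samples. */
    synchronized double average() {
        rotate();
        return counts[current] == 0 ? 0.0 : sums[current] / counts[current];
    }

    /** Clears the current (oldest) bucket and advances once per elapsed span. */
    private void rotate() {
        long steps = (clock.getAsLong() - lastRotateMillis) / bucketSpanMillis;
        if (steps <= 0) return;
        for (long i = 0; i < Math.min(steps, counts.length); i++) {
            counts[current] = 0;
            sums[current] = 0.0;
            current = (current + 1) % counts.length;
        }
        lastRotateMillis += steps * bucketSpanMillis;
    }
}
```

Because buckets are cleared at staggered times, a sample stays visible to reads for roughly the bucket count multiplied by the bucket span, then drops out when the last bucket that still holds it is cleared.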

For percentile metrics such as p95 or p99, Dubbo employs TDigest, an approximate quantile algorithm based on sketching. TDigest clusters data points into centroids, each stored as a mean value plus a count. These centroids approximate the probability density function (PDF) of the original distribution, which allows fast percentile lookups with higher precision at the tails of the distribution.
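To make the centroid idea concrete, here is a simplified batch sketch. It sorts all values up front and closes a centroid whenever the arcsine scale function described in the t-digest paper would grow by more than one unit, which keeps centroids small near the tails. It is not the streaming algorithm and not the library Dubbo depends on; `CentroidSketch`, `limitFor`, and the `compression` parameter name are assumptions, and the lookup returns a centroid mean without interpolation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Batch centroid clustering in the style of t-digest; a teaching sketch only. */
final class CentroidSketch {
    /** One cluster: the mean of its points and how many points it absorbed. */
    static final class Centroid {
        double mean;
        long count;
        Centroid(double first) { this.mean = first; this.count = 1; }
        void absorb(double x) { count++; mean += (x - mean) / count; }
    }

    private final List<Centroid> centroids = new ArrayList<>();
    private final long total;

    CentroidSketch(double[] values, double compression) {
        if (values.length == 0) throw new IllegalArgumentException("need at least one value");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        total = sorted.length;
        long closed = 0;                                   // points in finished centroids
        double qLimit = limitFor(0.0, compression);
        Centroid open = new Centroid(sorted[0]);
        for (int i = 1; i < sorted.length; i++) {
            double q = (double) (closed + open.count + 1) / total;
            if (q <= qLimit) {
                open.absorb(sorted[i]);                    // still within the size budget
            } else {
                centroids.add(open);                       // close it and start a new one
                closed += open.count;
                qLimit = limitFor((double) closed / total, compression);
                open = new Centroid(sorted[i]);
            }
        }
        centroids.add(open);
    }

    /** Highest quantile a centroid starting at q0 may reach: k(q) grows by at most 1. */
    private static double limitFor(double q0, double compression) {
        double k = compression / (2 * Math.PI) * Math.asin(2 * q0 - 1);
        double kNext = Math.min(k + 1, compression / 4);   // k tops out at compression / 4
        return (Math.sin(2 * Math.PI * kNext / compression) + 1) / 2;
    }

    /** Approximate quantile: mean of the first centroid whose cumulative count reaches q. */
    double quantile(double q) {
        double target = q * total;
        long seen = 0;
        for (Centroid c : centroids) {
            seen += c.count;
            if (seen >= target) return c.mean;
        }
        return centroids.get(centroids.size() - 1).mean;
    }

    int centroidCount() { return centroids.size(); }
}
```

With a compression of 100, ten thousand values collapse into a much smaller number of centroids, and the centroids near the 99th percentile cover far fewer points than those near the median. That size difference is what gives the tails their extra precision.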

Metric Export – Prometheus

Exporting metrics enables external storage, computation, and visualization. Dubbo currently supports Prometheus, a CNCF open-source monitoring system that combines data collection (pull or push), a time-series database, and a query language (PromQL).

Export is activated only when the <dubbo:metrics> configuration includes a protocol attribute. Users can choose pull mode (Dubbo exposes an HTTP endpoint) or push mode (via Prometheus Pushgateway).

<dubbo:metrics protocol="prometheus" mode="push" address="${prometheus.pushgateway-url}" interval="5" />

The interval attribute defines the push frequency in seconds.
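In either mode, each sample ends up in Prometheus's text exposition format: a `# TYPE` line followed by `name{label="value",...} value`. The sketch below renders one sample that way. `PrometheusText` is a hypothetical helper, the metric and label names in the usage note are illustrative, and Dubbo's own exporter may be structured quite differently.

```java
import java.util.Map;
import java.util.TreeMap;

/** Renders one sample in Prometheus text exposition format; a sketch, not Dubbo's exporter. */
final class PrometheusText {
    private PrometheusText() { }

    static String render(String name, String type, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder();
        sb.append("# TYPE ").append(name).append(' ').append(type).append('\n');
        sb.append(name);
        if (!labels.isEmpty()) {
            sb.append('{');
            boolean first = true;
            // Sort labels so the output is stable from one scrape to the next
            for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
                if (!first) sb.append(',');
                sb.append(e.getKey()).append("=\"").append(escape(e.getValue())).append('"');
                first = false;
            }
            sb.append('}');
        }
        sb.append(' ').append(value).append('\n');
        return sb.toString();
    }

    /** Label values must escape backslash, double quote, and line feed. */
    static String escape(String labelValue) {
        return labelValue.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

Rendering a counter named `demo_requests_total` with the labels `interface` and `method` produces one `# TYPE` line and one sample line, ready to be served from an HTTP endpoint or sent to a Pushgateway.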

Future Release

Dubbo 3.1.2 / 3.1.3 are expected to include the full Metrics functionality, making the observability stack production‑ready.

Service Governance and Commercialization

Observability is a core component of Dubbo 3’s cloud‑native migration and is integrated with the OpenSergo standard for unified service‑governance. The community collaborates with projects such as Bilibili and CloudWeGo to co‑define governance specifications.
