
Implementing Global Pulsar Client Monitoring with a SkyWalking Plugin

To give the business team a global, application‑level view of Pulsar performance, the team built a SkyWalking Java‑Agent plugin that automatically collects producer and consumer metrics from the Pulsar client, exposing latency, backlog and failure counts via Prometheus without modifying the client code.

Sohu Tech Products

Background

The business team ran into a range of Pulsar problems, such as blocked messages and slow producer throughput. The existing monitoring pages provide per‑topic metrics (rate, traffic, consumption status) but lack an application‑level view and omit important indicators such as send/consume latency and failure counts.

What was needed was a global monitoring view that quickly reveals the runtime state of each application. After some development, a stable monitoring dashboard was launched; it shows which topics each application uses and the production and consumption status of those topics.

Core Process

The required metrics have to be registered in the application's metrics subsystem before they can be collected. The official Java client does not export them this way out of the box, whereas the Go client does.

Two implementation approaches were considered:

Fork and patch the Java client, manually instrumenting the needed metrics.

Develop a SkyWalking plugin that uses an agent to collect the data.

The first approach requires maintaining a custom code branch and coordinating upgrades across hundreds of services, which makes it cumbersome.

The second approach allows transparent upgrades, because the plugin is added to a base Docker image, and it makes it easier to standardize the Java client version across services.

Client Principles

Understanding the Java client internals is essential. The org.apache.pulsar.client.api.ProducerStats and org.apache.pulsar.client.api.ConsumerStats interfaces provide most of the producer and consumer metrics. One caveat: a bug related to messageListener caused the consumer queue size to always be reported as zero. It was fixed in a later Pulsar version (see GitHub issue #20076 and PR #20245).

org.apache.pulsar.client.api.ProducerStats
org.apache.pulsar.client.api.ConsumerStats

Developing the SkyWalking Plugin

The plugin is built using the SkyWalking Java‑Agent, which offers convenient SDK wrappers for native agent interfaces. The plugin leverages the Prometheus simpleclient library to generate metrics.

<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient</artifactId>
  <version>0.12.0</version>
  <scope>provided</scope>
</dependency>
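
Because the dependency is declared with provided scope, the library is not bundled with the plugin and may be missing from a given application, so its presence has to be verified at runtime. A minimal sketch of such a check follows. The class name PrometheusSupport and its methods are hypothetical, and which class loader is the right one to probe depends on how the agent loads plugins, so the context class loader used here is an assumption.

```java
// Hypothetical helper: detects whether the Prometheus simpleclient is on the classpath.
final class PrometheusSupport {
    // Core class of the io.prometheus:simpleclient artifact.
    static final String REGISTRY_CLASS = "io.prometheus.client.CollectorRegistry";

    private PrometheusSupport() {
    }

    // Returns true if the named class can be loaded by the given loader.
    static boolean isClassPresent(String className, ClassLoader loader) {
        try {
            // initialize=false: only test for presence, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Assumption: the application's classes are visible through the context class loader.
    static boolean prometheusAvailable() {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return isClassPresent(REGISTRY_CLASS, loader);
    }
}
```

A plugin could call prometheusAvailable() once during initialization and skip all metric registration when it returns false, so that applications without the library start normally.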

Key steps in the plugin workflow:

Maintain a consumerPool, adding each consumer when it is created and removing it when it is destroyed.

Start a scheduled task that periodically pulls metric data from each consumer.

When a multi‑partition topic is consumed, add a unique hashcode label to each consumer so that the consumers can be told apart.

The plugin checks the classpath for the Prometheus dependency at initialization to avoid runtime errors in applications that lack the library.
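
The steps above can be sketched roughly as follows. This is a library-free illustration under stated assumptions, not the plugin's actual code: StatsSource, ConsumerPool, and the in-memory map standing in for a Prometheus gauge are hypothetical names. A real plugin would read the values from the Pulsar consumer's ConsumerStats and would call register and unregister from its SkyWalking interceptors.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a Pulsar consumer: returns its current receiver queue size.
@FunctionalInterface
interface StatsSource {
    int receiverQueueSize();
}

final class ConsumerPool {
    // label -> live consumer
    private final Map<String, StatsSource> consumers = new ConcurrentHashMap<>();
    // label -> last sampled value (stands in for a labeled Prometheus gauge)
    private final Map<String, Integer> latestQueueSize = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "pulsar-stats-poller");
                t.setDaemon(true); // never keep the application alive just for monitoring
                return t;
            });

    // Topic plus identity hash code, so several consumers of one topic stay distinguishable.
    // Identity hash codes are not guaranteed unique, but collisions are rare.
    static String labelFor(String topic, StatsSource consumer) {
        return topic + "@" + Integer.toHexString(System.identityHashCode(consumer));
    }

    // Called when a consumer is created.
    String register(String topic, StatsSource consumer) {
        String label = labelFor(topic, consumer);
        consumers.put(label, consumer);
        return label;
    }

    // Called when a consumer is closed; also drops its last sample.
    void unregister(String topic, StatsSource consumer) {
        String label = labelFor(topic, consumer);
        consumers.remove(label);
        latestQueueSize.remove(label);
    }

    // One polling round: pull the current value from every registered consumer.
    void pollOnce() {
        consumers.forEach((label, consumer) -> {
            try {
                latestQueueSize.put(label, consumer.receiverQueueSize());
            } catch (RuntimeException e) {
                // Skip this consumer for this round; the others are still sampled.
            }
        });
    }

    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::pollOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    // Copy of the latest samples, keyed by consumer label.
    Map<String, Integer> snapshot() {
        return new HashMap<>(latestQueueSize);
    }
}
```

Pulling values on a timer keeps the monitoring work off the message path: the interceptors only add or remove pool entries, and the cost of sampling is bounded by the polling period rather than by message volume.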

Summary

With the new monitoring panel, internal Pulsar client behavior is no longer a black box. Alerts can be configured for consumption backlog, high send latency, etc. Additional features were later added to query the full lifecycle of a message via messageId (producer/consumer info, production time, push time, ack time) and to list topic messages using Pulsar‑SQL.

Tags: Java, monitoring, Plugin, metrics, Prometheus, Pulsar, SkyWalking