
How to Build a Scalable Prometheus Monitoring System for Big Data on Kubernetes

This article explains how to design and implement a comprehensive Prometheus‑based monitoring and alerting solution for big‑data components running on Kubernetes, covering metric exposure methods, scrape configurations, exporter deployment, alert rule design, and practical examples with code snippets.


Design Idea

The monitoring system must scrape exposed metrics, analyze them, and generate alerts. Key questions include: what to monitor, how metrics are exposed, how Prometheus scrapes them, and how to configure dynamic alert rules.

Monitoring Objects

All big‑data components run as pods in a Kubernetes cluster.

Metric Exposure Methods

Components support three exposure types:

Directly expose Prometheus metrics (pull).

Push metrics to prometheus‑pushgateway (push).

Use a custom exporter to convert metrics to Prometheus format (pull).

Some components, like Flink on YARN, require the push method because they run inside YARN containers.
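For the push path, here is a minimal sketch using only the Python standard library; the pushgateway URL, job name, and the Flink‑style metric are illustrative assumptions, not the article's actual values:

```python
# Minimal sketch: push one gauge to prometheus-pushgateway over HTTP.
# The gateway URL, job name, and metric/labels below are illustrative.
from urllib import request

def exposition(name: str, value: float, labels: dict) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"# TYPE {name} gauge\n{name}{{{label_str}}} {value}\n"

def push(gateway: str, job: str, body: str) -> int:
    """PUT the metric group for `job`, replacing any previous push."""
    req = request.Request(
        f"{gateway}/metrics/job/{job}",
        data=body.encode(),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with request.urlopen(req) as resp:
        return resp.status

payload = exposition(
    "flink_jobmanager_numRunningJobs", 3,
    {"cluster": "yarn-prod", "flink_job": "etl"},
)
# push("http://pushgateway.monitoring:9091", "flink", payload)
```

A PUT replaces the whole metric group for the job, which is usually what short‑lived YARN containers want; a POST would merge instead.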

Metric Scraping

Prometheus pulls targets using configurations such as native Job, PodMonitor, and ServiceMonitor (via Prometheus Operator).

Reference: https://cloud.tencent.com/document/product/1416/55995

When running on Kubernetes, PodMonitor is usually preferred for its simplicity.
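As a sketch, a PodMonitor that scrapes the exporter pods might look like the following; the names, labels, and namespaces here are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-exporter        # illustrative name
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [bigdata]       # assumed namespace of the exporter pods
  selector:
    matchLabels:
      app: bigdata-exporter     # assumed pod label
  podMetricsEndpoints:
  - port: metrics               # container port *name*, not a number
    path: /metrics
    interval: 30s
```

The Prometheus Operator turns this resource into a scrape config automatically, which is why no prometheus.yml edits are needed.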

Alert Design

Alert Flow

Service experiences an anomaly.

Prometheus fires an alert.

Alertmanager receives the alert.

Alertmanager processes it according to configured rules (grouping, inhibition, notification).

Timing of alert triggering is critical: the rule's for duration plus Alertmanager's group_wait determine how long an anomaly must persist before the first notification goes out.

Dynamic Alert Configuration

Alerting consists of two parts: alertmanager (handling policies) and alertRule (specific rules).

Custom Alert Platform Integration

Beyond Alertmanager's built‑in notification handling, production environments should forward alerts to a custom platform via webhook for advanced processing, deduplication, and multi‑channel notifications.

Alert Hierarchy Labels

Granularity of monitoring objects determines alert grouping; labels attached to alert rules (such as groupId and instanceId in the examples below) guide Alertmanager grouping and routing.

Technical Implementation

Implementation is divided into:

Kubernetes deployment of Prometheus (kube‑prometheus).

Enhanced configuration using kubernetes_sd_config + relabel.

Implementation of a big‑data exporter.

Alert design examples.

1. Deploying Prometheus on Kubernetes

Both kube‑prometheus and prometheus‑operator can create and manage Prometheus; this guide uses kube‑prometheus, which builds on the operator and provides default manifests.

Note: Ensure the Kubernetes version matches the kube‑prometheus version (e.g., k8s 1.14 → kube‑prometheus 0.3, operator 0.32).

Configuration files are written in Jsonnet; dependencies are fetched with jb (jsonnet‑bundler) and manifests are compiled with jsonnet in a Go environment.
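A typical build workflow looks roughly like this; the module paths and release tag depend on your kube‑prometheus version, so treat it as a sketch rather than exact commands:

```shell
# Install the jsonnet compiler and jsonnet-bundler (Go toolchain assumed)
go install github.com/google/go-jsonnet/cmd/jsonnet@latest
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest

# Fetch kube-prometheus jsonnet dependencies (release tag is illustrative)
jb init
jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@release-0.3

# Compile example.jsonnet into JSON manifests under manifests/
# (kube-prometheus's build.sh additionally converts them to YAML)
jsonnet -J vendor -m manifests example.jsonnet
```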

2. kubernetes_sd_config + relabel Solution

Uses native job configuration with additional‑scrape‑config to enable automatic pod discovery and label rewriting.

3. bigdata‑exporter Implementation

The exporter collects metrics from components such as HDFS, YARN, HBase, etc., converts them to Prometheus format, and exposes them.

<code>annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"
</code>

Exporter discovery relies on pod labels and annotations, with a bigData.metrics/role annotation indicating the parsing rule.

<code>labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"
</code>
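The exporter's core conversion step can be sketched as follows, assuming a Hadoop‑style /jmx endpoint; the endpoint URL, metric prefix, and naming scheme are illustrative assumptions:

```python
# Sketch: fetch a Hadoop /jmx endpoint and render its numeric bean
# attributes in Prometheus text format. URL and prefix are illustrative.
import json
from urllib import request

def jmx_to_prom(jmx_json: str, prefix: str = "hdfs_nn") -> str:
    lines = []
    for bean in json.loads(jmx_json).get("beans", []):
        # e.g. "Hadoop:service=NameNode,name=FSNamesystem" -> "FSNamesystem"
        bean_name = bean.get("name", "").split("name=")[-1]
        for attr, value in bean.items():
            # Keep only numeric attributes (bool is an int subclass, skip it)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                metric = f"{prefix}_{bean_name}_{attr}".replace("-", "_")
                lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"

# raw = request.urlopen("https://hdfs-nn:29871/jmx").read().decode()
sample = ('{"beans":[{"name":"Hadoop:service=NameNode,name=FSNamesystem",'
          '"CapacityTotal":1024}]}')
print(jmx_to_prom(sample))
```

A real exporter would select which beans and attributes to keep based on the role annotation (e.g. hdfs-nn vs common) instead of emitting everything.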

4. Alert Design Examples

Example alertmanager configuration with two receivers (default and a custom webhook):

<code>global:
  resolve_timeout: 5m
receivers:
- name: 'default'
- name: 'test.web.hook'
  webhook_configs:
  - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId,instanceId]
  routes:
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: node-disk-usage
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: kafka-topic-highstore
</code>

Disk usage alert rule example:

<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} ${path} usage {{ $value }}%"
        content: "Disk warning: node {{$labels.instance}} ${path} usage {{ $value }}%"
</code>

Kafka lag alert rule example (group and instance granularity):

<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
</code>

Alert Flow Example

Two nodes (node1, node2) with disk‑usage alerts illustrate grouping, repeat intervals, and recovery handling.

<code>node1 exceeds threshold, rule holds for 1m → alert fires and enters a group → group_wait 30s → first notification
node2 fires and joins the same group → next flush after group_interval 5m → second notification
while alerts keep firing → notification repeated every repeat_interval (2h in the config above)
after recovery → resolved notification goes out on the next group flush (if send_resolved is enabled on the receiver)
</code>

Exporter Placement

Exporters can run as sidecars (1:1) or as independent deployments (1:many). Sidecars bind to the component lifecycle; independent exporters reduce coupling and are suitable for multi‑node services like Kafka.
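A sidecar (1:1) placement might be sketched like this; the container names, images, and exporter flags are illustrative assumptions:

```yaml
spec:
  containers:
  - name: hdfs-namenode
    image: hdfs-nn:latest            # assumed component image
    ports:
    - containerPort: 29871           # JMX/HTTP endpoint read by the sidecar
  - name: bigdata-exporter
    image: bigdata-exporter:latest   # assumed exporter image
    args:                            # hypothetical flags for illustration
    - "--target=https://localhost:29871/jmx"
    - "--role=hdfs-nn,common"
    ports:
    - name: metrics
      containerPort: 19091           # exposed for Prometheus to pull
```

Because both containers share the pod's network namespace, the sidecar can reach the component on localhost; an independent (1:many) deployment would instead take a list of target addresses.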

Promtool Usage

<code># Enter Prometheus pod
kubectl -n monitoring exec -it prometheus-k8s-0 -- sh
# Check metric format
curl -s http://ip:9999/metrics | promtool check metrics
</code>

Port‑Forward for External Access

<code># Prometheus
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &
</code>
Tags: Monitoring, cloud native, Big Data, operations, kubernetes, alerting, Prometheus
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
