
How to Build a Scalable Prometheus Monitoring System for Big Data on Kubernetes

This article explains how to design and implement a Prometheus‑based monitoring solution for big‑data components running on Kubernetes, covering metric exposure methods, scrape configurations, alerting architecture, exporter development, and practical code examples for a production‑ready setup.

dbaplus Community

Design Overview

The monitoring system must reliably collect, analyze, and alert on metric data exposed by big‑data components running in a Kubernetes cluster. Key questions include identifying the monitoring targets, how those targets expose metrics, how Prometheus scrapes them, and how alert rules are dynamically configured.

What is being monitored?

How are metrics exposed?

How does Prometheus scrape the metrics?

How are alert rules managed dynamically?

Monitoring Objects

All big‑data services are deployed as Pods in the Kubernetes cluster, making each Pod a monitoring target.

Metric Exposure Methods

Directly expose Prometheus‑compatible metrics (pull).

Push metrics to a prometheus‑pushgateway (push).

Use a custom exporter to translate other formats into Prometheus exposition format (pull).

Components such as Flink may support multiple methods; most components provide an official or third‑party exporter, and the direct method is usually sufficient.
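As an illustration of the direct method, Flink can expose its own metrics over HTTP through its Prometheus reporter. The fragment below is a minimal sketch, assuming a Flink release that ships the flink-metrics-prometheus plugin and accepts the factory-style reporter setting; check the documentation for the Flink version in use, since the reporter keys have changed between releases.

```yaml
# flink-conf.yaml (sketch): let Prometheus pull metrics straight from Flink.
# Assumption: the Flink version in use supports the factory-style reporter key.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
# Port the reporter listens on; 9249 is a commonly used value, adjust as needed.
metrics.reporter.prom.port: 9249
```

With a setting like this, each JobManager and TaskManager pod serves its metrics on the configured port, and no separate exporter is needed for Flink.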

Scrape Configuration

Prometheus ultimately pulls metrics from the chosen targets. In a Kubernetes environment the common scrape configurations are:

Native job configuration.

PodMonitor (via Prometheus Operator) for pod‑level metrics.

ServiceMonitor (via Prometheus Operator) for service‑level metrics.

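When the platform already runs on Kubernetes, PodMonitor is recommended for its simplicity.

With a native job configuration built on kubernetes_sd_config and relabel rules (covered under Technical Implementation), pods opt in to scraping through annotations such as the following. These annotations are a convention that only takes effect when the relabel rules read them: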

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"
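Where the Prometheus Operator is in use, the same endpoint could instead be declared with a PodMonitor. The manifest below is a sketch only: the resource name, the bigdata namespace, the app: bigdata-component label, and the container port name metrics are all assumptions made for illustration, not values from the original setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-component          # hypothetical name
  namespace: monitoring
spec:
  # Namespaces in which to look for matching pods (hypothetical namespace).
  namespaceSelector:
    matchNames:
      - bigdata
  # Pods are selected by label (hypothetical label).
  selector:
    matchLabels:
      app: bigdata-component
  podMetricsEndpoints:
    - port: metrics                # must match a named containerPort on the pod
      path: /metrics
      scheme: http
      interval: 30s
```

The Prometheus custom resource only picks up a PodMonitor that matches its podMonitorSelector (and namespace selector), so that selector needs to be checked when a new PodMonitor appears to have no effect.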

Alert Design

The alert flow consists of four steps: service anomaly → Prometheus generates an alert → Alertmanager receives the alert → Alertmanager applies routing, grouping, silencing, and notification actions (e.g., SMS, email, webhook).

Dynamic configuration of Alertmanager policies and alert rule definitions.

Integration with a custom alert platform via webhook for business‑specific handling.

Use of hierarchical label groups (e.g., groupId, instanceId) to drive multi‑level routing.
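The routing and webhook points above can be sketched as an Alertmanager configuration. This is an assumption-laden illustration rather than the original setup: the receiver name and the webhook URL are hypothetical, and the groupId value is borrowed from the Kafka rule shown later in this article.

```yaml
# alertmanager.yaml (sketch)
route:
  receiver: alert-platform              # default receiver for all alerts
  group_by: ['groupId']                 # group-level alerts share one notification group
  routes:
    # Instance-level rules also carry instanceId, so they are grouped per instance.
    # Newer Alertmanager releases prefer `matchers` over the older `match` field.
    - match:
        groupId: kafka-topic-highstore
      group_by: ['groupId', 'instanceId']
      receiver: alert-platform
receivers:
  - name: alert-platform
    webhook_configs:
      # Hypothetical endpoint of the custom alert platform.
      - url: http://alert-platform.example.internal/api/v1/alerts
        send_resolved: true
```

Sending everything to one webhook keeps Alertmanager responsible only for grouping and deduplication, while the custom platform can use labels such as userIds and receivers to decide who is notified and through which channel.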

Technical Implementation

Deploy Prometheus in Kubernetes using kube‑prometheus (which builds on prometheus‑operator).

Enhance the default configuration with kubernetes_sd_config plus relabel rules for automatic pod discovery.

Develop a bigdata‑exporter that gathers JMX metrics from HDFS, YARN, HBase, etc., and exposes them in Prometheus format.

Provide concrete alert rule examples for node‑disk usage and Kafka lag.

Kube‑Prometheus vs Prometheus‑Operator

Both projects can create and manage Prometheus instances. kube‑prometheus bundles a large set of default manifests and relies on prometheus‑operator. Choose the version that matches your Kubernetes version (e.g., K8s 1.14 → kube‑prometheus 0.3 with prometheus‑operator 0.32).

Kubernetes SD Config + Relabel

The kubernetes_sd_config enables Prometheus to discover pods via the Kubernetes API. Relabel rules rewrite discovered labels, allowing fine‑grained filtering and dynamic scrape target generation.
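A minimal sketch of such a scrape job is shown below. It follows the widely used annotation-based pattern and reads the prometheus.io/* annotations shown earlier; the job name bigdata-pods is an assumption.

```yaml
scrape_configs:
  - job_name: bigdata-pods              # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                       # discover every pod through the Kubernetes API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/scheme (http or https) if present.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        regex: '(https?)'
        target_label: __scheme__
      # Use prometheus.io/path as the metrics path if present.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Rewrite the target address to use the port from prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Copy pod labels onto the scraped series, plus namespace and pod name.
      - action: labelmap
        regex: '__meta_kubernetes_pod_label_(.+)'
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

With the Prometheus Operator, a job like this is usually supplied through a Secret referenced by the additionalScrapeConfigs field of the Prometheus custom resource rather than by editing the generated configuration directly.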

BigData Exporter

The exporter runs either as a sidecar in the same pod as the component (1:1) or as an independent deployment (1:1 or 1:many). It uses pod labels and annotations to discover targets and a role field to select the appropriate parsing logic.

labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"

Alert Rule Examples

Node‑disk‑usage rule (group‑level):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"
        content: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"

Kafka‑lag rule (instance‑level):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{$value}}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{$value}}"

Alert Flow Example

Two nodes (node1, node2) are monitored for disk usage. The timeline demonstrates how for, group_wait, group_interval, and repeat_interval affect alert firing and grouping.

node1 waits for 1m → enters group
group_wait 30s → first alert (node1)
node2 waits for 1m → joins group
group_interval 5m → second alert (node1,node2)
repeat_interval 10m → subsequent alerts while group unchanged

Key Timing Parameters

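for: how long a rule's condition must hold before the alert fires (set on the Prometheus rule).

group_wait: how long Alertmanager waits after a new group is created before sending its first notification.

group_interval: how long Alertmanager waits before notifying about a group whose content has changed, for example when a new alert joins it.

repeat_interval: how long Alertmanager waits before re-sending a notification for a group whose content is unchanged.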
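The timeline in the example corresponds to settings along the following lines. This is a fragment rather than a complete file: the receiver name is hypothetical, and grouping by groupId is assumed so that node1 and node2 fall into the same group.

```yaml
# Prometheus side: each rule in the examples above sets
#   for: 1m
# so a node's alert only fires after the condition has held for one minute.

# Alertmanager side (route section of alertmanager.yaml):
route:
  receiver: alert-platform     # hypothetical receiver name
  group_by: ['groupId']        # node1 and node2 share groupId, hence one group
  group_wait: 30s              # delay before the first notification of a new group
  group_interval: 5m           # delay before notifying about changes to the group
  repeat_interval: 10m         # delay before repeating an unchanged notification
```

Short values like these make the behaviour easy to observe during testing; production setups often use a much longer repeat_interval to avoid notification fatigue.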

Exporter Placement

Sidecar deployment ties the exporter’s lifecycle to the monitored component (ideal for single‑instance services). Independent deployment decouples the exporter, allowing one exporter to monitor many instances (useful for multi‑node services like Kafka).
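A sidecar layout might look like the sketch below. The pod name and both images are hypothetical placeholders; the ports reuse the values from the annotation examples earlier in this article (29871 for the component's /jmx endpoint, 19091 for the Prometheus endpoint), which is an assumption about how they fit together.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-namenode-0                             # hypothetical pod name
  annotations:
    # Picked up by the annotation-based relabel rules so Prometheus scrapes the sidecar.
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "http"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "19091"
spec:
  containers:
    - name: namenode
      image: example.org/hdfs-namenode:latest       # hypothetical image
      ports:
        - name: jmx-http
          containerPort: 29871                      # component serves /jmx here
    - name: bigdata-exporter
      image: example.org/bigdata-exporter:latest    # hypothetical image
      ports:
        - name: metrics
          containerPort: 19091                      # exporter serves /metrics here
```

Because containers in one pod share a network namespace, the exporter can read the component's JMX endpoint over localhost, and both containers are scheduled, restarted, and removed together.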

Promtool Usage

# Enter the Prometheus pod
kubectl -n monitoring exec -it prometheus-k8s-0 -- sh
# Show help
promtool -h
# Validate metric format (replace ip with the address of the target)
curl -s http://ip:9999/metrics | promtool check metrics

Port‑Forward for External Access

# Prometheus UI
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana UI
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager UI
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &

Reference: https://cloud.tencent.com/document/product/1416/55995

GitHub repository for the enhanced kube‑prometheus configuration: https://github.com/linshenkx/kube-prometheus-enhance
