
How to Build a Scalable Prometheus Monitoring System for Big Data on Kubernetes

This article explains how to design and implement a comprehensive Prometheus‑based monitoring and alerting solution for big‑data components running on Kubernetes, covering metric exposure methods, scrape configurations, exporter deployment, alert rule design, and practical examples with code snippets.


Design Idea

The monitoring system must scrape exposed metrics, analyze them, and generate alerts. Key questions include: what to monitor, how metrics are exposed, how Prometheus scrapes them, and how to configure dynamic alert rules.

Monitoring Objects

All big‑data components run as pods in a Kubernetes cluster.

Metric Exposure Methods

Components support three exposure types:

Directly expose Prometheus metrics (pull).

Push metrics to prometheus‑pushgateway (push).

Use a custom exporter to convert metrics to Prometheus format (pull).

Some components, like Flink on YARN, require the push method because they run inside YARN containers.
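For the push path, here is a minimal sketch using only the Python standard library; the pushgateway URL, job name, and the Flink‑style metric are illustrative assumptions, not the article's actual values:

```python
# Minimal sketch: push one gauge to prometheus-pushgateway over HTTP.
# The gateway URL, job name, and metric/labels below are illustrative.
from urllib import request

def exposition(name: str, value: float, labels: dict) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"# TYPE {name} gauge\n{name}{{{label_str}}} {value}\n"

def push(gateway: str, job: str, body: str) -> int:
    """PUT the metric group for `job`, replacing any previous push."""
    req = request.Request(
        f"{gateway}/metrics/job/{job}",
        data=body.encode(),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with request.urlopen(req) as resp:
        return resp.status

payload = exposition(
    "flink_jobmanager_numRunningJobs", 3,
    {"cluster": "yarn-prod", "flink_job": "etl"},
)
# push("http://pushgateway.monitoring:9091", "flink", payload)
```

A PUT replaces the whole metric group for the job, which is usually what short‑lived YARN containers want; a POST would merge instead.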

Metric Scraping

Prometheus pulls targets using configurations such as native Job, PodMonitor, and ServiceMonitor (via Prometheus Operator).

Reference: https://cloud.tencent.com/document/product/1416/55995

When running on Kubernetes, PodMonitor is usually preferred for its simplicity.
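As a sketch, a PodMonitor that scrapes the exporter pods might look like the following; the names, labels, and namespaces here are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-exporter        # illustrative name
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [bigdata]       # assumed namespace of the exporter pods
  selector:
    matchLabels:
      app: bigdata-exporter     # assumed pod label
  podMetricsEndpoints:
  - port: metrics               # container port *name*, not a number
    path: /metrics
    interval: 30s
```

The Prometheus Operator turns this resource into a scrape config automatically, which is why no prometheus.yml edits are needed.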

Alert Design

Alert Flow

Service experiences an anomaly.

Prometheus fires an alert.

Alertmanager receives the alert.

Alertmanager processes it according to configured rules (grouping, inhibition, notification).

Timing of alert triggering is critical: the rule's for duration plus Alertmanager's group_wait determine how long an anomaly must persist before the first notification goes out.

Dynamic Alert Configuration

Alerting consists of two parts: alertmanager (handling policies) and alertRule (specific rules).

Custom Alert Platform Integration

Beyond Alertmanager's built‑in notification handling, production environments should forward alerts to a custom platform via webhook for advanced processing, deduplication, and multi‑channel notifications.

Alert Hierarchy Labels

Granularity of monitoring objects determines alert grouping; labels attached to alert rules (such as groupId and instanceId in the examples below) guide Alertmanager grouping and routing.

Technical Implementation

Implementation is divided into:

Kubernetes deployment of Prometheus (kube‑prometheus).

Enhanced configuration using kubernetes_sd_config + relabel.

Implementation of a big‑data exporter.

Alert design examples.

1. Deploying Prometheus on Kubernetes

Both kube‑prometheus and prometheus‑operator can create and manage Prometheus; this guide uses kube‑prometheus, which builds on the operator and provides default manifests.

Note: Ensure the Kubernetes version matches the kube‑prometheus version (e.g., k8s 1.14 → kube‑prometheus 0.3, operator 0.32).

Configuration files are written in Jsonnet; dependencies are fetched with jb (jsonnet‑bundler) and manifests are compiled with jsonnet in a Go environment.
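A typical build workflow looks roughly like this; the module paths and release tag depend on your kube‑prometheus version, so treat it as a sketch rather than exact commands:

```shell
# Install the jsonnet compiler and jsonnet-bundler (Go toolchain assumed)
go install github.com/google/go-jsonnet/cmd/jsonnet@latest
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest

# Fetch kube-prometheus jsonnet dependencies (release tag is illustrative)
jb init
jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@release-0.3

# Compile example.jsonnet into JSON manifests under manifests/
# (kube-prometheus's build.sh additionally converts them to YAML)
jsonnet -J vendor -m manifests example.jsonnet
```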

2. kubernetes_sd_config + relabel Solution

Uses native job configuration with additional‑scrape‑config to enable automatic pod discovery and label rewriting.

3. bigdata‑exporter Implementation

The exporter collects metrics from components such as HDFS, YARN, HBase, etc., converts them to Prometheus format, and exposes them.

<code>annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"
</code>

Exporter discovery relies on pod labels and annotations, with a bigData.metrics/role annotation indicating the parsing rule.

<code>labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"
</code>
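The exporter's core conversion step can be sketched as follows, assuming a Hadoop‑style /jmx endpoint; the endpoint URL, metric prefix, and naming scheme are illustrative assumptions:

```python
# Sketch: fetch a Hadoop /jmx endpoint and render its numeric bean
# attributes in Prometheus text format. URL and prefix are illustrative.
import json
from urllib import request

def jmx_to_prom(jmx_json: str, prefix: str = "hdfs_nn") -> str:
    lines = []
    for bean in json.loads(jmx_json).get("beans", []):
        # e.g. "Hadoop:service=NameNode,name=FSNamesystem" -> "FSNamesystem"
        bean_name = bean.get("name", "").split("name=")[-1]
        for attr, value in bean.items():
            # Keep only numeric attributes (bool is an int subclass, skip it)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                metric = f"{prefix}_{bean_name}_{attr}".replace("-", "_")
                lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"

# raw = request.urlopen("https://hdfs-nn:29871/jmx").read().decode()
sample = ('{"beans":[{"name":"Hadoop:service=NameNode,name=FSNamesystem",'
          '"CapacityTotal":1024}]}')
print(jmx_to_prom(sample))
```

A real exporter would select which beans and attributes to keep based on the role annotation (e.g. hdfs-nn vs common) instead of emitting everything.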

4. Alert Design Examples

Example alertmanager configuration with two receivers (default and a custom webhook):

<code>global:
  resolve_timeout: 5m
receivers:
- name: 'default'
- name: 'test.web.hook'
  webhook_configs:
  - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId,instanceId]
  routes:
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: node-disk-usage
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: kafka-topic-highstore
</code>

Disk usage alert rule example:

<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} ${path} usage {{ $value }}%"
        content: "Disk warning: node {{$labels.instance}} ${path} usage {{ $value }}%"
</code>

Kafka lag alert rule example (group and instance granularity):

<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
</code>

Alert Flow Example

Two nodes (node1, node2) with disk‑usage alerts illustrate grouping, repeat intervals, and recovery handling.

<code>node1 exceeds threshold, rule holds for 1m → alert fires and enters a group → group_wait 30s → first notification
node2 fires and joins the same group → next flush after group_interval 5m → second notification
while alerts keep firing → notification repeated every repeat_interval (2h in the config above)
after recovery → resolved notification goes out on the next group flush (if send_resolved is enabled on the receiver)
</code>

Exporter Placement

Exporters can run as sidecars (1:1) or as independent deployments (1:many). Sidecars bind to the component lifecycle; independent exporters reduce coupling and are suitable for multi‑node services like Kafka.
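A sidecar (1:1) placement might be sketched like this; the container names, images, and exporter flags are illustrative assumptions:

```yaml
spec:
  containers:
  - name: hdfs-namenode
    image: hdfs-nn:latest            # assumed component image
    ports:
    - containerPort: 29871           # JMX/HTTP endpoint read by the sidecar
  - name: bigdata-exporter
    image: bigdata-exporter:latest   # assumed exporter image
    args:                            # hypothetical flags for illustration
    - "--target=https://localhost:29871/jmx"
    - "--role=hdfs-nn,common"
    ports:
    - name: metrics
      containerPort: 19091           # exposed for Prometheus to pull
```

Because both containers share the pod's network namespace, the sidecar can reach the component on localhost; an independent (1:many) deployment would instead take a list of target addresses.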

Promtool Usage

<code># Enter Prometheus pod
kubectl -n monitoring exec -it prometheus-k8s-0 -- sh
# Check metric format
curl -s http://ip:9999/metrics | promtool check metrics
</code>

Port‑Forward for External Access

<code># Prometheus
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &
</code>
Tags: Monitoring, cloud native, Big Data, operations, kubernetes, alerting, Prometheus
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
