
Build a Scalable Kube-Prometheus Monitoring System for Big Data on Kubernetes

This article explains how to design and implement a flexible kube‑prometheus‑based monitoring system for big‑data applications running on Kubernetes, covering metric collection methods, scrape configurations, alert rule design, exporter deployment, and practical examples with code snippets.

MaGe Linux Operations

Introduction

Monitoring is a pain point for big‑data platforms; building a reliable monitoring system on Kubernetes using kube‑prometheus simplifies metric collection and alerting.

Design ideas

The monitoring system must answer: what to monitor, how metrics are exposed, how Prometheus scrapes them, and how alert rules are managed dynamically.

Monitoring objects

All big‑data components run as pods in the Kubernetes cluster.

Metric exposure methods

Components expose metrics in three ways:

Directly expose Prometheus metrics (pull).

Push metrics to a Prometheus Pushgateway (push).

Use a custom exporter to convert metrics to Prometheus format (pull).

Most components have official or third‑party exporters, and direct exposure is usually sufficient; for YARN‑mode Flink jobs or other short‑lived pods, however, a Pushgateway is needed, since the target may exit before Prometheus ever scrapes it.
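
As a sketch of the push path, a short‑lived job can post its metrics to the Pushgateway over plain HTTP. The service address, job name, and metric below are illustrative assumptions, not taken from the article:

```shell
# Push a gauge for a finished batch job to the Pushgateway (push model).
# The address, job name, and metric are illustrative assumptions.
echo 'etl_last_run_duration_seconds 42' | curl --data-binary @- \
  http://pushgateway.monitoring.svc:9091/metrics/job/daily-etl/instance/run-1
```

Prometheus then scrapes the Pushgateway itself, so the metric survives even after the pod that pushed it has exited.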

Metric scraping

Prometheus pulls metrics from the exporters or the Pushgateway. Scrape targets can be declared as native scrape jobs, ServiceMonitors, or PodMonitors; on Kubernetes, PodMonitor is the preferred method here because it selects pods directly by label and requires neither a Service object nor hand‑written scrape configuration.
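
A minimal PodMonitor might look like the following; the namespace, label selector, and port name are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: hdfs-namenode
  namespace: monitoring
spec:
  # Which namespaces to search for matching pods (assumed name)
  namespaceSelector:
    matchNames:
      - bigdata
  # Select pods by label
  selector:
    matchLabels:
      app: hdfs-namenode
  # Scrape each matching pod's named container port
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```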

Alert design

Alert flow

Service anomaly occurs.

Prometheus generates an alert.

Alertmanager receives the alert.

Alertmanager processes it according to configured routing, grouping, silencing, and notification channels.

Dynamic alert configuration is split into two parts: Alertmanager policies (receivers and routing, stored in the Alertmanager configuration secret) and PrometheusRule objects (one per alert rule), both of which can be updated at runtime.

Dynamic configuration example

Example Alertmanager secret configuration with two receivers (default and a custom webhook) and routing based on groupId and instanceId:

global:
  resolve_timeout: 5m
receivers:
  - name: 'default'
  - name: 'test.web.hook'
    webhook_configs:
      - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId,instanceId]
  routes:
    - receiver: 'test.web.hook'
      continue: true
      match:
        groupId: node-disk-usage
    - receiver: 'test.web.hook'
      continue: true
      match:
        groupId: kafka-topic-highstore

Example PrometheusRule for node disk usage alert:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} path ${path} usage {{ $value }}%"
        content: "Disk warning: node {{$labels.instance}} path ${path} usage {{ $value }}%"
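
The ${path} and ${thresholdValue} placeholders in the rule above are template variables filled in at deploy time. The article does not name a templating tool, so as one assumed approach, plain sed can render them:

```shell
# Render the rule template's placeholders (values are examples;
# the templating mechanism is an assumption -- the article does not name one).
path=/data
thresholdValue=85
echo 'expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}' \
  | sed -e "s|\${path}|${path}|g" -e "s|\${thresholdValue}|${thresholdValue}|g"
# -> expr: 100*(1-node_filesystem_avail_bytes{mountpoint="/data"}/node_filesystem_size_bytes{mountpoint="/data"}) > 85
```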

Example PrometheusRule for Kafka lag alert:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"

Technical implementation

Deploy Prometheus on Kubernetes using kube‑prometheus (jsonnet templates).

Enhance configuration with kubernetes_sd_config and relabeling for automatic pod discovery.

Implement a big‑data exporter that collects metrics from HDFS, YARN, HBase, etc., using labels/annotations to specify target, scheme, path, port, and role.

Provide concrete alert rule examples.

kube‑prometheus vs prometheus‑operator

Both projects can create and manage Prometheus; kube‑prometheus builds on prometheus‑operator and provides default manifests.

Installation steps

# Create namespace and CRDs, wait for them
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces; do date; sleep 1; echo ""; done
kubectl create -f manifests/

Using kubernetes_sd_config + relabel

Refer to the GitHub repository "linshenkx/kube-prometheus-enhance" for the implementation.

Big‑data exporter configuration example

annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"
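
A relabeling block that turns those annotations into scrape parameters might be sketched as follows. The regexes and label names are assumptions modeled on the conventional prometheus.io/* annotation pattern; see the linked repository for the actual implementation:

```yaml
- job_name: bigdata-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods that opt in via the scrape annotation
    - source_labels: [__meta_kubernetes_pod_annotation_bigData_metrics_scrape]
      action: keep
      regex: "true"
    # Honor the declared scheme and metrics path
    - source_labels: [__meta_kubernetes_pod_annotation_bigData_metrics_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_pod_annotation_bigData_metrics_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Rewrite the target address to use the declared port
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_bigData_metrics_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    # Carry the role annotation through as a metric label
    - source_labels: [__meta_kubernetes_pod_annotation_bigData_metrics_role]
      action: replace
      target_label: role
```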

Exporter placement

Exporters can run as sidecars (1:1) or as independent deployments (1:many). Independent deployment reduces coupling and is preferred for multi‑node services like Kafka.
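
As a sketch of the independent (1:many) style, a single kafka-exporter Deployment can scrape every broker. The image version, broker addresses, and names below are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
        - name: kafka-exporter
          image: danielqsj/kafka-exporter:v1.4.2   # assumed version
          args:
            # One exporter covers all brokers -- the 1:many pattern
            - --kafka.server=kafka-0.kafka:9092
            - --kafka.server=kafka-1.kafka:9092
          ports:
            - name: metrics
              containerPort: 9308
```

A PodMonitor (or ServiceMonitor) selecting `app: kafka-exporter` then wires this into Prometheus; the sidecar alternative instead adds the exporter container to each broker pod.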

Checking metric format with promtool

# Enter the Prometheus pod
kubectl -n monitoring exec -it prometheus-k8s-0 -- sh
# Show promtool help
promtool -h
# Validate that an endpoint's metrics are well-formed
curl -s http://ip:9999/metrics | promtool check metrics

Port‑forward for external access

# Prometheus
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n monitoring &
# Grafana
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n monitoring &
# Alertmanager
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n monitoring &

ARM support in kube‑prometheus images

The stack's images vary in ARM support: for example, quay.io/prometheus/prometheus:v2.11.0 runs on ARM (support landed in v2.10.0), while quay.io/coreos/kube-state-metrics:v1.8.0 has no ARM build.

Tags: alerting, Exporter, kube-prometheus
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
