
How to Build a Scalable Kube‑Prometheus Monitoring Stack for Big Data on Kubernetes

This article explains how to design and implement a robust monitoring solution for big‑data components running on Kubernetes using Prometheus, covering metric exposure methods, scrape configurations, alerting architecture, custom exporters, and practical deployment tips.


Design Overview

Monitoring big‑data platforms is a persistent pain point: the components must run stably while also being tuned for performance. The core tasks are exposing metrics, scraping them, analyzing them, and alerting. Four questions drive the design:

What is being monitored?

How are metrics exposed?

How does Prometheus scrape them?

How to configure dynamic alert rules?

Monitoring Targets

The targets are the pods running various big‑data components (HDFS, YARN, HBase, Spark, Flink, etc.) in a Kubernetes cluster.

Metric Exposure Methods

Components can expose metrics in three ways:

Directly expose Prometheus‑compatible metrics (pull).

Push metrics to prometheus-pushgateway (push).

Use a custom exporter that converts native metrics to Prometheus format (pull).

Some components, like Flink, support multiple methods.
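
As a quick illustration of the push path, here is a minimal sketch that pushes one sample to prometheus-pushgateway with curl; the service address and job name are assumptions for this example:

# Push a single sample under the (hypothetical) job name "flink-job"
echo "flink_checkpoint_duration_seconds 4.2" | \
  curl --data-binary @- http://prometheus-pushgateway.monitoring.svc:9091/metrics/job/flink-job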

Scrape Configuration

Prometheus ultimately pulls metrics from the targets. The following job types are commonly used:

Native Job : Direct Prometheus job configuration.

PodMonitor : Uses Prometheus Operator to scrape pod metrics.

ServiceMonitor : Uses Prometheus Operator to scrape service endpoints.

When running on Kubernetes, PodMonitor is usually preferred for its simplicity.
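
For illustration, here is a minimal PodMonitor sketch that scrapes pods labeled app=flink on a named metrics port; the namespace, label, and port name are assumptions:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flink-pods
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - bigdata                # namespace where the target pods run (assumed)
  selector:
    matchLabels:
      app: flink             # pods to scrape (assumed label)
  podMetricsEndpoints:
  - port: metrics            # named container port exposing Prometheus metrics
    path: /metrics
    interval: 30s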

Alert Design

Alert Flow

Service encounters an anomaly.

Prometheus generates an alert.

Alertmanager receives the alert.

Alertmanager applies routing, grouping, suppression, and forwards to notification channels (e.g., webhook, SMS, email).

Fine‑tune the timing (group_wait, group_interval, repeat_interval) to control alert frequency.
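
A sketch of an Alertmanager route that wires these timing knobs to a webhook receiver; the receiver names and URLs are placeholders:

route:
  group_by: ['groupId', 'instanceId']  # grouping labels carried by the alert rules
  group_wait: 30s        # initial wait before the first notification for a new group
  group_interval: 5m     # wait before notifying about changes within an existing group
  repeat_interval: 4h    # re-notify interval while the group stays unchanged
  receiver: default-webhook
  routes:
  - match:
      receivers: SMS     # label set on the alert rules below
    receiver: sms-webhook
receivers:
- name: default-webhook
  webhook_configs:
  - url: http://alert-platform.monitoring.svc:8080/api/alerts  # placeholder
- name: sms-webhook
  webhook_configs:
  - url: http://alert-platform.monitoring.svc:8080/api/sms     # placeholder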

Dynamic Alert Configuration

kube-prometheus splits alert configuration into two parts:

alertmanager : Handles routing and notification policies.

alertRule : Defines the actual alert expressions.

Custom alert platforms can be integrated via webhook for business‑specific processing.

Example Alert Rules

Disk usage alert (group level):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100 * (1 - node_filesystem_avail_bytes{mountpoint="${path}"} / node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{ $labels.instance }} path ${path} usage {{ $value }}%"
        content: "Disk warning: node {{ $labels.instance }} path ${path} usage {{ $value }}%"

Kafka lag alert (instance level):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka", consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag ${value}"
        content: "KAFKA warning: consumer group ${consumergroup} lag ${value}"

Technical Implementation

Deploying Prometheus on Kubernetes

Two related projects exist under CoreOS: kube-prometheus and prometheus-operator. Both can create and manage Prometheus instances, but kube-prometheus provides extensive default manifests and uses the operator under the hood.

Choose the version matching your Kubernetes version (e.g., k8s 1.14 → kube‑prometheus 0.3, prometheus‑operator 0.32).
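
The quickstart for that generation of kube-prometheus looks roughly like this; pick the release branch matching your cluster:

git clone -b release-0.3 https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup   # namespace, CRDs, and the operator first
kubectl create -f manifests         # Prometheus, Alertmanager, Grafana, exporters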

Configuration Generation

kube-prometheus uses Jsonnet templates to generate Kubernetes manifests; the defaults live in the manifests directory. To customize, install jsonnet and jb (jsonnet-bundler) in a Go environment and regenerate the manifests from your own Jsonnet entrypoint.
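
A sketch of that customization workflow with jsonnet-bundler, assuming an empty working directory and the release-0.3 branch:

mkdir my-kube-prometheus && cd my-kube-prometheus
jb init                             # creates jsonnetfile.json
jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@release-0.3
jsonnet -J vendor -m manifests example.jsonnet  # render manifests (the repo's build.sh wraps this)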

kubernetes_sd_config + relabel

For dynamic discovery, use the native job with kubernetes_sd_config. This enables Prometheus to discover pods via the Kubernetes API and apply relabel rules to filter and modify labels.

scrape_configs:
- job_name: "bigdata-exporter"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods carrying the label bigData.metrics.object=pod; Prometheus
  # sanitizes dots in label names to underscores in the __meta_* labels.
  - source_labels: [__meta_kubernetes_pod_label_bigData_metrics_object]
    regex: pod
    action: keep
  # Keep only pods annotated bigData.metrics/scrape=true (dots and slashes
  # in annotation names are sanitized the same way).
  - source_labels: [__meta_kubernetes_pod_annotation_bigData_metrics_scrape]
    regex: "true"
    action: keep
  # Scrape the path declared in the bigData.metrics/path annotation.
  - source_labels: [__meta_kubernetes_pod_annotation_bigData_metrics_path]
    target_label: __metrics_path__
  # Point __address__ at the exporter's port on the pod IP; the component's
  # own port (bigData.metrics/port) is read by the exporter itself.
  - source_labels: [__meta_kubernetes_pod_ip]
    target_label: __address__
    replacement: "${1}:19091"

bigdata‑exporter

The exporter runs as a sidecar or standalone pod, discovers target pods via labels and annotations, and converts component-specific metrics (e.g., JMX from HDFS or YARN) into Prometheus format. Example pod labels and annotations:

labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"

Exporter Deployment Options

Exporters can be deployed as a sidecar (1:1 with the target pod) or as a standalone service (1:many). Sidecars simplify lifecycle management; standalone exporters reduce coupling and are suitable for multi‑node services like Kafka.
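
A sketch of the sidecar variant, with hypothetical image names, showing the exporter container next to an HDFS NameNode container in the same pod:

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-nn-0
  labels:
    bigData.metrics.object: pod
  annotations:
    bigData.metrics/scrape: "true"
    bigData.metrics/path: "/jmx"
    bigData.metrics/port: "29871"
spec:
  containers:
  - name: hdfs-namenode
    image: example/hdfs-namenode:3.1        # hypothetical image
    ports:
    - containerPort: 29871                  # JMX endpoint read by the exporter
  - name: bigdata-exporter
    image: example/bigdata-exporter:latest  # hypothetical image
    ports:
    - containerPort: 19091                  # port Prometheus scrapes (matches the relabel rule)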

Validation and Access

Use promtool check metrics to validate metric syntax and kubectl port-forward to expose Prometheus, Grafana, and Alertmanager for debugging.

# Expose Prometheus UI
kubectl -n monitoring port-forward service/prometheus-k8s 19090:9090 &
# Expose Grafana UI
kubectl -n monitoring port-forward service/grafana 13000:3000 &
# Expose Alertmanager UI
kubectl -n monitoring port-forward service/alertmanager-main 9093:9093 &
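
With a port-forward to the exporter as well, its output can be piped straight into promtool; a sketch assuming the exporter is reachable on localhost:19091:

# Validate the exporter's exposition format and metric naming
curl -s http://localhost:19091/metrics | promtool check metrics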

Other Considerations

Alert grouping labels (e.g., groupId, instanceId) determine how alerts are aggregated and routed. Timing parameters:

for : Duration a condition must hold before firing.

group_wait : Initial wait after a new group appears.

group_interval : Interval when the group content changes.

repeat_interval : Interval for repeated alerts when the group remains unchanged.

Note that resolved alerts also use repeat_interval, which may feel counter‑intuitive.

[Figure: monitoring architecture diagram]

Tags: big data, Kubernetes, Prometheus, Alertmanager, Exporter, kube-prometheus
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
