How to Build a Scalable Prometheus Monitoring System for Big Data on Kubernetes
This article explains how to design and implement a comprehensive Prometheus‑based monitoring and alerting solution for big‑data components running on Kubernetes, covering metric exposure methods, scrape configurations, exporter deployment, alert rule design, and practical examples with code snippets.
Design Idea
The monitoring system must scrape exposed metrics, analyze them, and generate alerts. Key questions include: what to monitor, how metrics are exposed, how Prometheus scrapes them, and how to configure dynamic alert rules.
Monitoring Objects
All big‑data components run as pods in a Kubernetes cluster.
Metric Exposure Methods
Components support three exposure types:
Directly expose Prometheus metrics (pull).
Push metrics to prometheus-pushgateway (push).
Use a custom exporter to convert metrics to Prometheus format (pull).
Some components, like Flink on YARN, require the push method because they run inside YARN containers.
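As a concrete sketch of the push method, Flink's PrometheusPushGatewayReporter can be configured in flink-conf.yaml; the host, port, and job name below are placeholders for your environment, not values from this article:
<code>
# flink-conf.yaml — pushes Flink metrics to a pushgateway (placeholder host/port/jobName)
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: prometheus-pushgateway.monitoring.svc
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-job
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
</code>
Prometheus then scrapes the pushgateway itself; the pushed metrics reach Prometheus on the pushgateway's scrape cycle, not the job's.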
Metric Scraping
Prometheus pulls targets using configurations such as the native job config, PodMonitor, and ServiceMonitor (via the Prometheus Operator).
Reference: https://cloud.tencent.com/document/product/1416/55995
When running on Kubernetes, PodMonitor is usually preferred for its simplicity.
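A minimal PodMonitor sketch is shown below; the names, labels, namespace, and port are illustrative, not taken from this article:
<code>
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-exporter
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - bigdata                  # namespace where the exporter pods run (assumed)
  selector:
    matchLabels:
      app: bigdata-exporter    # pod label to select (assumed)
  podMetricsEndpoints:
  - port: metrics              # container port *name*, not a number
    path: /metrics
    interval: 30s
</code>
The Prometheus Operator translates this object into a scrape job automatically, so no Prometheus config reload is needed.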
Alert Design
Alert Flow
Service experiences an anomaly.
Prometheus fires an alert.
Alertmanager receives the alert.
Alertmanager processes it according to configured rules (grouping, inhibition, notification).
Timing of alert triggering is critical.
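As a sketch of the inhibition step, the Alertmanager rule below suppresses warning-level alerts on an instance while a critical alert for the same instance is already firing (the severity labels are assumptions, not labels from this article):
<code>
inhibit_rules:
- source_match:
    severity: critical      # if a critical alert is firing...
  target_match:
    severity: warning       # ...mute warning alerts...
  equal: ['instance']       # ...that share the same instance label
</code>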
Dynamic Alert Configuration
Alerting consists of two parts: alertmanager (handling policies) and alertRule (the specific rules).
Custom Alert Platform Integration
In addition to Alertmanager's built-in notification channels, production environments should forward alerts to a custom platform via webhook for advanced processing, deduplication, and multi-channel notification.
Alert Hierarchy Labels
Granularity of monitoring objects determines alert grouping; labels guide Alertmanager routing.
Technical Implementation
Implementation is divided into:
Kubernetes deployment of Prometheus (kube‑prometheus).
Enhanced configuration using kubernetes_sd_config + relabel.
Implementation of a big‑data exporter.
Alert design examples.
1. Deploying Prometheus on Kubernetes
Both kube-prometheus and prometheus-operator can create and manage Prometheus; this guide uses kube-prometheus, which builds on the operator and provides default manifests.
Note: Ensure the Kubernetes version matches the kube‑prometheus version (e.g., k8s 1.14 → kube‑prometheus 0.3, operator 0.32).
Configuration files are written in Jsonnet; dependencies are fetched with jb (jsonnet-bundler) and compiled with jsonnet in a Go environment.
2. kubernetes_sd_config + relabel Solution
Uses the native job configuration with additional-scrape-config to enable automatic pod discovery and label rewriting.
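A minimal sketch of such a scrape job, assuming the common prometheus.io/* annotation convention; it is loaded into Prometheus through the operator's additionalScrapeConfigs secret:
<code>
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # only keep pods that opt in via annotation
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # rewrite the metrics path from the annotation, if present
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # rewrite the scrape address to use the annotated port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
</code>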
3. bigdata‑exporter Implementation
The exporter collects metrics from components such as HDFS, YARN, HBase, etc., converts them to Prometheus format, and exposes them.
<code>annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"
</code>
Exporter discovery relies on pod labels and annotations, with a role annotation indicating the parsing rule.
<code>labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"
</code>
4. Alert Design Examples
Example alertmanager configuration with two receivers (default and a custom webhook):
<code>global:
  resolve_timeout: 5m
receivers:
- name: 'default'
- name: 'test.web.hook'
  webhook_configs:
  - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId, instanceId]
  routes:
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: node-disk-usage
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: kafka-topic-highstore
</code>
Disk usage alert rule example:
<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100 * (1 - node_filesystem_avail_bytes{mountpoint="${path}"} / node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{ $labels.instance }} ${path} usage {{ $value }}%"
        content: "Disk warning: node {{ $labels.instance }} ${path} usage {{ $value }}%"
</code>
Kafka lag alert rule example (group and instance granularity):
<code>apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
</code>
Alert Flow Example
Two nodes (node1, node2) with disk‑usage alerts illustrate grouping, repeat intervals, and recovery handling.
<code>node1 exceeds the threshold for 1m → enters a group → group_wait 30s → first alert
node2 fires and joins the group → group_interval 5m → second alert
repeat_interval 2h → subsequent repeat alerts while still firing
after recovery → a resolved notification is sent
</code>
Exporter Placement
Exporters can run as sidecars (1:1) or as independent deployments (1:many). Sidecars bind to the component lifecycle; independent exporters reduce coupling and are suitable for multi‑node services like Kafka.
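As a sketch of the sidecar pattern, the exporter container is added alongside the component container so the two share a lifecycle and network namespace; the image names, labels, and port below are illustrative:
<code>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hdfs-namenode
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hdfs-namenode
  template:
    metadata:
      labels:
        app: hdfs-namenode
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "19091"
    spec:
      containers:
      - name: namenode                  # the big-data component itself
        image: example/hdfs-namenode:latest
      - name: bigdata-exporter          # sidecar: reads the component's JMX endpoint over localhost
        image: example/bigdata-exporter:latest
        ports:
        - name: metrics
          containerPort: 19091
</code>
For multi-node services such as Kafka, an independent exporter Deployment avoids running one sidecar per broker and decouples exporter upgrades from the service.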
Promtool Usage
<code># Enter Prometheus pod
kubectl -n monitoring exec -it prometheus-k8s-0 sh
# Check metric format
curl -s http://ip:9999/metrics | promtool check metrics
</code>
Port-Forward for External Access
<code># Prometheus
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &
</code>
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.