How to Build a Scalable Prometheus Monitoring System for Big Data on Kubernetes
This article explains how to design and implement a Prometheus‑based monitoring solution for big‑data components running on Kubernetes, covering metric exposure methods, scrape configurations, alerting architecture, exporter development, and practical code examples for a production‑ready setup.
Design Overview
The monitoring system must reliably collect, analyze, and alert on metrics exposed by big-data components running in a Kubernetes cluster. The design has to answer four questions:
What is being monitored?
How are metrics exposed?
How does Prometheus scrape the metrics?
How are alert rules managed dynamically?
Monitoring Objects
All big‑data services are deployed as Pods in the Kubernetes cluster, making each Pod a monitoring target.
Metric Exposure Methods
Directly expose Prometheus‑compatible metrics (pull).
Push metrics to a prometheus‑pushgateway (push).
Use a custom exporter to translate other formats into Prometheus exposition format (pull).
Some components, such as Flink, support more than one of these methods. Most components also have an official or third-party exporter, and where a component can expose Prometheus metrics directly, that is usually sufficient.
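For the direct (pull) method, a component only has to answer HTTP GET requests with text in the Prometheus exposition format. The following is a minimal sketch using only the Python standard library; the metric names demo_queue_depth and demo_requests_total, the METRICS layout and the helper names are hypothetical, and a real component would normally rely on an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metrics: name -> (type, help text, [(labels, value), ...]).
METRICS = {
    "demo_queue_depth": ("gauge", "Items waiting in the queue.", [({"queue": "ingest"}, 42)]),
    "demo_requests_total": ("counter", "Requests served.", [({}, 7)]),
}


def escape_label_value(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(metrics):
    """Render the metrics dict as Prometheus text exposition format."""
    lines = []
    for name, (mtype, help_text, samples) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            if labels:
                pairs = ",".join(
                    f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items())
                )
                lines.append(f"{name}{{{pairs}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve render(METRICS) on /metrics and 404 on every other path."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Block forever, serving metrics on all interfaces."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve(19091) would line up with the prometheus.io/port value in the annotation example, and piping the endpoint's output into promtool check metrics is a cheap way to validate it.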
Scrape Configuration
Prometheus ultimately pulls metrics from the chosen targets. In a Kubernetes environment the common scrape configurations are:
Native Job configuration.
PodMonitor (via Prometheus Operator) for pod-level metrics.
ServiceMonitor (via Prometheus Operator) for service-level metrics.
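When the platform already runs on Kubernetes, PodMonitor is recommended for its simplicity. With a native job that combines kubernetes_sd_config and relabel rules, a pod can instead opt in to scraping through annotations such as the following (a widely used convention that the relabel rules must implement; Prometheus does not interpret these annotations by itself):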
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"
Alert Design
The alert flow has four steps: a service misbehaves → Prometheus evaluates its rules and fires an alert → Alertmanager receives the alert → Alertmanager applies routing, grouping, silencing, and notification actions (e.g., SMS, email, webhook). On top of this flow, the design requires:
Dynamic configuration of Alertmanager policies and alert rule definitions.
Integration with a custom alert platform via webhook for business-specific handling.
Hierarchical label groups (e.g., groupId, instanceId) that drive multi-level routing.
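Alertmanager picks a receiver by walking its route tree: an alert descends into the first child route whose matchers it satisfies, and the deepest matching route on that path decides. The sketch below models that first-match behavior in plain Python to show how groupId and instanceId labels can give two routing levels. The tree, the receiver names (alert-platform-webhook, kafka-oncall, orders-team) and the helper names are hypothetical, and real Alertmanager features such as continue, regex matchers and grouping are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One node of a simplified Alertmanager-style routing tree."""
    receiver: str
    match: dict = field(default_factory=dict)   # exact label values required
    routes: list = field(default_factory=list)  # child routes, tried in order


def matches(route, labels):
    """True when every matcher of the route equals the alert's label value."""
    return all(labels.get(k) == v for k, v in route.match.items())


def resolve(route, labels):
    """Descend into the first matching child; fall back to this node's receiver."""
    for child in route.routes:
        if matches(child, labels):
            return resolve(child, labels)
    return route.receiver


# Hypothetical tree: platform default -> per-groupId -> per-instanceId.
TREE = Route(
    receiver="alert-platform-webhook",
    routes=[
        Route(
            receiver="kafka-oncall",
            match={"groupId": "kafka-topic-highstore"},
            routes=[Route(receiver="orders-team", match={"instanceId": "orders"})],
        ),
    ],
)
```

An instance-level route only needs to exist for instances with special handling; every other alert in the group falls back to the group-level receiver, and everything else reaches the platform webhook.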
Technical Implementation
Deploy Prometheus in Kubernetes using kube‑prometheus (which builds on prometheus‑operator).
Enhance the default configuration with kubernetes_sd_config plus relabel rules for automatic pod discovery.
Develop a bigdata‑exporter that gathers JMX metrics from HDFS, YARN, HBase, etc., and exposes them in Prometheus format.
Provide concrete alert rule examples for node‑disk usage and Kafka lag.
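Hadoop-family daemons serve their JMX beans as JSON on the /jmx HTTP endpoint, in the form {"beans": [{"name": "...", "SomeAttribute": value, ...}]}. A minimal sketch of the conversion a bigdata-exporter performs might look like the following; the bigdata_ prefix, the naming scheme, the helper names and the sample bean values are assumptions for illustration, not the author's exporter.

```python
import json
import re

# Hypothetical sample payload in the shape returned by a NameNode's /jmx endpoint.
SAMPLE_JMX = json.dumps({"beans": [{
    "name": "Hadoop:service=NameNode,name=FSNamesystem",
    "modelerType": "FSNamesystem",
    "CapacityTotal": 1000,
    "CapacityUsed": 250,
    "tag.HAState": "active",
}]})


def to_snake(text):
    """CapacityUsed -> capacity_used; characters invalid in metric names -> '_'."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", text)
    return re.sub(r"[^a-zA-Z0-9_]", "_", snake).lower()


def parse_bean_name(name):
    """Split 'Hadoop:service=NameNode,name=FSNamesystem' into key=value properties."""
    _, _, props = name.partition(":")
    return dict(part.split("=", 1) for part in props.split(",") if "=" in part)


def jmx_to_prometheus(jmx_json, prefix="bigdata"):
    """Convert numeric bean attributes into untyped Prometheus samples."""
    lines = []
    for bean in json.loads(jmx_json).get("beans", []):
        props = parse_bean_name(bean.get("name", ""))
        service = props.get("service", "unknown")
        bean_name = to_snake(props.get("name", "unknown"))
        for attr, value in sorted(bean.items()):
            # bool is a subclass of int in Python, so exclude it explicitly;
            # strings and nested objects (e.g. tag.HAState) are skipped as well.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            lines.append(f'{prefix}_{bean_name}_{to_snake(attr)}{{service="{service}"}} {value}')
    return "\n".join(lines) + "\n"
```

A role-aware exporter could layer component-specific selection on top of this generic conversion (for instance, only FSNamesystem beans for an hdfs-nn role) and would also emit HELP/TYPE metadata, which the sketch omits for brevity.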
Kube‑Prometheus vs Prometheus‑Operator
Both projects can create and manage Prometheus instances. kube‑prometheus bundles a large set of default manifests and relies on prometheus‑operator. Choose the version that matches your Kubernetes version (e.g., K8s 1.14 → kube‑prometheus 0.3 with prometheus‑operator 0.32).
Kubernetes SD Config + Relabel
The kubernetes_sd_config enables Prometheus to discover pods via the Kubernetes API. Relabel rules rewrite discovered labels, allowing fine‑grained filtering and dynamic scrape target generation.
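For the pod role, each pod annotation appears as a __meta_kubernetes_pod_annotation_<name> label with unsupported characters replaced by underscores, so prometheus.io/scrape becomes __meta_kubernetes_pod_annotation_prometheus_io_scrape. The sketch below re-implements a small subset of relabeling (keep, drop, replace) in Python to show how rules of the kind commonly paired with kubernetes_sd_configs turn the prometheus.io/* annotations into a scrape target. It is an illustration, not Prometheus code: the relabel function, POD_RULES and the discovered pod are hypothetical, replacements use Python group syntax where Prometheus uses $1, and the author's actual rules in the kube-prometheus-enhance repository may differ.

```python
import re


def relabel(labels, configs):
    """Apply a subset of Prometheus relabeling (keep, drop, replace).

    Returns the rewritten label dict, or None if the target is dropped.
    """
    labels = dict(labels)
    for cfg in configs:
        # Join the source label values with the separator (default ';').
        source = cfg.get("separator", ";").join(
            labels.get(name, "") for name in cfg.get("source_labels", [])
        )
        # Prometheus anchors relabel regexes at both ends, hence fullmatch.
        match = re.fullmatch(cfg.get("regex", "(.*)"), source)
        action = cfg.get("action", "replace")
        if action == "keep" and match is None:
            return None
        if action == "drop" and match is not None:
            return None
        if action == "replace" and match is not None:
            value = match.expand(cfg.get("replacement", r"\1"))
            if value:
                labels[cfg["target_label"]] = value
            else:
                # An empty result removes the target label.
                labels.pop(cfg["target_label"], None)
    return labels


ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_"

# Relabel rules implementing the prometheus.io/* annotation convention.
POD_RULES = [
    {"source_labels": [ANNOTATION + "scrape"], "action": "keep", "regex": "true"},
    {"source_labels": [ANNOTATION + "scheme"], "regex": "(https?)", "target_label": "__scheme__"},
    {"source_labels": [ANNOTATION + "path"], "regex": "(.+)", "target_label": "__metrics_path__"},
    {"source_labels": ["__address__", ANNOTATION + "port"],
     "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": r"\1:\2",
     "target_label": "__address__"},
]

# A hypothetical pod as discovered by the pod role (labels simplified).
DISCOVERED = {
    "__address__": "10.0.0.12:8080",
    ANNOTATION + "scrape": "true",
    ANNOTATION + "scheme": "http",
    ANNOTATION + "path": "/metrics",
    ANNOTATION + "port": "19091",
}
```

relabel(DISCOVERED, POD_RULES) rewrites the address to the annotated port, while a pod without prometheus.io/scrape: "true" is dropped by the first keep rule.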
BigData Exporter
The exporter runs either as a sidecar in the same pod as the component (1:1) or as an independent deployment (1:1 or 1:many). It discovers targets from pod labels and annotations, and uses the role annotation to select the parsing logic for each component; a pod may list several comma-separated roles.
labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"
Alert Rule Examples
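The rule manifests in this section are templates rather than ready-to-apply YAML: placeholders such as ${path}, ${thresholdValue}, ${uniqueName} and ${consumergroup} are presumably substituted by the alert platform when a rule is created or changed, while {{$labels.instance}} and {{$value}} are Prometheus alert templating evaluated at firing time. Python's string.Template happens to use the same ${name} syntax, so it makes a convenient stand-in for that substitution step. This is an assumed implementation, not necessarily the author's; RULE_TEMPLATE and render_rule are hypothetical names.

```python
from string import Template

# Hypothetical excerpt of the node-disk-usage rule, stored as a template.
RULE_TEMPLATE = Template(
    'expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}'
    '/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}\n'
    'title: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"\n'
)


def render_rule(params):
    """Fill platform placeholders such as ${path} and ${thresholdValue}.

    safe_substitute leaves identifiers missing from params untouched, which keeps
    Prometheus template references like {{$labels.instance}} and {{$value}} intact.
    Parameter names must therefore not collide with 'labels' or 'value'.
    """
    rendered = RULE_TEMPLATE.safe_substitute(params)
    # Refuse to emit a rule that still contains an unfilled ${...} placeholder.
    if "${" in rendered:
        raise ValueError("unfilled placeholder in rule template")
    return rendered
```

Using safe_substitute rather than substitute is the key choice here: substitute would raise KeyError on $labels and $value, which belong to Prometheus rather than to the platform.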
Node‑disk‑usage rule (group‑level):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
    - name: node-disk-usage
      rules:
        - alert: node-disk-usage
          expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
          for: 1m
          labels:
            groupId: node-disk-usage
            userIds: super
            receivers: SMS
          annotations:
            title: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"
            content: "Disk warning: node {{$labels.instance}} ${path} usage {{$value}}%"
Kafka-lag rule (instance-level):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
    - name: kafka-topic-highstore
      rules:
        - alert: kafka-topic-highstore-${uniqueName}
          expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
          for: 1m
          labels:
            groupId: kafka-topic-highstore
            instanceId: ${uniqueName}
            userIds: super
            receivers: SMS
          annotations:
            title: "KAFKA warning: consumer group ${consumergroup} lag {{$value}}"
            content: "KAFKA warning: consumer group ${consumergroup} lag {{$value}}"
Alert Flow Example
Two nodes (node1, node2) are monitored for disk usage. The timeline demonstrates how for, group_wait, group_interval, and repeat_interval affect alert firing and grouping.
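node1 stays above the threshold for 1m (for) → alert fires and a new group is created
group_wait 30s → first notification (node1)
node2 stays above the threshold for 1m (for) → alert fires and joins the same group
group_interval 5m → second notification (node1, node2)
repeat_interval 10m → notification repeated while the group is unchanged
Key Timing Parameters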
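for: how long a condition must hold before the rule fires; until then the alert is pending.
group_wait: how long Alertmanager waits after a new group is created before sending its first notification.
group_interval: how long to wait before notifying about alerts newly added to a group that has already been notified.
repeat_interval: how long to wait before re-sending a notification when the group's content has not changed.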
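The timeline can be reproduced with a small simulation. This is a simplified model, not Alertmanager's code: alerts never resolve, the group is flushed first after group_wait and then every group_interval, and a flush notifies only if the set of firing alerts changed or repeat_interval has elapsed since the last notification. The simulate function is hypothetical, and the 2-minute offset between node1 and node2 is an assumed value chosen for the example.

```python
def simulate(breach_starts, for_s, group_wait, group_interval, repeat_interval, horizon):
    """Return [(time_s, [alert names])] for each notification sent for one group.

    breach_starts maps alert name -> second at which its condition became true.
    """
    fire_at = {name: start + for_s for name, start in breach_starts.items()}
    sent, last_set, last_time = [], None, None
    t = min(fire_at.values()) + group_wait  # first flush after group_wait
    while t <= horizon:
        firing = frozenset(n for n, ft in fire_at.items() if ft <= t)
        changed = firing != last_set
        repeat_due = last_time is not None and t - last_time >= repeat_interval
        if changed or repeat_due:
            sent.append((t, sorted(firing)))
            last_set, last_time = firing, t
        t += group_interval  # later flushes every group_interval
    return sent


# node2 crosses the threshold 2 minutes after node1 (hypothetical offset).
timeline = simulate({"node1": 0, "node2": 120}, for_s=60, group_wait=30,
                    group_interval=300, repeat_interval=600, horizon=1000)
# -> [(90, ['node1']), (390, ['node1', 'node2']), (990, ['node1', 'node2'])]
```

In this model repeats are only checked on group_interval ticks, so a repeat_interval that is not a multiple of group_interval is effectively rounded up to the next tick.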
Exporter Placement
Sidecar deployment ties the exporter’s lifecycle to the monitored component (ideal for single‑instance services). Independent deployment decouples the exporter, allowing one exporter to monitor many instances (useful for multi‑node services like Kafka).
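With an independent deployment, the exporter must select its own targets. A minimal sketch of that selection step reads the bigData.metrics labels and annotations shown in the BigData Exporter section from pod objects shaped like Kubernetes Pod JSON (metadata.labels, metadata.annotations, status.podIP). The Target type, discover, parsers_for, the /jmx default path and the sample pod are hypothetical, and a real exporter would list or watch pods through the Kubernetes API rather than receive them as a Python list.

```python
from dataclasses import dataclass, field

PREFIX = "bigData.metrics"


@dataclass
class Target:
    pod: str
    url: str
    roles: list = field(default_factory=list)


def discover(pods):
    """Turn opted-in pods into scrape targets."""
    targets = []
    for pod in pods:
        meta = pod.get("metadata", {})
        labels = meta.get("labels") or {}
        annotations = meta.get("annotations") or {}
        ip = (pod.get("status") or {}).get("podIP")
        port = annotations.get(f"{PREFIX}/port")
        # Skip pods that did not opt in, have no IP yet, or declare no port.
        if labels.get(f"{PREFIX}.object") != "pod":
            continue
        if annotations.get(f"{PREFIX}/scrape") != "true" or not ip or not port:
            continue
        scheme = annotations.get(f"{PREFIX}/scheme", "http")
        path = annotations.get(f"{PREFIX}/path", "/jmx")
        roles = [r.strip() for r in annotations.get(f"{PREFIX}/role", "").split(",") if r.strip()]
        targets.append(Target(meta.get("name", ""), f"{scheme}://{ip}:{port}{path}", roles))
    return targets


def parsers_for(target, registry):
    """Select the parsers registered for the target's roles; unknown roles are ignored."""
    return [registry[role] for role in target.roles if role in registry]


# Hypothetical pod carrying the labels and annotations from the BigData Exporter section.
SAMPLE_POD = {
    "metadata": {
        "name": "hdfs-nn-0",
        "labels": {"bigData.metrics.object": "pod"},
        "annotations": {
            "bigData.metrics/scrape": "true",
            "bigData.metrics/scheme": "https",
            "bigData.metrics/path": "/jmx",
            "bigData.metrics/port": "29871",
            "bigData.metrics/role": "hdfs-nn,common",
        },
    },
    "status": {"podIP": "10.0.1.5"},
}
```

In a sidecar deployment the same logic collapses to a single target reached over localhost, which is why the sidecar variant needs no discovery step at all.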
Promtool Usage
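# Enter the Prometheus container (the pod also runs config-reloader sidecars)
kubectl -n monitoring exec -it prometheus-k8s-0 -c prometheus -- sh
# Show help
promtool -h
# Validate metric format (the busybox-based Prometheus image ships wget rather than curl)
wget -qO- http://ip:9999/metrics | promtool check metrics
Port-Forward for External Access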
# Prometheus UI
nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana UI
nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager UI
nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &
Reference: https://cloud.tencent.com/document/product/1416/55995
GitHub repository for the enhanced kube‑prometheus configuration: https://github.com/linshenkx/kube-prometheus-enhance
dbaplus Community