How to Build a Scalable Kubernetes Monitoring System for Big Data with kube-prometheus
This article explains how to design and implement a flexible kube‑prometheus‑based monitoring and alerting solution for big‑data applications running on Kubernetes, covering metric exposure methods, scrape configurations, alert rule design, custom alert platform integration, and practical deployment tips.
It also includes a worked application example and notes from the author's experience with Prometheus, such as obtaining ARM images and using various supporting tools.
Preface
Operational monitoring and performance evaluation have long been weak points of big‑data platforms, so a robust monitoring system is essential.
Prometheus is the most popular monitoring software of the cloud‑native era, and many big‑data components support it either natively or through third‑party exporters.
The author's big‑data platform runs on Kubernetes, offering flexible deployment and easy integration with Prometheus.
The following sections discuss design ideas and technical implementation.
Design Ideas
The core task of a monitoring system is to scrape exposed metric data, then analyze and alert on it. Key questions include:
What is being monitored?
How does the target expose metric data?
How does the monitoring system scrape the metrics?
How to implement dynamic alert rule configuration and management?
Monitoring Targets
Big‑data components running as pods in the Kubernetes cluster.
Metric Exposure Methods
Depending on how a component supports Prometheus, its metrics can be exposed in one of three ways:
Directly expose Prometheus metrics (pull).
Push metrics to a Prometheus PushGateway (push).
Use a custom exporter to convert other metric formats to Prometheus format (exporter, pull).
Most components have official or third‑party exporters; a few require custom development. Generally, the direct method suffices.
When running Flink or Spark on YARN, the worker processes live inside YARN containers, which makes direct scraping difficult; in such cases the metrics must be pushed to a PushGateway. Short‑lived components should also push their metrics.
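As an illustration of the push path (a sketch, not taken from the original setup), Flink ships a PrometheusPushGatewayReporter that can be enabled in flink-conf.yaml roughly as follows; the host, port, and job name below are placeholders:

# flink-conf.yaml: push Flink metrics to a PushGateway (sketch; host/port/jobName are placeholders)
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.monitoring.svc   # assumed PushGateway address
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-on-yarn
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false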
Metric Scrape Methods
Whether metrics come from an exporter or a PushGateway, Prometheus ultimately pulls them from the exposed endpoints.
Scrape targets can be defined through native scrape jobs, PodMonitor resources, or ServiceMonitor resources. In Kubernetes environments, PodMonitor is usually preferred for its simplicity.
The main configuration file prometheus-prometheus.yaml contains selectors for ServiceMonitor, PodMonitor, RuleSelector, and Alertmanagers. Modifying this file triggers a Prometheus restart.
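A minimal PodMonitor sketch (names are illustrative; the label selector and port must match the target pods, and the PodMonitor itself must be picked up by the podMonitorSelector in prometheus-prometheus.yaml):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-demo             # hypothetical name
  namespace: monitoring
  labels:
    k8s-app: bigdata-demo
spec:
  namespaceSelector:
    matchNames:
    - bigdata                    # assumed namespace of the monitored pods
  selector:
    matchLabels:
      app: bigdata-demo          # assumed pod label
  podMetricsEndpoints:
  - port: metrics                # name of the container port exposing /metrics
    path: /metrics
    interval: 30s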
For complex environments, native Kubernetes service discovery (kubernetes_sd_config) combined with relabeling can be used instead, typically supplied as an additional scrape config. Pods then advertise their metrics endpoints through annotations, for example:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"

References for this configuration include the Prometheus relabel documentation and the Kubernetes service discovery guides.
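As a sketch, an additional scrape config that discovers pods carrying these annotations can look roughly like the standard Kubernetes example from the Prometheus documentation (the job name is illustrative):

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # honour the scheme, path, and port annotations
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__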
Alert Design
Alert Flow
Service experiences an anomaly.
Prometheus triggers an alert.
Alertmanager receives the alert.
Alertmanager processes the alert according to predefined rules (grouping, silencing, notifications).
Timing of alert triggering is critical and involves many details: an alert only starts firing after its expression has held true for the full "for" duration, and Alertmanager then applies group_wait, group_interval, and repeat_interval before notifications actually go out.
Dynamic Alert Configuration
kube‑prometheus separates alert configuration into two parts:
Alertmanager: policies for handling alerts (routing, grouping, silencing, notification).
Reference: Alertmanager configuration guide.
AlertRule: specific alert rules.
Reference: Prometheus alert rule guide.
In Kubernetes, alert rules are managed as PrometheusRule resources, which can be created, updated, and deleted like any other resource, much as pods are.
Custom Alert Platform Integration
Beyond Alertmanager's built‑in notification channels, a custom alert platform can receive alerts through Alertmanager's webhook receiver and apply business‑specific processing, recording, and multi‑channel notifications.
Alertmanager performs pre‑processing (grouping, silencing), while the custom platform handles business logic (recording, de‑identification, multi‑channel delivery).
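For reference, the payload Alertmanager POSTs to a webhook receiver is JSON with roughly the following shape (shown here in YAML form for readability; the values are illustrative):

version: "4"
groupKey: "{}:{groupId=\"node-disk-usage\"}"   # illustrative
status: firing                                  # or resolved
receiver: test.web.hook
groupLabels:
  groupId: node-disk-usage
commonLabels:
  groupId: node-disk-usage
  userIds: super
commonAnnotations:
  title: "Disk warning ..."
externalURL: http://alertmanager-main:9093
alerts:
- status: firing
  labels:
    alertname: node-disk-usage
    instance: node-1:9100
  annotations:
    title: "Disk warning ..."
  startsAt: "2020-01-01T00:00:00Z"
  endsAt: "0001-01-01T00:00:00Z"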
Alert Hierarchy Tag Design
The granularity of monitoring targets determines alert hierarchy, reflected in grouping labels used by Alertmanager for routing.
Design should align with business needs.
Technical Implementation
Kubernetes Deployment of Prometheus (kube‑prometheus)
Two projects exist: kube‑prometheus and prometheus‑operator. Both can create and manage Prometheus, but kube‑prometheus builds on prometheus‑operator and provides many default configurations.
Ensure the Kubernetes version matches the kube‑prometheus version (e.g., k8s 1.14 works with kube‑prometheus 0.3 and prometheus‑operator 0.32).
kube‑prometheus uses Jsonnet to generate manifests; default manifests are in the manifests directory. Apply them with kubectl create -f manifests/setup and then kubectl create -f manifests/.
# Create the namespace and CRDs, then wait for them to become available before creating the remaining resources
$ kubectl create -f manifests/setup
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl create -f manifests/

The kubernetes_sd_config+relabel solution is available at the author's GitHub repository.
bigdata‑exporter Implementation
Components like HDFS, YARN, and HBase expose JMX metrics via HTTP. The bigdata‑exporter collects metrics from multiple components and nodes, converts them to Prometheus format, and exposes them.
Discovery of scrape targets can reuse the kubernetes_sd_config+relabel approach, using labels and annotations to convey role and endpoint information.
labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"

Alert Design Example
The example uses groupId and instanceId as alert dimensions.
Alertmanager configuration includes two receivers (default and a custom webhook) and routes based on groupId.
$ kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run -o yaml | kubectl -n=monitoring apply -f -

alertmanager.yaml:

global:
  resolve_timeout: 5m
receivers:
- name: 'default'
- name: 'test.web.hook'
  webhook_configs:
  - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId, instanceId]
  routes:
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: node-disk-usage
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: kafka-topic-highstore

AlertRule Example
Disk usage alert rule (group level):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: alert-rules
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} path ${path} usage {{ $value }}%"
        content: "Disk warning: node {{$labels.instance}} path ${path} usage {{ $value }}%"

Kafka queue lag alert rule (instance level):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: alert-rules
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"

Other Topics
Exporter Placement
Exporters can run as sidecars (1:1) or as independent deployments (1:1 or 1:many). Sidecars bind lifecycle to the main container; independent exporters reduce coupling and are more flexible for multi‑node services like Kafka.
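A minimal sketch of the sidecar pattern, assuming a hypothetical exporter image and port; the independent‑deployment pattern simply moves the second container into its own Deployment and Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-with-exporter            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-with-exporter
  template:
    metadata:
      labels:
        app: kafka-with-exporter
    spec:
      containers:
      - name: kafka                    # the monitored component
        image: kafka:latest            # placeholder image
      - name: exporter                 # sidecar: shares the pod's lifecycle and network namespace
        image: kafka-exporter:latest   # placeholder exporter image
        ports:
        - name: metrics
          containerPort: 9308          # assumed exporter port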
Using promtool to Validate Metric Format
# Enter pod
$ kubectl -n=monitoring exec -it prometheus-k8s-0 sh
# Show help
$ promtool -h
# Check metric format
$ curl -s http://ip:9999/metrics | promtool check metrics

Metric names and label names cannot contain dots.
Port‑Forward for Temporary External Access
# Prometheus
$ nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana
$ nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager
$ nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &

ARM Support in kube‑prometheus
The main task is identifying ARM‑compatible versions of the images used by kube‑prometheus. Images such as quay.io/prometheus/prometheus:v2.11.0 (ARM supported since v2.10.0) and quay.io/prometheus/alertmanager:v0.18.0 (ARM supported since v0.17.0) are usable, while others such as quay.io/coreos/kube-state-metrics:v1.8.0 lack ARM support.
Some images claim ARM support but have bugs (e.g., grafana/grafana:6.4.3), and newer versions of the Prometheus operator require Kubernetes >=1.16.