How to Build a Scalable Kubernetes Monitoring System for Big Data with kube-prometheus
This article explains how to design and implement a flexible kube‑prometheus‑based monitoring and alerting solution for big‑data applications running on Kubernetes, covering metric exposure methods, scrape configurations, alert rule design, custom alert platform integration, and practical deployment tips.
It also includes a worked application example and notes from the author's experience with Prometheus, such as obtaining ARM images and using various supporting tools.
Preface
Operational monitoring and performance evaluation have long been weak points of big‑data platforms, so a robust monitoring system is essential.
Prometheus is the most popular monitoring software of the cloud‑native era, and many big‑data components support it either natively or through third‑party exporters.
The author's big‑data platform runs on Kubernetes, offering flexible deployment and easy integration with Prometheus.
The following sections discuss design ideas and technical implementation.
Design Ideas
The core task of a monitoring system is to scrape exposed metric data, then analyze and alert on it. Key questions include:
What is being monitored?
How does the target expose metric data?
How does the monitoring system scrape the metrics?
How to implement dynamic alert rule configuration and management?
Monitoring Targets
Big‑data components running as pods in the Kubernetes cluster.
Metric Exposure Methods
Depending on how a component supports Prometheus, its metrics can be exposed in one of three ways:
Directly expose Prometheus metrics (pull).
Push metrics to a Prometheus PushGateway (push).
Use a custom exporter to convert other metric formats to Prometheus format (exporter, pull).
Most components have official or third‑party exporters; a few require custom development. Generally, the direct method suffices.
When running Flink or Spark on YARN, the worker processes live inside YARN containers, which makes direct scraping difficult; in such cases the metrics must be pushed to a PushGateway. Short‑lived components should also push their metrics.
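As an illustration of the push path (a sketch, not taken from the original setup), Flink ships a PrometheusPushGatewayReporter that can be enabled in flink-conf.yaml roughly as follows; the host, port, and job name below are placeholders:

# flink-conf.yaml: push Flink metrics to a PushGateway (sketch; host/port/jobName are placeholders)
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.monitoring.svc   # assumed PushGateway address
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-on-yarn
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false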
Metric Scrape Methods
Whether metrics come from an exporter or a PushGateway, Prometheus ultimately pulls them from the exposed endpoints.
Scrape targets can be defined through native scrape jobs, PodMonitor resources, or ServiceMonitor resources. In Kubernetes environments, PodMonitor is usually preferred for its simplicity.
The main configuration file prometheus-prometheus.yaml contains selectors for ServiceMonitor, PodMonitor, RuleSelector, and Alertmanagers. Modifying this file triggers a Prometheus restart.
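A minimal PodMonitor sketch (names are illustrative; the label selector and port must match the target pods, and the PodMonitor itself must be picked up by the podMonitorSelector in prometheus-prometheus.yaml):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bigdata-demo             # hypothetical name
  namespace: monitoring
  labels:
    k8s-app: bigdata-demo
spec:
  namespaceSelector:
    matchNames:
    - bigdata                    # assumed namespace of the monitored pods
  selector:
    matchLabels:
      app: bigdata-demo          # assumed pod label
  podMetricsEndpoints:
  - port: metrics                # name of the container port exposing /metrics
    path: /metrics
    interval: 30s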
For complex environments, native Kubernetes service discovery (kubernetes_sd_config) combined with relabeling can be used instead, typically supplied as an additional scrape config. Pods then advertise their metrics endpoints through annotations, for example:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "http"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "19091"

References for this configuration include the Prometheus relabel documentation and the Kubernetes service discovery guides.
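As a sketch, an additional scrape config that discovers pods carrying these annotations can look roughly like the standard Kubernetes example from the Prometheus documentation (the job name is illustrative):

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # honour the scheme, path, and port annotations
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__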
Alert Design
Alert Flow
Service experiences an anomaly.
Prometheus triggers an alert.
Alertmanager receives the alert.
Alertmanager processes the alert according to predefined rules (grouping, silencing, notifications).
Timing of alert triggering is critical and involves many details: an alert only starts firing after its expression has held true for the full "for" duration, and Alertmanager then applies group_wait, group_interval, and repeat_interval before notifications actually go out.
Dynamic Alert Configuration
kube‑prometheus separates alert configuration into two parts:
Alertmanager: policies for handling alerts (routing, grouping, silencing, notification).
Reference: Alertmanager configuration guide.
AlertRule: specific alert rules.
Reference: Prometheus alert rule guide.
In Kubernetes, alert rules are managed as PrometheusRule resources, which can be created, updated, and deleted like any other resource, much as pods are.
Custom Alert Platform Integration
Beyond Alertmanager's built‑in notification channels, a custom alert platform can receive alerts through Alertmanager's webhook receiver and apply business‑specific processing, recording, and multi‑channel notifications.
Alertmanager performs pre‑processing (grouping, silencing), while the custom platform handles business logic (recording, de‑identification, multi‑channel delivery).
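For reference, the payload Alertmanager POSTs to a webhook receiver is JSON with roughly the following shape (shown here in YAML form for readability; the values are illustrative):

version: "4"
groupKey: "{}:{groupId=\"node-disk-usage\"}"   # illustrative
status: firing                                  # or resolved
receiver: test.web.hook
groupLabels:
  groupId: node-disk-usage
commonLabels:
  groupId: node-disk-usage
  userIds: super
commonAnnotations:
  title: "Disk warning ..."
externalURL: http://alertmanager-main:9093
alerts:
- status: firing
  labels:
    alertname: node-disk-usage
    instance: node-1:9100
  annotations:
    title: "Disk warning ..."
  startsAt: "2020-01-01T00:00:00Z"
  endsAt: "0001-01-01T00:00:00Z"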
Alert Hierarchy Tag Design
The granularity of monitoring targets determines alert hierarchy, reflected in grouping labels used by Alertmanager for routing.
Design should align with business needs.
Technical Implementation
Kubernetes Deployment of Prometheus (kube‑prometheus)
Two projects exist: kube‑prometheus and prometheus‑operator. Both can create and manage Prometheus, but kube‑prometheus builds on prometheus‑operator and provides many default configurations.
Ensure the Kubernetes version matches the kube‑prometheus version (e.g., k8s 1.14 works with kube‑prometheus 0.3 and prometheus‑operator 0.32).
kube‑prometheus uses Jsonnet to generate manifests; default manifests are in the manifests directory. Apply them with kubectl create -f manifests/setup and then kubectl create -f manifests/.
# Create the namespace and CRDs, then wait for them to become available before creating the remaining resources
$ kubectl create -f manifests/setup
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl create -f manifests/

The kubernetes_sd_config+relabel solution is available at the author's GitHub repository.
bigdata‑exporter Implementation
Components like HDFS, YARN, and HBase expose JMX metrics via HTTP. The bigdata‑exporter collects metrics from multiple components and nodes, converts them to Prometheus format, and exposes them.
Discovery of scrape targets can reuse the kubernetes_sd_config+relabel approach, using labels and annotations to convey role and endpoint information.
labels:
  bigData.metrics.object: pod
annotations:
  bigData.metrics/scrape: "true"
  bigData.metrics/scheme: "https"
  bigData.metrics/path: "/jmx"
  bigData.metrics/port: "29871"
  bigData.metrics/role: "hdfs-nn,common"

Alert Design Example
The example uses groupId and instanceId as alert dimensions.
Alertmanager configuration includes two receivers (default and a custom webhook) and routes based on groupId.
$ kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run -o yaml | kubectl -n=monitoring apply -f -

alertmanager.yaml:

global:
  resolve_timeout: 5m
receivers:
- name: 'default'
- name: 'test.web.hook'
  webhook_configs:
  - url: 'http://alert-url'
route:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  group_by: [groupId, instanceId]
  routes:
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: node-disk-usage
  - receiver: 'test.web.hook'
    continue: true
    match:
      groupId: kafka-topic-highstore

AlertRule Example
Disk usage alert rule (group level):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: alert-rules
  name: node-disk-usage
  namespace: monitoring
spec:
  groups:
  - name: node-disk-usage
    rules:
    - alert: node-disk-usage
      expr: 100*(1-node_filesystem_avail_bytes{mountpoint="${path}"}/node_filesystem_size_bytes{mountpoint="${path}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: node-disk-usage
        userIds: super
        receivers: SMS
      annotations:
        title: "Disk warning: node {{$labels.instance}} path ${path} usage {{ $value }}%"
        content: "Disk warning: node {{$labels.instance}} path ${path} usage {{ $value }}%"

Kafka queue lag alert rule (instance level):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: alert-rules
  name: kafka-topic-highstore-${uniqueName}
  namespace: monitoring
spec:
  groups:
  - name: kafka-topic-highstore
    rules:
    - alert: kafka-topic-highstore-${uniqueName}
      expr: sum(kafka_consumergroup_lag{exporterType="kafka",consumergroup="${consumergroup}"}) > ${thresholdValue}
      for: 1m
      labels:
        groupId: kafka-topic-highstore
        instanceId: ${uniqueName}
        userIds: super
        receivers: SMS
      annotations:
        title: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"
        content: "KAFKA warning: consumer group ${consumergroup} lag {{ $value }}"

Other Topics
Exporter Placement
Exporters can run as sidecars (1:1) or as independent deployments (1:1 or 1:many). Sidecars bind lifecycle to the main container; independent exporters reduce coupling and are more flexible for multi‑node services like Kafka.
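A minimal sketch of the sidecar pattern, assuming a hypothetical exporter image and port; the independent‑deployment pattern simply moves the second container into its own Deployment and Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-with-exporter            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-with-exporter
  template:
    metadata:
      labels:
        app: kafka-with-exporter
    spec:
      containers:
      - name: kafka                    # the monitored component
        image: kafka:latest            # placeholder image
      - name: exporter                 # sidecar: shares the pod's lifecycle and network namespace
        image: kafka-exporter:latest   # placeholder exporter image
        ports:
        - name: metrics
          containerPort: 9308          # assumed exporter port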
Using promtool to Validate Metric Format
# Enter pod
$ kubectl -n=monitoring exec -it prometheus-k8s-0 sh
# Show help
$ promtool -h
# Check metric format
$ curl -s http://ip:9999/metrics | promtool check metrics

Metric names and label names cannot contain dots.
Port‑Forward for Temporary External Access
# Prometheus
$ nohup kubectl port-forward --address 0.0.0.0 service/prometheus-k8s 19090:9090 -n=monitoring &
# Grafana
$ nohup kubectl port-forward --address 0.0.0.0 service/grafana 13000:3000 -n=monitoring &
# Alertmanager
$ nohup kubectl port-forward --address 0.0.0.0 service/alertmanager-main 9093:9093 -n=monitoring &

ARM Support in kube‑prometheus
The main task is identifying ARM‑compatible versions of the images used by kube‑prometheus. Images such as quay.io/prometheus/prometheus:v2.11.0 (ARM supported since v2.10.0) and quay.io/prometheus/alertmanager:v0.18.0 (ARM supported since v0.17.0) are usable, while others such as quay.io/coreos/kube-state-metrics:v1.8.0 lack ARM support.
Some images claim ARM support but have bugs (e.g., grafana/grafana:6.4.3), and newer versions of the Prometheus operator require Kubernetes >=1.16.