Mastering Prometheus on Kubernetes: A Step‑by‑Step Guide for Cloud‑Native Monitoring

This article introduces Prometheus fundamentals, its architecture, and its metric types, then walks through a complete Kubernetes deployment covering the namespace, RBAC, ConfigMap, and various exporters. It shows how to collect metrics, configure alerts, and visualize data with Grafana, and notes the setup's limitations and planned improvements.

Huolala Tech

Background

The feature platform uses PyFlink on Kubernetes for ETL, pulling data from HBase, Hive, and relational databases into a unified feature store for data scientists, engineers, and ML engineers. It addresses four problems: scattered feature storage, duplicated features, complex extraction, and difficult usage.

The project built its own Kubernetes container management platform running Flink, Zeppelin, etc. Monitoring the K8s cluster and alerting on anomalies is essential; Prometheus is the chosen solution.

Prometheus Overview

What is Prometheus?

Prometheus is an open-source monitoring system written in Go, originally developed at SoundCloud and now a CNCF graduated project. It offers a multi-dimensional data model (metric names plus key/value labels), pull-based collection over HTTP, the PromQL query language, autonomous single-server nodes that do not depend on distributed storage, service discovery, and a rich ecosystem.

It combines a monitoring/alerting system with a built‑in time‑series database (TSDB).

Architecture

Key components include the Prometheus Server (scrapes targets, stores samples, and serves PromQL queries), Pushgateway (holds metrics pushed by short-lived jobs so the server can scrape them), Exporters (expose metrics from third-party systems), Alertmanager (deduplicates, groups, and routes alerts), and service-discovery mechanisms.

Metric Types

Metric format

Each sample consists of a metric name with labels, a timestamp (millisecond precision), and a float64 value.

Sample types

Counter – only increases (e.g., total HTTP requests).

Gauge – can increase or decrease (e.g., current memory usage).

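Histogram – counts observations in configurable buckets so that quantiles can be estimated at query time (e.g., request latency).

Summary – computes quantiles on the client side over a sliding time window.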
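
The type of a metric determines how it is usually queried. The rule file below is a minimal sketch of that difference, written as Prometheus recording rules. The metric names http_requests_total and http_request_duration_seconds_bucket and all rule names are hypothetical; node_memory_MemAvailable_bytes is the name node-exporter uses on Linux hosts.

```yaml
# Illustrative recording rules; metric and rule names are assumptions.
groups:
  - name: metric-type-examples
    rules:
      # Counter: the raw value only grows, so query its per-second rate.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Gauge: the current value is meaningful as-is.
      - record: instance:memory_available:bytes
        expr: node_memory_MemAvailable_bytes
      # Histogram: estimate the 95th percentile from the bucket counters.
      - record: job:http_request_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

A Summary needs no such rule because its quantiles arrive already computed, but for the same reason they cannot be meaningfully aggregated across instances.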

Limitations

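Prometheus handles metrics only. It is not suitable for logs, traces, or events, which need complementary tools such as Fluentd and Elasticsearch.

The pull model requires the server to reach every target over the network, so large or segmented environments need careful network planning.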

Local storage is intended for short‑term data (≈1 month); long‑term storage needs remote back‑ends.
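
As a rough illustration of the storage point, local retention is set with a server flag and long-term storage is attached through remote_write. Both fragments below are sketches: the 30d value mirrors the one-month figure above, and the endpoint URL is a placeholder rather than anything used by the platform.

```yaml
# Fragment of the Prometheus container spec: keep about one month locally.
args:
  - --config.file=/etc/prometheus/prometheus.yml
  - --storage.tsdb.retention.time=30d
---
# Fragment of prometheus.yml: forward samples to a remote back-end.
remote_write:
  - url: "http://remote-storage.example.internal/api/v1/write"   # placeholder endpoint
```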

Practical Deployment on the Feature Platform

The platform runs Flink, Zeppelin, Elasticsearch, etc., on a Kubernetes cluster (Huawei Cloud). Prometheus monitors node performance (node‑exporter), container performance (cAdvisor), cluster state (kube‑state‑metrics), and Elasticsearch (elastic‑exporter). Alerts are sent to Alertmanager, which forwards them to Feishu via webhook.
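
The alert path described above could be configured in Alertmanager roughly as follows. This is a sketch under assumptions: the receiver name, grouping labels, timings, and URL are all hypothetical, and the source does not say whether a relay service sits between Alertmanager and Feishu. The sketch assumes one, because Alertmanager's generic webhook posts its own JSON payload format.

```yaml
# alertmanager.yml sketch; receiver name, timings, and URL are assumptions.
route:
  receiver: feishu-webhook
  group_by: ["alertname", "namespace"]   # alerts sharing these labels are batched
  group_wait: 30s                        # wait before sending the first notification
  group_interval: 5m                     # wait before sending updates for a group
  repeat_interval: 4h                    # re-send interval for alerts still firing
receivers:
  - name: feishu-webhook
    webhook_configs:
      # Hypothetical in-cluster relay that reformats the payload for Feishu.
      - url: "http://feishu-adapter.prometheus.svc:8080/alert"
        send_resolved: true
```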

Installation Steps

Create a dedicated namespace and RBAC rules for Prometheus.

Store the Prometheus configuration in a ConfigMap (scrape interval, alerting, rule files, target definitions).

Deploy Prometheus Server, Alertmanager, Grafana, kube‑state‑metrics, node‑exporter, and exporters using Deployment or DaemonSet resources.

Expose Prometheus and Grafana via NodePort services for external access.

Define alerting rules for pod failures, node issues, Elasticsearch health, CPU/memory thresholds, etc.
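
The configuration and alerting steps above can be sketched as a single ConfigMap holding both the server configuration and a rule file. Everything specific here is an assumption for illustration: the ConfigMap name, the 30s intervals, the Service addresses, the mount path /etc/prometheus, the annotation-based pod discovery, and the alert thresholds. The kube_pod_status_phase metric comes from kube-state-metrics.

```yaml
# Sketch only: names, intervals, targets, and thresholds are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["alertmanager.prometheus.svc:9093"]
    rule_files:
      - /etc/prometheus/rules.yml
    scrape_configs:
      # Discover pods and keep only those annotated prometheus.io/scrape: "true".
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
      # Fixed target for cluster-state metrics.
      - job_name: kube-state-metrics
        static_configs:
          - targets: ["kube-state-metrics.prometheus.svc:8080"]
  rules.yml: |
    groups:
      - name: basic-alerts
        rules:
          # Fires when any scrape target has been unreachable for 5 minutes.
          - alert: TargetDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
          # Fires when a pod stays outside the Running/Succeeded phases.
          - alert: PodNotRunning
            expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is not running"
```

Keeping the rule file in the same ConfigMap means one volume mount serves both files; larger rule sets are usually easier to manage in a separate ConfigMap.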

Key YAML Snippets

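# Dedicated namespace that holds the monitoring stack
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
---
# Cluster-wide, read-only access to the objects Prometheus discovers
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]   # "" is the core API group
  resources: ["nodes","services","endpoints","pods"]
  verbs: ["get","list","watch"]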
...
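
The manifest above is elided in the source. A ClusterRole grants nothing until it is bound to an identity, so a typical setup also contains a ServiceAccount and a ClusterRoleBinding along the following lines. This is a sketch, not the elided content: the ServiceAccount and binding names are assumed to match the role name.

```yaml
# Sketch: identity the Prometheus pod runs as (name is an assumption).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prometheus
---
# Sketch: attach the ClusterRole to that ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: prometheus
```

The Prometheus pod spec would then reference this account through serviceAccountName.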

After applying all manifests with kubectl apply -f prometheus-all.yaml, the monitoring stack becomes operational.
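
Once the manifests are applied, reaching the Prometheus UI from outside the cluster depends on the NodePort Service mentioned in the installation steps. A minimal sketch of such a Service follows; the pod label app: prometheus and the node port 30090 are assumptions, while 9090 is the port Prometheus listens on by default.

```yaml
# Sketch of a NodePort Service; selector label and nodePort are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prometheus
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: web
      port: 9090         # Service port inside the cluster
      targetPort: 9090   # Prometheus default listen port
      nodePort: 30090    # must fall in the cluster's NodePort range (30000-32767 by default)
```

With this in place the UI would be reachable at port 30090 on any node's address, subject to the cloud provider's security-group rules.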

Result Showcase

Alert rules displayed in Alertmanager UI.

Target scrape status view.

Feishu group notifications.

Grafana dashboards visualizing metrics.

Conclusion

The article presented Prometheus fundamentals, a complete deployment on a Kubernetes‑based feature platform, and integration with alerting channels. While the current setup is basic and suitable for a small cluster, future work includes high‑availability Prometheus, richer rule sets, and scaling considerations.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
