How to Monitor Business Metrics with Prometheus in Kubernetes

This article explains how to use Prometheus to monitor business‑level metrics in a Kubernetes environment, covering observability fundamentals, metric definitions, metric types, exposing metrics via a /metrics endpoint, and practical Go code examples for defining, recording, and scraping custom metrics.

Efficient Ops

Oct 24, 2023

Prometheus Monitoring Business Metrics

Kubernetes has become the de facto standard for container orchestration, which makes deploying microservices easy. As the number of services grows, however, governing them becomes harder, and this is where observability comes in.

Observability

Observability consists of three pillars: logging, metrics, and tracing.

Logging records events generated during application execution, providing detailed state information but consuming significant storage and query resources.

Metrics are aggregated numeric values that require little storage and reveal system state and trends, though they lack detailed context.

Tracing follows a request as it moves through services to pinpoint where errors or latency arise. Like logging, it can be resource-intensive, so traces are often sampled.

This article focuses on the metrics pillar, using Prometheus, the de facto standard monitoring system for cloud-native services.

Metric Definition

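Each time series is identified by a metric name and an optional set of labels:

<metric_name>{<label_name>=<label_value>, ...}

Metric names may contain ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. Label names may contain ASCII letters, digits, and underscores, and must match [a-zA-Z_][a-zA-Z0-9_]*. Colons are reserved for user-defined recording rules, and label names starting with __ are reserved for Prometheus' internal use.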
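To make the two naming patterns concrete, here is a small standard-library Go sketch that checks names against them. The helpers validMetricName and validLabelName are hypothetical names for illustration; the official Go client performs its own validation when metrics are created.

```go
package main

import (
	"fmt"
	"regexp"
)

// The naming rules from the text, anchored so the whole string must match.
var (
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether s is a legal (classic) Prometheus metric name.
func validMetricName(s string) bool { return metricNameRE.MatchString(s) }

// validLabelName reports whether s is a legal label name. Names starting with "__"
// pass this check but are reserved for Prometheus' internal use.
func validLabelName(s string) bool { return labelNameRE.MatchString(s) }

func main() {
	for _, name := range []string{"http_requests_total", "job:http_requests:rate5m", "2xx_responses", "http-requests"} {
		fmt.Printf("metric %-28q valid=%v\n", name, validMetricName(name))
	}
	for _, name := range []string{"method", "status_code", "http.method"} {
		fmt.Printf("label  %-28q valid=%v\n", name, validLabelName(name))
	}
}
```

A name with a colon such as job:http_requests:rate5m is syntactically valid, but by convention it should only come from a recording rule, not from application code.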

Metric Types

Counter

A counter only ever increases; it resets to zero only when the process exposing it restarts. Examples: http_requests_total, node_cpu_seconds_total. By convention, counter names end with _total.

Gauge

A gauge represents a current value that can go up or down. Examples: node_memory_MemFree_bytes, node_memory_MemAvailable_bytes.
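Since a gauge can move in both directions and is usually updated from many goroutines, it needs concurrency-safe Set, Add, Inc, and Dec operations. Below is a minimal sketch of one common approach (requires Go 1.19+ for atomic.Uint64): store the float64 as its bit pattern and update it with a compare-and-swap loop. The Gauge type here is illustrative, not the client library's type.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// Gauge is a minimal concurrency-safe float64 gauge (illustrative only).
// The value is kept as its IEEE-754 bits so it can be updated atomically.
type Gauge struct{ bits atomic.Uint64 }

func (g *Gauge) Set(v float64)  { g.bits.Store(math.Float64bits(v)) }
func (g *Gauge) Value() float64 { return math.Float64frombits(g.bits.Load()) }

// Add adjusts the gauge by delta (which may be negative) with a CAS retry loop.
func (g *Gauge) Add(delta float64) {
	for {
		old := g.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if g.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (g *Gauge) Inc() { g.Add(1) }
func (g *Gauge) Dec() { g.Add(-1) }

func main() {
	var free Gauge
	free.Set(512) // e.g. MiB of free memory
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); free.Add(-2); free.Add(1) }()
	}
	wg.Wait()
	fmt.Println(free.Value()) // prints 412
}
```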

Summary

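Summaries record the distribution of observations, such as response-time percentiles, which are useful when an average hides outliers. The quantiles are computed on the client and exposed with a quantile label, together with <name>_sum and <name>_count. Because each instance precomputes its own quantiles, they cannot be meaningfully aggregated across instances.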

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.98e-05
go_gc_duration_seconds{quantile="0.25"} 5.31e-05
go_gc_duration_seconds{quantile="0.5"} 6.77e-05
go_gc_duration_seconds{quantile="0.75"} 0.0001428
go_gc_duration_seconds{quantile="1"} 0.0008099
go_gc_duration_seconds_sum 0.0114183
go_gc_duration_seconds_count 85
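To show what the exposed numbers mean, here is a deliberately naive sketch that keeps every observation and computes exact quantiles with the nearest-rank method. This is a simplification: the Go client's Summary instead uses streaming approximations configured through SummaryOpts (Objectives, MaxAge) and never stores all values. The summary type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// summary keeps raw observations and computes exact quantiles on demand.
type summary struct {
	obs []float64
	sum float64
}

func (s *summary) Observe(v float64) {
	s.obs = append(s.obs, v)
	s.sum += v
}

// Quantile returns the q-quantile (0 <= q <= 1) using the nearest-rank method,
// so q=0 is the minimum and q=1 the maximum, as in the go_gc_duration_seconds output.
func (s *summary) Quantile(q float64) float64 {
	if len(s.obs) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), s.obs...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	var s summary
	for _, ms := range []float64{12, 15, 11, 14, 13, 250} { // one slow outlier
		s.Observe(ms / 1000) // seconds
	}
	for _, q := range []float64{0, 0.5, 0.9, 1} {
		fmt.Printf("latency_seconds{quantile=%q} %g\n", fmt.Sprint(q), s.Quantile(q))
	}
	fmt.Printf("latency_seconds_sum %g\nlatency_seconds_count %d\n", s.sum, len(s.obs))
}
```

Here the median stays at 13 ms while the 0.9 quantile exposes the 250 ms outlier; the mean (sum / count, about 52 ms) describes neither.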

Histogram

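Histograms also record distributions, but they count observations into configurable buckets. Each bucket is exposed as a cumulative counter <name>_bucket{le="<upper bound>"}, alongside <name>_sum (the sum of all observed values) and <name>_count (the number of observations, equal to the le="+Inf" bucket). Quantiles are computed on the server with histogram_quantile(), and because buckets are plain counters they can be aggregated across instances.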

# HELP prometheus_http_response_size_bytes Histogram of response size for HTTP requests.
# TYPE prometheus_http_response_size_bytes histogram
prometheus_http_response_size_bytes_bucket{handler="/",le="100"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="10000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="100000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1e+06"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="+Inf"} 1
prometheus_http_response_size_bytes_sum{handler="/"} 29
prometheus_http_response_size_bytes_count{handler="/"} 1

Exposing Metrics

Prometheus pulls (scrapes) metrics over HTTP, typically from a /metrics endpoint exposed by the application. An example using the Gin framework:

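package main

import (
    "net/http"
    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    // plus the project's own middlewares package (import path not given)
)

func main() {
    server := gin.New()
    // AccessLogger and Metric are the project's own middlewares; Metric records the custom metrics.
    server.Use(middlewares.AccessLogger(), middlewares.Metric(), gin.Recovery())
    server.GET("/health", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{"message": "ok"})
    })
    // gin.WrapH adapts the promhttp handler (built once, not per scrape) to a gin route.
    server.GET("/metrics", gin.WrapH(promhttp.Handler()))
    server.Run(":8000") // port matches the scrape target host.docker.internal:8000
}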

Defining Custom Metrics

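Three example metrics cover two scenarios: HTTP request handling (a histogram of request latencies and a counter of requests) and background task processing (a gauge of running tasks):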

package controllers

import "github.com/prometheus/client_golang/prometheus"

var (
    // HTTPReqDuration: HTTP request latency in seconds (Histogram)
    HTTPReqDuration *prometheus.HistogramVec
    // HTTPReqTotal: total number of HTTP requests (Counter)
    HTTPReqTotal *prometheus.CounterVec
    // TaskRunning: number of tasks currently running (Gauge)
    TaskRunning *prometheus.GaugeVec
)

func init() {
    // No Buckets are set, so the histogram uses prometheus.DefBuckets (5ms to 10s).
    HTTPReqDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Name: "http_request_duration_seconds", Help: "HTTP request latencies in seconds.",
    }, []string{"method", "path"})
    HTTPReqTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "http_requests_total", Help: "Total number of HTTP requests.",
    }, []string{"method", "path", "status"})
    TaskRunning = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "task_running", Help: "Current number of running tasks.",
    }, []string{"type", "state"})
    // Register with the default registry, which promhttp.Handler() serves on /metrics.
    prometheus.MustRegister(HTTPReqDuration, HTTPReqTotal, TaskRunning)
}
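In client_golang, calling With(...) or WithLabelValues(...) on a *Vec returns the child for one label combination, and each distinct combination is exported as its own time series. The toy counterVec below (hypothetical, standard library only) mimics that bookkeeping to show why unbounded label values such as raw URL paths or user IDs multiply the number of series Prometheus has to store.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// counterVec is a toy stand-in for a labelled counter: every distinct
// combination of label values becomes its own series (map entry).
type counterVec struct {
	mu     sync.Mutex
	labels []string
	series map[string]float64
}

func newCounterVec(labels ...string) *counterVec {
	return &counterVec{labels: labels, series: map[string]float64{}}
}

// inc increments the series for the given label values, in declaration order.
func (c *counterVec) inc(values ...string) {
	if len(values) != len(c.labels) {
		panic("wrong number of label values")
	}
	key := strings.Join(values, "\xff") // separator unlikely to appear in values
	c.mu.Lock()
	c.series[key]++
	c.mu.Unlock()
}

func (c *counterVec) seriesCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.series)
}

func main() {
	templated := newCounterVec("method", "path", "status")
	templated.inc("GET", "/users/:id", "200")
	templated.inc("GET", "/users/:id", "200")
	templated.inc("GET", "/users/:id", "404")
	fmt.Println("series with route templates:", templated.seriesCount()) // prints 2

	raw := newCounterVec("method", "path", "status")
	for id := 0; id < 1000; id++ {
		raw.inc("GET", fmt.Sprintf("/users/%d", id), "200")
	}
	fmt.Println("series with raw URL paths:", raw.seriesCount()) // prints 1000
}
```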

During request handling, the Metric middleware records the metrics:

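// Metric returns a gin middleware that records the custom metrics for every request.
func Metric() gin.HandlerFunc {
    return func(c *gin.Context) {
        start := time.Now()
        c.Next() // run the remaining handlers so status and latency are final
        duration := time.Since(start).Seconds()
        // Route template (e.g. /users/:id), not the raw URL, keeps label cardinality bounded;
        // it is empty for requests that matched no route.
        path := c.FullPath()
        method := c.Request.Method
        controllers.HTTPReqTotal.With(prometheus.Labels{"method": method, "path": path, "status": strconv.Itoa(c.Writer.Status())}).Inc()
        controllers.HTTPReqDuration.With(prometheus.Labels{"method": method, "path": path}).Observe(duration)
        // Demo only: shuffle (not shown) appears to pick a random element, simulating task state changes.
        controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Inc()
        controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Dec()
    }
}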
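The gauge lines in the request-handling code above increment and immediately decrement random label combinations purely to generate demo data. In a real service, a running-task gauge is usually raised when a task starts and lowered when it ends; with client_golang that typically looks like g := controllers.TaskRunning.WithLabelValues("video", "process"); g.Inc(); defer g.Dec(). The runnable sketch below shows the same pattern with a standard-library atomic counter standing in for the gauge (Go 1.19+; processTask and running are hypothetical names).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// running stands in for one gauge child, e.g. task_running{type="video",state="process"}.
var running atomic.Int64

// processTask is a hypothetical task runner: the gauge goes up when work starts,
// and the deferred call brings it back down on every return path.
func processTask(work func()) {
	running.Add(1)
	defer running.Add(-1)
	work()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			processTask(func() { time.Sleep(50 * time.Millisecond) })
		}()
	}
	time.Sleep(10 * time.Millisecond)
	fmt.Println("in flight:", running.Load()) // usually 5; depends on scheduling
	wg.Wait()
	fmt.Println("after completion:", running.Load()) // prints 0
}
```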

Scraping Configuration

An example prometheus.yml that scrapes Prometheus itself and the service running locally on the host:

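global:
  scrape_interval: 5s   # must sit under "global" (or be set per job)
scrape_configs:
  - job_name: 'prometheus'          # Prometheus scraping itself
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'local-service'
    metrics_path: /metrics          # the default, shown for clarity
    static_configs:
      # host.docker.internal lets a Prometheus container (Docker Desktop) reach the host
      - targets: ['host.docker.internal:8000']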

In Kubernetes, static target lists are rarely used because pods come and go and their IPs change. Instead, Prometheus queries the Kubernetes API through kubernetes_sd_configs, which supports the node, service, pod, endpoints, endpointslice, and ingress discovery roles.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
