
How to Monitor Business Metrics with Prometheus in Kubernetes

This article explains the concept of observability, details Prometheus metric definitions and types, and provides Go code examples for exposing, defining, generating, and scraping business‑level metrics in a Kubernetes‑based cloud‑native environment.

Efficient Ops

Prometheus Monitoring Business Metrics

Kubernetes has become the de-facto standard for container orchestration, which makes microservices easy to deploy. As the number of services grows, however, service governance becomes harder, and that difficulty has driven the rise of observability.

Observability enables rapid fault localisation and even pre‑emptive detection of anomalies in distributed systems.

Observability

Observability is built on three pillars: logging, metrics, and tracing.

Logging records events generated during application execution, offering detailed state insight but consuming significant storage and query resources.

Metrics are aggregated numeric values that require minimal storage. They show system state and trends but lack fine-grained detail, so they are often enriched with multidimensional labels.

Tracing follows a request across services to pinpoint where an anomaly occurs. It shares logging's high resource cost, so traces are typically sampled.

This article focuses on the metrics pillar within a Kubernetes‑based infrastructure, where Prometheus is the de‑facto monitoring solution for cloud‑native services.

Metric Definition

Metric Format

<metric_name>{<label_name>=<label_value>, ...}

Metric names may contain ASCII letters, digits, underscores, and colons, and must match [a-zA-Z_:][a-zA-Z0-9_:]*. Label names must match [a-zA-Z_][a-zA-Z0-9_]*.
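The two patterns can be checked mechanically before a metric is registered. The following is a minimal sketch using only the Go standard library; the helper names validMetricName and validLabelName are hypothetical and are not part of any Prometheus client library.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns taken from the text above, anchored so the whole string must match.
var (
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether s matches the metric name pattern.
// (Hypothetical helper for illustration.)
func validMetricName(s string) bool { return metricNameRE.MatchString(s) }

// validLabelName reports whether s matches the label name pattern.
// (Hypothetical helper for illustration.)
func validLabelName(s string) bool { return labelNameRE.MatchString(s) }

func main() {
	metrics := []string{"http_requests_total", "job:request_rate:sum", "2xx_responses", "http-requests"}
	for _, name := range metrics {
		fmt.Printf("metric %q valid: %v\n", name, validMetricName(name))
	}
	labels := []string{"method", "status_code", "le:bucket"}
	for _, name := range labels {
		fmt.Printf("label  %q valid: %v\n", name, validLabelName(name))
	}
}
```

Note that colons are accepted in metric names but not in label names, which is the only difference between the two patterns.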

Metric Types

Counter

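Counters only increase, apart from resetting to zero when the process restarts. Common examples are http_requests_total and node_cpu_seconds_total (named node_cpu in older node_exporter releases). By convention, counter names end with the suffix _total. Because the raw value only grows, counters are usually queried through rate(), for example: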

rate(http_requests_total[5m])
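To make the idea behind rate() concrete, here is a simplified sketch of a per-second rate computed from raw counter samples, including the reset handling that counters need. The sample type and the simpleRate function are hypothetical names for this illustration. The real PromQL rate() additionally extrapolates to the edges of the time window, which this sketch leaves out.

```go
package main

import "fmt"

// sample is one scraped counter value (hypothetical type for this sketch).
type sample struct {
	ts    float64 // scrape time in seconds
	value float64 // counter value at that time
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset: the counter restarted
// from zero, so the whole new value counts as increase.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].value, samples[i].value
		if cur < prev {
			increase += cur // reset detected
		} else {
			increase += cur - prev
		}
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	steady := []sample{{0, 100}, {30, 160}, {60, 220}}
	withReset := []sample{{0, 0}, {10, 50}, {20, 10}}
	fmt.Println("steady rate:    ", simpleRate(steady))
	fmt.Println("rate with reset:", simpleRate(withReset))
}
```

The reset branch is why counters should be wrapped in rate() before any aggregation: summing raw counters first would hide the reset of an individual instance.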

Gauge

Gauges reflect current state and can increase or decrease, e.g., node_memory_MemFree and node_memory_MemAvailable. A gauge is usually queried directly:

node_memory_MemFree

Summary

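Summaries capture the distribution of observed values, which makes them useful for response-time percentiles. Quantiles are computed inside the application and exposed as series with a quantile label, alongside a _sum and a _count series: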

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.98e-05
go_gc_duration_seconds{quantile="0.25"} 5.31e-05
go_gc_duration_seconds{quantile="0.5"} 6.77e-05
go_gc_duration_seconds{quantile="0.75"} 0.0001428
go_gc_duration_seconds{quantile="1"} 0.0008099
go_gc_duration_seconds_sum 0.0114183
go_gc_duration_seconds_count 85

Histogram

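Histograms record cumulative counts per bucket (_bucket series, where the le label is the bucket's upper bound), the total number of observations (_count), and the sum of all observed values (_sum). Quantiles are estimated on the server side with histogram_quantile().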

# HELP prometheus_http_response_size_bytes Histogram of response size for HTTP requests.
# TYPE prometheus_http_response_size_bytes histogram
prometheus_http_response_size_bytes_bucket{handler="/",le="100"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="10000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="100000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1e+06"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="+Inf"} 1
prometheus_http_response_size_bytes_sum{handler="/"} 29
prometheus_http_response_size_bytes_count{handler="/"} 1
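The way a quantile can be estimated from cumulative buckets is easier to see in code. The sketch below follows the general approach of linear interpolation inside the bucket that contains the target rank. It is an illustration under stated assumptions (buckets already sorted by le, the last one being +Inf, the first bucket starting at 0), not the Prometheus implementation; the names bucket, estimateQuantile, sizeBuckets, and latencyBuckets are hypothetical, and the latency counts are made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// bucket mirrors one _bucket series: a cumulative count of observations
// that were less than or equal to the upper bound le.
type bucket struct {
	le    float64
	count float64
}

// sizeBuckets reproduces the response-size histogram shown above.
var sizeBuckets = []bucket{
	{100, 1}, {1000, 1}, {10000, 1}, {100000, 1}, {1e6, 1}, {math.Inf(1), 1},
}

// latencyBuckets is invented sample data for a request-duration histogram.
var latencyBuckets = []bucket{
	{0.1, 10}, {0.5, 60}, {1, 90}, {math.Inf(1), 100},
}

// estimateQuantile approximates the q-quantile (0 <= q <= 1) from cumulative
// buckets sorted by le, the last of which must be the +Inf bucket.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if q < 0 || q > 1 || len(buckets) < 2 {
		return math.NaN()
	}
	last := len(buckets) - 1
	total := buckets[last].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < last && buckets[i].count < rank {
		i++
	}
	if i == last {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[last-1].le
	}

	lower, below := 0.0, 0.0 // assumption: the first bucket starts at 0
	if i > 0 {
		lower, below = buckets[i-1].le, buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return lower
	}
	// Assume observations are spread evenly inside the bucket.
	return lower + (buckets[i].le-lower)*(rank-below)/inBucket
}

func main() {
	fmt.Println("median response size:", estimateQuantile(0.5, sizeBuckets))
	fmt.Println("median latency:      ", estimateQuantile(0.5, latencyBuckets))
	fmt.Println("p95 latency:         ", estimateQuantile(0.95, latencyBuckets))
}
```

For the response-size data above, the only observation was 29 bytes, yet the interpolated median lands in the middle of the 0 to 100 bucket. This shows why bucket boundaries should be chosen close to the values that matter: the estimate can never be more precise than the bucket it falls into.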

Application Metric Monitoring

Exposing Metrics

Prometheus pulls metrics from an HTTP endpoint, typically /metrics:

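// Monitor serves the default Prometheus registry through gin.
// gin.WrapH(promhttp.Handler()) is an equivalent one-liner.
func Monitor(c *gin.Context) {
    promhttp.Handler().ServeHTTP(c.Writer, c.Request)
}

// Router setup, e.g. inside main:
server := gin.New()
server.Use(middlewares.AccessLogger(), middlewares.Metric(), gin.Recovery())

// Liveness endpoint, unrelated to metrics.
server.GET("/health", func(c *gin.Context) {
    c.JSON(http.StatusOK, gin.H{"message": "ok"})
})

// Endpoint that Prometheus scrapes.
server.GET("/metrics", Monitor)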
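To show what Prometheus actually receives when it scrapes such an endpoint, here is a library-free sketch that writes one counter in the text exposition format by hand. It is for illustration only; in a real service the client library generates this output. The metric name demo_requests_total and the functions metricsHandler and scrape are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsTotal is a hand-rolled counter standing in for a client-library
// Counter (assumption: a single unlabelled series is enough for the demo).
var requestsTotal atomic.Uint64

// metricsHandler writes the counter in the Prometheus text exposition format:
// a HELP line, a TYPE line, then one sample per series.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprintln(w, "# HELP demo_requests_total Total number of handled requests.")
	fmt.Fprintln(w, "# TYPE demo_requests_total counter")
	fmt.Fprintf(w, "demo_requests_total %d\n", requestsTotal.Load())
}

// scrape calls the handler in-process, the way a Prometheus server would
// over HTTP, and returns the response body.
func scrape() string {
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return string(body)
}

func main() {
	requestsTotal.Add(3) // pretend three requests were handled
	fmt.Print(scrape())
}
```

The output has the same shape as the go_gc_duration_seconds and prometheus_http_response_size_bytes listings earlier in the article, which are scrape results of exactly this kind.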

Defining Metrics

The example below defines three metrics, one per type, for two business scenarios: HTTP request handling and background task processing.

var (
    // HTTP request duration – Histogram
    HTTPReqDuration *prometheus.HistogramVec
    // HTTP request total – Counter
    HTTPReqTotal *prometheus.CounterVec
    // Running tasks – Gauge
    TaskRunning *prometheus.GaugeVec
)

func init() {
    HTTPReqDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Name: "http_request_duration_seconds",
        Help: "http request latencies in seconds",
    }, []string{"method", "path"})

    HTTPReqTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "total number of http requests",
    }, []string{"method", "path", "status"})

    TaskRunning = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "task_running",
        Help: "current count of running task",
    }, []string{"type", "state"})

    prometheus.MustRegister(HTTPReqDuration, HTTPReqTotal, TaskRunning)
}

Generating Metrics

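// Body of a gin middleware (presumably middlewares.Metric from the router setup).
start := time.Now()
c.Next() // run the remaining handlers before measuring

duration := time.Since(start).Seconds()
// Caution: raw URL paths can produce an unbounded number of label values;
// c.FullPath() returns the matched route template instead.
path := c.Request.URL.Path

controllers.HTTPReqTotal.With(prometheus.Labels{"method": c.Request.Method, "path": path, "status": strconv.Itoa(c.Writer.Status())}).Inc()
controllers.HTTPReqDuration.With(prometheus.Labels{"method": c.Request.Method, "path": path}).Observe(duration)

// Demo data only: shuffle is the author's helper (not shown) and appears to
// return one element of the slice, so the gauge moves up and down at random.
controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Inc()
controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Dec()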

Scraping Metrics

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'local-service'
    metrics_path: /metrics
    static_configs:
      - targets: ['host.docker.internal:8000']

In Kubernetes, static target configuration is usually replaced by service discovery: kubernetes_sd_configs finds targets by role, such as node, service, pod, endpoints, and ingress.

Metric dashboards are shown below:

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
