How to Monitor Business Metrics with Prometheus in Kubernetes
This article explains how to use Prometheus to monitor business‑level metrics in a Kubernetes environment, covering observability fundamentals, metric definitions, metric types, exposing metrics via a /metrics endpoint, and practical Go code examples for defining, recording, and scraping custom metrics.
Prometheus Monitoring Business Metrics
As Kubernetes has become the de facto standard for container orchestration, deploying microservices has become easy, but operating them at scale raises service-governance challenges. This is where the concept of observability comes in.
Observability
Observability consists of three pillars: logging, metrics, and tracing.
Logging records events generated during application execution, providing detailed state information but consuming significant storage and query resources.
Metrics are aggregated numeric values that require little storage and reveal system state and trends, though they lack detailed context.
Tracing follows request flows to pinpoint anomalies, but like logging it can be resource‑intensive and often uses sampling.
This article focuses on the metrics pillar and uses Prometheus, the de facto standard monitoring system for cloud-native services.
Metric Definition
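A time series is identified by its metric name plus an optional set of labels:
<metric_name>{<label_name>=<label_value>, ...}
For example: http_requests_total{method="POST", path="/api/orders"}. Metric names may contain ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*; by convention, colons are reserved for recording rules. Label names may contain ASCII letters, digits, and underscores, and must match [a-zA-Z_][a-zA-Z0-9_]*; label names starting with __ are reserved for internal use.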
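The two naming patterns can be checked with Go's standard regexp package. The sketch below is only illustrative: validMetricName and validLabelName are hypothetical helpers, not functions from a Prometheus library. Note the added ^ and $ anchors, without which a name such as "http-requests" would pass because a substring matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// The naming patterns from the rules above, anchored so the whole name must match.
var (
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether name is a legal metric name.
func validMetricName(name string) bool { return metricNameRE.MatchString(name) }

// validLabelName reports whether name is a legal label name. Names starting
// with "__" also match the pattern but are reserved for Prometheus internals.
func validLabelName(name string) bool { return labelNameRE.MatchString(name) }

func main() {
	for _, n := range []string{"http_requests_total", "job:http_requests:rate5m", "2xx_responses", "http-requests"} {
		fmt.Printf("metric %-28q valid=%v\n", n, validMetricName(n))
	}
	for _, n := range []string{"method", "status_code", "http.method"} {
		fmt.Printf("label  %-28q valid=%v\n", n, validLabelName(n))
	}
}
```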
Metric Types
Counter
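A counter only ever increases; it goes back to zero only when the process restarts. Examples: http_requests_total and node_cpu (renamed node_cpu_seconds_total in node_exporter 0.16 and later). By convention, counter names end with _total.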
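Because a counter never decreases on its own, a drop between two scrapes means the process restarted. A minimal sketch of reset-aware growth over a series of samples follows; it mirrors the reset handling in PromQL's increase() and rate() but omits their extrapolation to the edges of the range window, and the function name increase is just for illustration.

```go
package main

import "fmt"

// increase returns how much a counter grew across samples (oldest first).
// A drop between two samples is treated as a reset: the counter restarted
// from zero, so the new value itself is the growth since the reset.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			total += cur // reset detected: counting started again from zero
		} else {
			total += cur - prev
		}
	}
	return total
}

func main() {
	// The process restarted between the second and third scrape (15 -> 3).
	fmt.Println(increase([]float64{10, 15, 3, 8})) // 13
}
```

Simply subtracting the first sample from the last would give -2 here, which is why queries over counters should go through rate() or increase() rather than raw differences.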
Gauge
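A gauge represents the current value of something and can go up or down. Examples: node_memory_MemFree and node_memory_MemAvailable (node_memory_MemFree_bytes and node_memory_MemAvailable_bytes in node_exporter 0.16 and later).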
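To contrast with the counter, here is a toy gauge that can be set, incremented, and decremented from many goroutines. It is a sketch, not a client-library type: the value is stored as its IEEE-754 bit pattern so updates can use a compare-and-swap loop instead of a mutex, a common approach for lock-free float metrics. It assumes Go 1.19 or later for atomic.Uint64.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// Gauge is a toy float64 gauge that is safe for concurrent use.
type Gauge struct{ bits atomic.Uint64 }

func (g *Gauge) Set(v float64)  { g.bits.Store(math.Float64bits(v)) }
func (g *Gauge) Value() float64 { return math.Float64frombits(g.bits.Load()) }

// Add atomically adds delta, which may be negative.
func (g *Gauge) Add(delta float64) {
	for {
		old := g.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if g.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (g *Gauge) Inc() { g.Add(1) }
func (g *Gauge) Dec() { g.Add(-1) }

func main() {
	var running Gauge
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			running.Inc()       // task started
			defer running.Dec() // task finished (runs before wg.Done)
		}()
	}
	wg.Wait()
	fmt.Println(running.Value()) // 0: every task that started also finished
	running.Set(42)              // a gauge can also jump to an absolute value
	fmt.Println(running.Value()) // 42
}
```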
Summary
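Summaries record distributions, such as response-time percentiles, which are useful when an average would hide outliers. The quantiles are calculated inside the application, and every summary also exposes the sum of all observed values (_sum) and the number of observations (_count).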
For example, the Go client library reports garbage-collection pause durations as a summary:
```
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.98e-05
go_gc_duration_seconds{quantile="0.25"} 5.31e-05
go_gc_duration_seconds{quantile="0.5"} 6.77e-05
go_gc_duration_seconds{quantile="0.75"} 0.0001428
go_gc_duration_seconds{quantile="1"} 0.0008099
go_gc_duration_seconds_sum 0.0114183
go_gc_duration_seconds_count 85
```
Histogram
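Histograms also record distributions, but instead of precomputed quantiles they expose cumulative bucket counters (_bucket, one series per upper bound le), plus the sum of all observed values (_sum) and the number of observations (_count). Quantiles are estimated at query time with the PromQL function histogram_quantile(), for example histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))). Unlike summary quantiles, bucket counts from many instances can be summed before the quantile is computed.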
Prometheus instruments its own HTTP server this way; here is its response-size histogram:
```
# HELP prometheus_http_response_size_bytes Histogram of response size for HTTP requests.
# TYPE prometheus_http_response_size_bytes histogram
prometheus_http_response_size_bytes_bucket{handler="/",le="100"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="10000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="100000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1e+06"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="+Inf"} 1
prometheus_http_response_size_bytes_sum{handler="/"} 29
prometheus_http_response_size_bytes_count{handler="/"} 1
```
The buckets are cumulative: the single 29-byte response is counted in every bucket, because 29 is less than or equal to each upper bound le.
Exposing Metrics
Prometheus pulls (scrapes) metrics from each target over HTTP, by default from the /metrics path. An example using the Gin web framework:
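```go
server := gin.New()
// Access logging, metric recording, and panic recovery run on every request.
server.Use(middlewares.AccessLogger(), middlewares.Metric(), gin.Recovery())
server.GET("/health", func(c *gin.Context) {
	c.JSON(http.StatusOK, gin.H{"message": "ok"})
})
server.GET("/metrics", Monitor)

// metricsHandler serves every metric in the default registry.
// It is built once instead of on every request.
var metricsHandler = promhttp.Handler()

// Monitor adapts the net/http handler to Gin
// (gin.WrapH(promhttp.Handler()) achieves the same in one line).
func Monitor(c *gin.Context) {
	metricsHandler.ServeHTTP(c.Writer, c.Request)
}
```
Defining Custom Metrics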
Three metrics are defined for two business scenarios, HTTP request handling and background task processing:
```go
package controllers

import "github.com/prometheus/client_golang/prometheus"

var (
	// HTTP request duration (histogram)
	HTTPReqDuration *prometheus.HistogramVec
	// Total HTTP requests (counter)
	HTTPReqTotal *prometheus.CounterVec
	// Currently running tasks (gauge)
	TaskRunning *prometheus.GaugeVec
)

func init() {
	// No Buckets are set, so prometheus.DefBuckets (5ms to 10s) applies.
	HTTPReqDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{Name: "http_request_duration_seconds", Help: "HTTP request latencies in seconds."}, []string{"method", "path"})
	HTTPReqTotal = prometheus.NewCounterVec(prometheus.CounterOpts{Name: "http_requests_total", Help: "Total number of HTTP requests."}, []string{"method", "path", "status"})
	TaskRunning = prometheus.NewGaugeVec(prometheus.GaugeOpts{Name: "task_running", Help: "Current number of running tasks."}, []string{"type", "state"})
	// Register with the default registry, which promhttp.Handler() serves.
	prometheus.MustRegister(HTTPReqDuration, HTTPReqTotal, TaskRunning)
}
```
During request handling, the Metric middleware records the metrics:
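```go
// Metric records request count and latency after each request is handled.
func Metric() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()
		c.Next() // run the rest of the handler chain first
		duration := time.Since(start).Seconds()
		// Route template such as "/users/:id" (empty if no route matched); unlike the
		// raw URL path, it keeps the number of distinct label values bounded.
		path := c.FullPath()
		controllers.HTTPReqTotal.With(prometheus.Labels{"method": c.Request.Method, "path": path, "status": strconv.Itoa(c.Writer.Status())}).Inc()
		controllers.HTTPReqDuration.With(prometheus.Labels{"method": c.Request.Method, "path": path}).Observe(duration)
		// Demo only: shuffle (not shown here) presumably returns a random element, simulating task state changes.
		controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Inc()
		controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Dec()
	}
}
```
Scraping Configuration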
An example Prometheus configuration that scrapes both Prometheus itself and the local service:
```yaml
global:
  scrape_interval: 5s # how often every target is scraped
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090'] # Prometheus scraping itself
  - job_name: 'local-service'
    metrics_path: /metrics # the default, shown for clarity
    static_configs:
      - targets: ['host.docker.internal:8000'] # the Go service on the Docker host
```
In Kubernetes, static target lists are rarely used. Instead, Prometheus queries the Kubernetes API through kubernetes_sd_configs and discovers targets with one of the node, service, pod, endpoints, endpointslice, or ingress roles.
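Discovery with the pod role returns every pod as a candidate target, with metadata attached as __meta_kubernetes_* labels; relabel_configs then decide which candidates are actually scraped. The sketch below models only the keep action: the values of the source labels are joined with ";" (the default separator) and the regex must match the entire joined string. The prometheus.io/scrape annotation it filters on is a widely used convention rather than something Prometheus defines, and the pods are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// target is a discovered scrape target: its labels, including __address__
// and the __meta_* labels attached by service discovery.
type target map[string]string

// keep models a relabel rule with action "keep": it drops every target whose
// joined source label values do not fully match regex.
func keep(targets []target, sourceLabels []string, regex string) []target {
	re := regexp.MustCompile("^(?:" + regex + ")$") // relabel regexes are fully anchored
	var kept []target
	for _, t := range targets {
		vals := make([]string, len(sourceLabels))
		for i, name := range sourceLabels {
			vals[i] = t[name] // a missing label contributes an empty string
		}
		if re.MatchString(strings.Join(vals, ";")) {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	// Hypothetical pods as the pod role might report them.
	discovered := []target{
		{"__address__": "10.0.0.1:8000", "__meta_kubernetes_pod_name": "orders-7d9f",
			"__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true"},
		{"__address__": "10.0.0.2:8000", "__meta_kubernetes_pod_name": "redis-0"},
	}
	// Equivalent YAML rule:
	//   - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
	//     action: keep
	//     regex: "true"
	for _, t := range keep(discovered, []string{"__meta_kubernetes_pod_annotation_prometheus_io_scrape"}, "true") {
		fmt.Println("scrape", t["__address__"], t["__meta_kubernetes_pod_name"])
	}
}
```

Only the annotated pod survives, so the service opts in to scraping through its own pod metadata instead of someone editing a static target list.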
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.