How to Monitor Business Metrics with Prometheus in Kubernetes
This article explains the concept of observability, details Prometheus metric definitions and types, and provides Go code examples for exposing, defining, and generating business-level metrics, plus a Prometheus configuration for scraping them, in a Kubernetes-based cloud-native environment.
Monitoring Business Metrics with Prometheus
Kubernetes has become the de facto standard for container orchestration, and it makes deploying microservices easy. As the number of services grows, however, governing them becomes harder, and this has driven the rise of observability.
Observability enables rapid fault localisation and even pre‑emptive detection of anomalies in distributed systems.
Observability
Observability is built on three pillars: logging, metrics, and tracing.
Logging records events generated during application execution, offering detailed state insight but consuming significant storage and query resources.
Metrics are aggregated numeric values that require little storage. They show the state and trends of a system but lack fine-grained detail. Attaching labels (multiple dimensions) to each metric recovers part of that detail.
Tracing follows a request as it flows through the system to pinpoint where anomalies occur. Like logging, it is resource-intensive, so traces are usually sampled.
This article focuses on the metrics pillar within a Kubernetes‑based infrastructure, where Prometheus is the de‑facto monitoring solution for cloud‑native services.
Metric Definition
Metric Format
<metric_name>{<label_name>=<label_value>, ...}
Metric names may contain ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. Label names must match [a-zA-Z_][a-zA-Z0-9_]*.
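The two naming rules can be checked mechanically. Below is a minimal sketch using Go's regexp package. The names validMetricName and validLabelName are hypothetical helpers and are not part of the Prometheus client library. The Prometheus documentation also reserves colons in metric names for recording rules (names such as job:http_requests:rate5m), and it reserves label names beginning with __ for internal use.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Metric names may contain colons; label names may not.
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether s matches the metric name pattern.
func validMetricName(s string) bool { return metricNameRE.MatchString(s) }

// validLabelName reports whether s matches the label name pattern.
func validLabelName(s string) bool { return labelNameRE.MatchString(s) }

func main() {
	for _, n := range []string{"http_requests_total", "job:http_requests:rate5m", "2xx_responses", "http-requests"} {
		fmt.Printf("metric name %q valid: %v\n", n, validMetricName(n))
	}
	// "__name__" matches the pattern, but names starting with __ are reserved.
	for _, n := range []string{"status", "__name__", "http:status"} {
		fmt.Printf("label name %q valid: %v\n", n, validLabelName(n))
	}
}
```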
Metric Types
Counter
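Counters only ever increase; their value resets to zero only when the exporting process restarts. Common examples are http_requests_total and node_cpu (renamed node_cpu_seconds_total in node_exporter 0.16 and later). By convention, counter names end in _total.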
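A counter's raw value mostly reflects how long the process has been running, so the useful quantity is its growth over time. The sketch below is written in plain Go and uses hypothetical names (sample, increase, perSecondRate). It computes that growth from scraped samples and treats any drop in value as a reset to zero. It is simplified: PromQL's rate() also extrapolates to the boundaries of the range window, so its results differ slightly.

```go
package main

import "fmt"

// sample is one scraped value of a counter at a point in time.
type sample struct {
	t float64 // timestamp in seconds
	v float64 // counter value
}

// increase returns how much the counter grew across the samples,
// treating any drop in value as a reset to zero (e.g. a process restart).
func increase(samples []sample) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].v, samples[i].v
		if cur < prev {
			// Reset: the counter restarted from zero, so its whole current value is new growth.
			total += cur
		} else {
			total += cur - prev
		}
	}
	return total
}

// perSecondRate divides the increase by the time spanned by the samples.
// Unlike PromQL's rate(), it does not extrapolate to the edges of the window.
func perSecondRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	span := samples[len(samples)-1].t - samples[0].t
	if span <= 0 {
		return 0
	}
	return increase(samples) / span
}

func main() {
	// Four scrapes 5 s apart; the counter resets between the 2nd and 3rd scrape.
	s := []sample{{0, 100}, {5, 110}, {10, 4}, {15, 10}}
	fmt.Println(increase(s))      // 20 (10 + 4 + 6)
	fmt.Println(perSecondRate(s)) // about 1.33 per second
}
```

Without the reset handling, a naive last-minus-first calculation would report 10 - 100 = -90 for the same samples.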
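Because a counter's absolute value depends on when the process last started, it is usually queried through rate(), which returns the per-second increase averaged over a time window (here five minutes):
rate(http_requests_total[5m])
Gauge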
Gauges reflect current state and can increase or decrease, e.g. node_memory_MemFree and node_memory_MemAvailable.
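A gauge must support movement in both directions. The following plain-Go sketch implements a concurrency-safe gauge, purely to illustrate the semantics. The gauge type is hypothetical. In practice you would use the client library's prometheus.Gauge, which offers Set, Inc, Dec, and Add with the same meaning. The sketch stores the float64 as raw bits and updates it with a compare-and-swap loop.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// gauge holds a float64 that can move up or down and is safe for concurrent use.
// The value is stored as its IEEE-754 bit pattern so it can be updated atomically.
type gauge struct {
	bits atomic.Uint64
}

// Set replaces the current value.
func (g *gauge) Set(v float64) { g.bits.Store(math.Float64bits(v)) }

// Value returns the current value.
func (g *gauge) Value() float64 { return math.Float64frombits(g.bits.Load()) }

// Add applies a positive or negative delta using a compare-and-swap retry loop.
func (g *gauge) Add(delta float64) {
	for {
		old := g.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if g.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (g *gauge) Inc() { g.Add(1) }
func (g *gauge) Dec() { g.Add(-1) }

func main() {
	var running gauge
	var wg sync.WaitGroup
	// 100 tasks start concurrently; the first 40 of them also finish.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			running.Inc()
			if i < 40 {
				running.Dec()
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(running.Value()) // 60
}
```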
A gauge's current value is usually queried directly, without rate():
node_memory_MemFree
Summary
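Summaries track the distribution of observations such as response times. The client computes the configured quantiles itself and also exposes the sum (_sum) and count (_count) of all observations. The Go client's garbage-collection pause metric, go_gc_duration_seconds, is a typical example.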
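To illustrate what the quantile series mean, the sketch below computes exact quantiles by sorting all observations. It then renders them alongside _sum and _count. The names quantile and summarize are hypothetical, and the durations are made up. The Go client's Summary does not work this way: it estimates quantiles with a streaming algorithm over a sliding time window, so its numbers are approximations.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the phi-quantile (0 <= phi <= 1) of obs using the nearest-rank
// method on a sorted copy; phi=0 gives the minimum and phi=1 the maximum.
func quantile(obs []float64, phi float64) float64 {
	if len(obs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), obs...)
	sort.Float64s(s)
	idx := int(math.Ceil(phi*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}

// summarize renders the series a summary exposes: one line per quantile plus _sum and _count.
func summarize(name string, obs, quantiles []float64) []string {
	var lines []string
	sum := 0.0
	for _, v := range obs {
		sum += v
	}
	for _, q := range quantiles {
		lines = append(lines, fmt.Sprintf("%s{quantile=\"%g\"} %g", name, q, quantile(obs, q)))
	}
	lines = append(lines, fmt.Sprintf("%s_sum %g", name, sum), fmt.Sprintf("%s_count %d", name, len(obs)))
	return lines
}

func main() {
	// Hypothetical request durations in seconds.
	durations := []float64{0.12, 0.03, 0.25, 0.07, 0.9, 0.05, 0.11, 0.08}
	for _, l := range summarize("request_duration_seconds", durations, []float64{0, 0.5, 0.9, 1}) {
		fmt.Println(l)
	}
}
```

Each instance computes its quantiles locally, so summary quantiles from several instances cannot be meaningfully averaged or summed. Histograms avoid this problem by exposing bucket counts instead.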
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.98e-05
go_gc_duration_seconds{quantile="0.25"} 5.31e-05
go_gc_duration_seconds{quantile="0.5"} 6.77e-05
go_gc_duration_seconds{quantile="0.75"} 0.0001428
go_gc_duration_seconds{quantile="1"} 0.0008099
go_gc_duration_seconds_sum 0.0114183
go_gc_duration_seconds_count 85
Histogram
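Histograms count observations into configurable buckets. Each bucket is exposed as a cumulative _bucket series whose le label gives its upper bound, alongside the sum (_sum) and count (_count) of all observations. Quantiles are estimated at query time with histogram_quantile(), e.g. histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) for the 95th-percentile request latency. Unlike summary quantiles, bucket counts can be summed across instances before the quantile is computed.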
# HELP prometheus_http_response_size_bytes Histogram of response size for HTTP requests.
# TYPE prometheus_http_response_size_bytes histogram
prometheus_http_response_size_bytes_bucket{handler="/",le="100"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="10000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="100000"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="1e+06"} 1
prometheus_http_response_size_bytes_bucket{handler="/",le="+Inf"} 1
prometheus_http_response_size_bytes_sum{handler="/"} 29
prometheus_http_response_size_bytes_count{handler="/"} 1
Application Metric Monitoring
Exposing Metrics
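Prometheus pulls (scrapes) metrics from an HTTP endpoint exposed by the application, typically /metrics. In a Gin service, the endpoint can be served by the Go client's promhttp handler:
import (
	"net/http"
	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)
func main() {
	server := gin.New()
	// middlewares is the application's own package; its Metric() middleware records the metrics.
	server.Use(middlewares.AccessLogger(), middlewares.Metric(), gin.Recovery())
	server.GET("/health", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"message": "ok"})
	})
	server.GET("/metrics", Monitor)
	server.Run(":8000") // the port scraped in the Prometheus configuration
}
// Create the handler once instead of on every scrape; it serves the default registry.
var metricsHandler = promhttp.Handler()
func Monitor(c *gin.Context) {
	metricsHandler.ServeHTTP(c.Writer, c.Request)
}
Defining Metrics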
The following example defines three metric types covering two business scenarios, HTTP traffic and background tasks:
package controllers
import "github.com/prometheus/client_golang/prometheus"
var (
	// HTTP request duration: Histogram
	HTTPReqDuration *prometheus.HistogramVec
	// HTTP request total: Counter
	HTTPReqTotal *prometheus.CounterVec
	// Running tasks: Gauge
	TaskRunning *prometheus.GaugeVec
)
func init() {
	HTTPReqDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "http_request_duration_seconds",
		Help: "http request latencies in seconds",
		// No Buckets set, so prometheus.DefBuckets (5ms to 10s) is used.
	}, []string{"method", "path"})
	HTTPReqTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "total number of http requests",
	}, []string{"method", "path", "status"})
	TaskRunning = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "task_running",
		Help: "current number of running tasks",
	}, []string{"type", "state"})
	// Register with the default registry, which promhttp.Handler() serves.
	prometheus.MustRegister(HTTPReqDuration, HTTPReqTotal, TaskRunning)
}
Generating Metrics
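The middleware that follows calls a shuffle helper whose definition the original does not show. The sketch below assumes that shuffle simply returns a random element of the slice it is given. The body is a hypothetical reconstruction, not the author's code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffle (hypothetical reconstruction) returns one element of options chosen
// uniformly at random, or "" if options is empty.
func shuffle(options []string) string {
	if len(options) == 0 {
		return ""
	}
	return options[rand.Intn(len(options))]
}

func main() {
	fmt.Println(shuffle([]string{"video", "audio"}))
}
```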
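The metrics are recorded in the Gin middleware registered earlier with server.Use(middlewares.Metric()):
// Package middlewares; imports strconv, time, gin, prometheus and the controllers package above.
func Metric() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()
		c.Next() // run the remaining handlers, then record the result
		duration := time.Since(start).Seconds()
		// Route template (e.g. /users/:id) instead of the raw URL keeps label cardinality bounded.
		path := c.FullPath()
		controllers.HTTPReqTotal.With(prometheus.Labels{"method": c.Request.Method, "path": path, "status": strconv.Itoa(c.Writer.Status())}).Inc()
		controllers.HTTPReqDuration.With(prometheus.Labels{"method": c.Request.Method, "path": path}).Observe(duration)
		// Demo only: shuffle (helper not shown) picks random labels to simulate tasks starting and finishing.
		// A real service would call Inc when a task starts and Dec with the same labels when it ends.
		controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Inc()
		controllers.TaskRunning.With(prometheus.Labels{"type": shuffle([]string{"video", "audio"}), "state": shuffle([]string{"process", "queue"})}).Dec()
	}
}
Scraping Metrics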
A minimal prometheus.yml that scrapes Prometheus itself and the local service every 5 seconds:
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'local-service'
    metrics_path: /metrics # the default, shown for clarity
    static_configs:
      - targets: ['host.docker.internal:8000']
In Kubernetes, static targets are usually replaced by Kubernetes service discovery (kubernetes_sd_configs), whose roles include node, service, pod, endpoints, and ingress.
Metric dashboards are shown below:
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.