
Designing Effective Metrics: From Requirements to Labels and Buckets

This guide explains how to define, name, and organize monitoring metrics—covering Google’s four golden signals, system‑specific measurement objects, vector selection, label conventions, bucket design, and practical Grafana tips—for reliable observability of diverse services.


Before designing metrics, clearly identify what needs to be measured based on the problem context, requirements, and the system being monitored.

From Requirements

Google’s experience with large‑scale distributed monitoring yields four golden signals that are broadly applicable:

Latency: the time it takes to serve a request.

Traffic: the load currently placed on the system, used to gauge capacity needs.

Errors: the rate of failed requests in the system.

Saturation: the degree to which a critical resource (e.g., memory) constrains the service.

These metrics satisfy four monitoring goals:

Reflect user experience and core performance (e.g., online latency, batch job completion time).

Reflect system throughput (e.g., request count, network packet volume).

Help discover and locate faults (e.g., error counts, failure rates).

Show system saturation and load (e.g., memory usage, queue length).
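
The four goals above can be sketched as a minimal in-process recorder, assuming Prometheus-style Counter/Gauge/Histogram semantics; all names here are illustrative, not from the article:

```python
class GoldenSignals:
    """Toy recorder mapping each golden signal to a metric primitive."""
    def __init__(self):
        self.requests_total = 0      # Traffic: monotonically increasing counter
        self.errors_total = 0        # Errors: counter of failed requests
        self.latency_seconds = []    # Latency: observations feeding a histogram
        self.queue_length = 0        # Saturation: gauge of current backlog

    def observe(self, duration_s, ok=True):
        # Record one completed request: bump traffic, optionally errors,
        # and keep the latency observation.
        self.requests_total += 1
        if not ok:
            self.errors_total += 1
        self.latency_seconds.append(duration_s)

    def error_rate(self):
        return self.errors_total / self.requests_total if self.requests_total else 0.0

sig = GoldenSignals()
sig.observe(0.12)
sig.observe(0.80, ok=False)
print(sig.error_rate())  # 0.5
```

In a real service these would be `prometheus_client` Counter, Gauge, and Histogram objects; the point is only which signal maps to which metric type.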

From the Monitored System

Different types of applications require different measurement objects. Official best‑practice documentation classifies applications into three categories:

Online‑serving systems: require immediate responses (e.g., web servers).

Offline processing systems: jobs run for a long time without the caller waiting (e.g., Spark).

Batch jobs: one‑off tasks that finish and exit (e.g., MapReduce data analysis).

Typical measurement objects per category are:

Online services: request count, error count, request latency.

Offline processing: job start time, number of active jobs, items emitted, queue length.

Batch jobs: completion timestamp, stage execution times, total duration, records processed.

In addition to the main system, sub‑systems may also be monitored:

Libraries: call count, successes, failures, latency.

Logging: count of log entries to determine frequency and timing.

Failures: error counts.

Thread pools: queued requests, active threads, total threads, latency, tasks in progress.

Caches: request count, hits, total latency.
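
As one concrete case, the cache measurement objects listed above (request count, hits, total latency) can be sketched as a toy instrumented cache; the attribute names are illustrative assumptions:

```python
import time

class InstrumentedCache:
    """Toy cache exposing request count, hit count, and cumulative latency."""
    def __init__(self):
        self.data = {}
        self.requests_total = 0
        self.hits_total = 0
        self.latency_seconds_total = 0.0

    def get(self, key):
        # Time each lookup and record whether it hit.
        start = time.perf_counter()
        self.requests_total += 1
        value = self.data.get(key)
        if value is not None:
            self.hits_total += 1
        self.latency_seconds_total += time.perf_counter() - start
        return value

cache = InstrumentedCache()
cache.data["a"] = 1
cache.get("a")   # hit
cache.get("b")   # miss
print(cache.requests_total, cache.hits_total)  # 2 1
```

The hit ratio is then derived at query time (hits / requests) rather than stored as its own metric.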

Choosing Vectors

Guidelines for selecting a vector (a set of related metrics):

The series measure the same kind of data but differ in resource, collection location, or a similar dimension.

All data units within the vector are unified.

Examples include latency of different resource objects, latency across regions, or error counts for different HTTP request types.

Official documentation recommends using separate metrics for distinct operations (e.g., Read vs. Write) rather than combining them, and to differentiate actions with labels.
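
A minimal sketch of that recommendation, using plain dictionaries in place of labeled Prometheus series (all metric and label names here are illustrative):

```python
from collections import defaultdict

# One metric per distinct operation (read vs. write); the label is
# reserved for comparable sub-dimensions of that operation.
read_latency_seconds = defaultdict(list)    # label: storage backend
write_latency_seconds = defaultdict(list)

read_latency_seconds["ssd"].append(0.002)
read_latency_seconds["hdd"].append(0.015)
write_latency_seconds["ssd"].append(0.010)

# Anti-pattern for contrast: a single latency_seconds metric with an
# op="read"/"write" label forces every query to filter by op and makes
# naive aggregations across the label meaningless.
print(sorted(read_latency_seconds))  # ['hdd', 'ssd']
```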

Determining Labels

Common label choices include:

resource

region

type

A key principle is that data for a given label dimension must be additive and comparable; units must be consistent. Avoid mixing partial and total counts in the same label (e.g., my_metric{label=a} 1, my_metric{label=total} 7). Use server‑side aggregation (PromQL) or separate metrics for totals.
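
The additivity rule can be shown with a plain dictionary standing in for a labeled series (names illustrative); in PromQL the equivalent derivation would be sum(my_requests_total):

```python
# Each label value holds a partial count; the total is always derived
# by summing, never stored under an extra label value.
my_requests_total = {"shenzhen": 3, "guangzhou": 2, "beijing": 2}

total = sum(my_requests_total.values())
print(total)  # 7

# Anti-pattern: my_requests_total["total"] = 7 would make the same
# sum() double-count every request.
```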

Naming Metrics and Labels

Metric Naming

Follow the pattern [a-zA-Z_:][a-zA-Z0-9_:]*.

Include a prefix indicating the domain, such as prometheus_notifications_total, process_cpu_seconds_total, or ipamd_request_latency.

Append a unit suffix to indicate the metric’s unit, e.g., http_request_duration_seconds, node_memory_usage_bytes, or http_requests_total for a unit‑less counter.

Make the name logically reflect the measured variable.

Prefer base units (seconds, bytes) over derived ones (milliseconds, megabytes).
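
The naming rules can be checked mechanically; below is a small sketch using the Prometheus metric-name pattern, where the list of accepted unit suffixes is my own assumption:

```python
import re

# Metric names must match the Prometheus name pattern.
NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
# Illustrative base-unit suffixes; extend as needed for your conventions.
UNIT_SUFFIXES = ("_seconds", "_bytes", "_total", "_ratio", "_info")

def check_name(name):
    """Return True if the name is syntactically valid and unit-suffixed."""
    ok_syntax = bool(NAME_RE.match(name))
    ok_unit = name.endswith(UNIT_SUFFIXES)
    return ok_syntax and ok_unit

print(check_name("http_request_duration_seconds"))  # True
print(check_name("latency-ms"))  # False: illegal chars and a derived unit
```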

Label Naming

Label names should reflect the chosen dimension, for example:

region: shenzhen/guangzhou/beijing

owner: user1/user2/user3

stage: extract/transform/load

Bucket Selection

Appropriate histogram buckets improve percentile calculations. Ideal buckets produce roughly equal counts per bucket. Guidelines:

Know the approximate data distribution; if unknown, start with the default buckets {0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10} or exponential buckets {1, 2, 4, 8, …} and adjust after observing real data.

Use narrower intervals where data is dense, wider intervals where it is sparse.

For latency data with long‑tail characteristics, exponential buckets are often suitable.

The first bucket’s upper bound should cover roughly the lowest 10% of the data; if the head of the distribution is not critical, a larger upper bound is acceptable.

To compute a specific percentile (e.g., 90%), add finer buckets around the 90% point.

In practice, I selected bucket ranges based on observed task durations, deployed them, and refined the buckets after monitoring the results.
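
The exponential-bucket approach above can be sketched in a few lines, assuming cumulative ("le"-style) bucket semantics like Prometheus histograms use; the function names are illustrative:

```python
def exponential_buckets(start, factor, count):
    """Generate upper bounds start, start*factor, start*factor^2, ..."""
    return [start * factor ** i for i in range(count)]

def bucket_counts(observations, bounds):
    """Cumulative counts: how many observations fall at or below each bound."""
    return [sum(1 for o in observations if o <= b) for b in bounds]

bounds = exponential_buckets(1, 2, 4)   # [1, 2, 4, 8]
obs = [0.5, 1.5, 3.0, 3.5, 7.0]
print(bucket_counts(obs, bounds))       # [1, 2, 4, 5]
```

If one bucket accumulates far more observations than the others, split it; if several stay near-empty, merge them — the same refinement loop described above.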

Grafana Tips

Viewing All Dimensions

To discover additional grouping dimensions, query the bare metric name with no functions applied and leave the Legend format empty; the panel then lists every raw series with its full label set, exposing all available dimensions.

Ruler Linking

In the dashboard Settings, change the Graph Tooltip option (default "Default") to "Shared crosshair" or "Shared Tooltip". This links the ruler across panels, making it easier to correlate two metrics during troubleshooting.

Tags: monitoring, observability, metrics, Prometheus, Grafana, labeling
Written by

Open Source Linux

Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
