Mastering Application Monitoring with Prometheus: Practical Metrics and Best Practices
This article explains how to design effective Prometheus metrics for various application types, covering golden metrics, label selection, naming conventions, bucket choices, and Grafana visualization tips to help engineers build reliable observability solutions.
In this article we introduce how to use Prometheus for application monitoring, summarizing practical metrics based on our experience and the official documentation.
Determine Monitoring Objects
Before designing metrics, clearly define what needs to be measured based on the problem context, requirements, and the system itself.
Golden Metrics
Google’s four golden metrics for large‑scale distributed monitoring are generally applicable:
Latency: the time taken to serve a request.
Traffic: the volume of demand placed on the service, used to assess capacity.
Errors: the rate of requests that fail.
Saturation: how close a critical resource (e.g., memory) is to its limit.
These metrics address four monitoring needs:
Reflect user experience and core performance (e.g., request latency, job completion time).
Measure system throughput (e.g., request count, network packet size).
Help discover and locate faults (e.g., error count, failure rate).
Show system saturation and load (e.g., memory usage, queue length).
Additional custom metrics may be added for specific scenarios, such as measuring the latency and failure count of a frequently called library interface.
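As a minimal illustration of what each signal measures, here is a toy in-process tracker in pure Python. In a real service these values would be exported through a Prometheus client library such as prometheus_client; the class and field names below are hypothetical.

```python
class GoldenSignals:
    """Toy in-process tracker for the four golden signals.

    A real service would export these via a Prometheus client library
    rather than keeping them in plain Python attributes.
    """

    def __init__(self):
        self.request_total = 0       # traffic: how many requests arrived
        self.error_total = 0         # errors: how many of them failed
        self.latency_seconds = []    # latency: per-request durations
        self.queue_length = 0        # saturation proxy: pending work

    def observe(self, duration_seconds, ok):
        """Record one completed request."""
        self.request_total += 1
        if not ok:
            self.error_total += 1
        self.latency_seconds.append(duration_seconds)

    def error_rate(self):
        """Fraction of requests that failed."""
        return self.error_total / self.request_total if self.request_total else 0.0


tracker = GoldenSignals()
tracker.observe(0.12, ok=True)
tracker.observe(0.30, ok=False)
tracker.observe(0.05, ok=True)
print(tracker.request_total, tracker.error_total, round(tracker.error_rate(), 2))
# prints: 3 1 0.33
```

The point of the sketch is the separation of concerns: traffic and errors are counters, latency is a stream of observations (a histogram in Prometheus terms), and saturation is a gauge.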
Choose Vector (Metric Group)
Select vectors based on differences in data type, resource type, or collection location, and ensure uniform units within each vector. Examples include request latency across different resources, regional server latency, or per‑HTTP‑status error counts.
The official documentation also recommends using separate metrics for different operations (e.g., Read vs. Write) rather than combining them.
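To make the separation concrete, the sketch below renders two distinct metrics for the read and write paths in the Prometheus text exposition format, instead of folding both into one metric. The metric names and the render helper are hypothetical illustrations, not part of any library API.

```python
def render(name, labels, value):
    """Render one sample in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


# Separate metrics for the read and write paths, so each series keeps
# uniform semantics, rather than one combined metric for both operations.
print(render("storage_read_duration_seconds", {"resource": "disk"}, 0.004))
print(render("storage_write_duration_seconds", {"resource": "disk"}, 0.012))
# prints:
# storage_read_duration_seconds{resource="disk"} 0.004
# storage_write_duration_seconds{resource="disk"} 0.012
```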
Determine Labels
Common label choices include resource, region, and type. Labels should be additive and comparable, with consistent units within each label dimension. Avoid mixing summed totals and individual values under the same label; instead, aggregate totals with PromQL or expose them as a separate metric.
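To make the total-vs-individual point concrete, this pure-Python sketch (with hypothetical sample data) computes the aggregate at query time, which is what PromQL's sum() does, instead of storing the total as a fake label value alongside the real series.

```python
# Per-label samples for one metric; note there is no artificial
# "total" series mixed in with the real label values.
samples = {"a": 1, "b": 6}

# Aggregate at query time, as sum(my_metric) would in PromQL, so the
# total never shares a label dimension with the individual series.
total = sum(samples.values())
print(total)  # prints: 7
```

If a "total" series were stored as just another label value, any PromQL sum over the metric would double-count it.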
Naming Metrics and Labels
Good names convey meaning:
Use names matching the pattern [a-zA-Z_:][a-zA-Z0-9_:]*, with a domain prefix (e.g., prometheus_notifications_total).
Include a unit suffix (e.g., http_request_duration_seconds, node_memory_usage_bytes).
Prefer base units such as seconds or bytes over milliseconds or megabytes.
Labels should reflect the chosen dimension, such as region: shenzhen, owner: user1, or stage: extract.
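A small sketch of applying these naming rules: checking a candidate name against the metric-name pattern and normalizing milliseconds to the base unit. The helper names are illustrative, not part of any Prometheus API.

```python
import re

# The Prometheus metric-name pattern from the naming guidance above.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def valid_metric_name(name):
    """True if name matches the Prometheus metric-name pattern."""
    return bool(METRIC_NAME_RE.match(name))


def millis_to_seconds(ms):
    """Normalize milliseconds to the base unit, seconds."""
    return ms / 1000.0


print(valid_metric_name("prometheus_notifications_total"))  # True
print(valid_metric_name("2bad-name"))                       # False
print(millis_to_seconds(250))                               # 0.25
```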
Bucket Selection for Histograms
Appropriate buckets improve percentile accuracy; ideally, observations fall roughly evenly across buckets. Start with the default buckets {0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10} or exponential buckets for latency data, then adjust based on the observed distribution.
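For example, exponential bucket bounds can be generated the way client-library helpers do (the Go client offers ExponentialBuckets; the functions below are local sketches, not library calls), and observations counted cumulatively as Prometheus histograms do with the le label:

```python
def exponential_buckets(start, factor, count):
    """Upper bounds growing geometrically, e.g. for latency histograms."""
    return [start * factor ** i for i in range(count)]


def bucket_counts(observations, bounds):
    """Cumulative count per bucket, matching Prometheus 'le' semantics."""
    return [sum(1 for o in observations if o <= b) for b in bounds]


bounds = exponential_buckets(0.005, 2, 5)
print(bounds)  # [0.005, 0.01, 0.02, 0.04, 0.08]

latencies = [0.004, 0.009, 0.03, 0.07]
print(bucket_counts(latencies, bounds))  # [1, 2, 2, 3, 4]
```

If most observations pile into one or two buckets, widen or narrow the range until the counts spread out; that is what "adjust based on observed distribution" means in practice.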
Grafana Usage Tips
View All Dimensions
To discover available dimensions, query only the metric name without calculations and leave the legend format empty. This displays the raw metric data.
Scale Synchronization
In Grafana’s Settings panel, change the Graph Tooltip to Shared crosshair or Shared Tooltip to link scales across panels, making it easier to correlate two metrics.
As noted in the section on labels, avoid series like the following, where a total is mixed in as a label value alongside the individual series:

my_metric{label="a"} 1
my_metric{label="b"} 6
my_metric{label="total"} 7

These practices help build robust observability for various application types, from online services to batch jobs.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.