Understanding Prometheus Data Collection: Formats, Types, and Best Practices
This article explains how Prometheus collects data: the metric syntax, label usage, time-series concepts, and the four logical metric types (Counter, Gauge, Histogram, Summary). It closes with practical guidelines for naming metrics, choosing labels, and selecting a metric type.
This article is the second part of a series on Prometheus, following an introductory piece on its architecture, and focuses on how Prometheus collects data.
1. Data Format and Classification
1.1 Data Format
Prometheus identifies a monitoring metric with the notation <metric name>{<label name>=<label value>, ...}. The metric name may contain letters, digits, underscores, and colons, and must not start with a digit; colons are reserved for recording rules. For example, http_requests_total denotes the total number of HTTP requests. Labels add dimensions, as in http_requests_total{method="POST", status="200"}, which makes filtering and aggregation easy. The sequence of values recorded over time for one metric name and label set is called a time series, and each point in it (a sample) consists of a float64 value and a millisecond-precision timestamp.
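As a rough illustration of this format, the following Python sketch checks a metric name against the pattern given in the Prometheus data model and renders one sample as a line of text. The helper names is_valid_metric_name and format_sample are hypothetical and not part of any Prometheus library, and escaping of label values is left out for brevity.

```python
import re

# Metric name pattern from the Prometheus data model documentation.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def format_sample(name: str, labels: dict, value: float, timestamp_ms: int) -> str:
    """Render one sample as `name{label="value",...} value timestamp`."""
    if not is_valid_metric_name(name):
        raise ValueError(f"invalid metric name: {name!r}")
    # Sorting the labels keeps the output deterministic.
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    selector = f"{name}{{{label_text}}}" if labels else name
    return f"{selector} {value} {timestamp_ms}"


line = format_sample(
    "http_requests_total", {"method": "POST", "status": "200"}, 1027.0, 1700000000000
)
print(line)
```

The value 1027.0 and the timestamp are made-up sample data; a name such as 9lives would be rejected because it starts with a digit.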
1.2 Data Classification
Prometheus logically classifies metrics into four types: Counter, Gauge, Histogram, and Summary. The Prometheus server itself stores all collected data as untyped time series, so the classification matters mainly when instrumenting code and when deciding how to query a metric.
Counter
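Counters are monotonically increasing values, suitable for totals such as request counts, completed tasks, or error occurrences. A counter only goes up while the process runs and resets to zero when the service restarts; query functions such as rate() and increase() detect these resets and compensate for them.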
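Because a counter lives in the memory of the instrumented process, the exported value starts again from zero when that process restarts. The sketch below shows one way to compute the total increase over a run of raw samples while tolerating such a reset. It is a simplification for illustration only: counter_increase is a hypothetical helper, and the real PromQL increase() function additionally extrapolates to the boundaries of the query window.

```python
def counter_increase(samples):
    """Total increase over a list of raw counter values, tolerating resets.

    A drop between two consecutive samples is treated as a restart from
    zero, so the whole new value is counted as increase.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr  # reset detected: the counter restarted from 0
    return total


# The counter climbs to 160, the service restarts, and counting resumes at 5.
raw = [100, 130, 160, 5, 25]
print(counter_increase(raw))  # 30 + 30 + 5 + 20 = 85.0
```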
Gauge
Gauges represent values that can increase or decrease, like CPU usage, memory consumption, or I/O size.
Histogram
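Histograms count observations into configurable buckets and are well suited to exposing long-tail behavior that an average would hide. The buckets are cumulative: each one counts every observation less than or equal to its upper bound, and the histogram also tracks the sum and the total count of observations. For example, response-time buckets with the upper bounds [30ms, 100ms, 300ms, 1s, 3s, 5s, 10s] can be used to build a cumulative histogram of API latency.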
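To make the cumulative behavior concrete, here is a small Python sketch that sorts a handful of made-up latency observations into the example buckets. The metric name api_latency_seconds and the helper functions are assumptions for illustration; in practice a client library maintains these counters for you.

```python
import math

# Upper bounds in seconds from the example buckets, plus the implicit +Inf bucket.
BOUNDS = [0.03, 0.1, 0.3, 1.0, 3.0, 5.0, 10.0, math.inf]


def build_histogram(durations):
    """Return (cumulative bucket counts, sum of observations, number of observations)."""
    buckets = [0] * len(BOUNDS)
    for d in durations:
        for i, upper in enumerate(BOUNDS):
            if d <= upper:
                buckets[i] += 1  # cumulative: every bucket whose bound covers d counts it
    return buckets, sum(durations), len(durations)


def le_label(upper):
    """Format a bucket bound the way the `le` label is usually written."""
    return "+Inf" if math.isinf(upper) else str(upper)


buckets, total, count = build_histogram([0.02, 0.05, 0.25, 0.8, 4.0, 12.0])
for upper, n in zip(BOUNDS, buckets):
    print(f'api_latency_seconds_bucket{{le="{le_label(upper)}"}} {n}')
print(f"api_latency_seconds_sum {total}")
print(f"api_latency_seconds_count {count}")
```

Note that the 12-second observation lands only in the +Inf bucket, so the last bucket always equals the total count.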
Summary
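Summaries also sample observations such as request latency, but they compute configurable quantiles on the client side over a sliding time window. This gives precise quantiles for a single instance, at the cost of more work in the instrumented process and the inability to aggregate those quantiles across instances or label dimensions. Like histograms, summaries expose the sum and count of observations, so averages can still be derived.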
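One practical consequence of computing quantiles on the client is that each instance reports its own quantile, and those values cannot simply be averaged to obtain a fleet-wide figure. The Python sketch below illustrates this with two made-up instances; nearest_rank is a hypothetical helper using a simple nearest-rank definition, not the streaming algorithm a client library would use.

```python
def nearest_rank(values, percent):
    """Nearest-rank percentile: the smallest value with at least
    `percent` percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-percent * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]


# Instance A serves 100 mostly fast requests, instance B serves 10 slow ones.
instance_a = [0.1] * 90 + [0.2] * 10
instance_b = [2.0] * 10

p90_a = nearest_rank(instance_a, 90)
p90_b = nearest_rank(instance_b, 90)
averaged = (p90_a + p90_b) / 2
true_p90 = nearest_rank(instance_a + instance_b, 90)

print(f"average of per-instance p90: {averaged}")
print(f"p90 over all requests:       {true_p90}")
```

Averaging gives roughly 1.05 seconds, while the 90th percentile over all 110 requests is 0.2 seconds, because the average ignores how many requests each instance handled.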
2. Usage Recommendations
2.1 Metric Naming
Start the metric name with its domain, e.g., process_cpu_seconds_total for process‑related CPU time.
End the metric name with a descriptive unit in plural form, such as seconds or bytes; give accumulating counters the suffix _total, as in http_requests_total.
2.2 Label Selection
Labels should describe typical characteristics of the metric, such as operation="create|update|delete". Every distinct combination of label values creates its own time series, so avoid high-cardinality values like user IDs or email addresses, and keep the number of labels per metric under ten.
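The effect of a high-cardinality label is easy to estimate, since each distinct combination of label values becomes a separate time series. The following Python sketch computes that upper bound for one metric; the label names, the values, and the series_count helper are hypothetical.

```python
import math


def series_count(label_values):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return math.prod(len(values) for values in label_values.values())


bounded = {
    "operation": ["create", "update", "delete"],
    "status": ["200", "400", "500"],
}
print(series_count(bounded))  # 3 * 3 = 9

# Adding a user_id label with 100,000 distinct users multiplies the total.
with_user = dict(bounded, user_id=[str(i) for i in range(100_000)])
print(series_count(with_user))  # 9 * 100000 = 900000
```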
2.3 Choosing Between Histogram and Summary
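Use Histogram when you need to aggregate, for example across instances or label dimensions.
Otherwise, use Histogram when you have an idea of the range and distribution of the observed values, so that buckets can be chosen sensibly; use Summary when you need accurate quantiles regardless of that range and distribution.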
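The reason histograms aggregate well is that bucket counts from several instances can simply be added before a quantile is estimated. The Python sketch below merges the cumulative buckets of two made-up instances and estimates a quantile by linear interpolation inside the matching bucket. It is loosely modeled on the behavior of PromQL's histogram_quantile() and is not its implementation; merge and estimate_quantile are hypothetical helpers.

```python
import math

# Bucket upper bounds in seconds, ending with the +Inf bucket.
BOUNDS = [0.03, 0.1, 0.3, 1.0, 3.0, 5.0, 10.0, math.inf]


def merge(*instances):
    """Add cumulative bucket counts from several instances, bucket by bucket."""
    return [sum(counts) for counts in zip(*instances)]


def estimate_quantile(q, bounds, cumulative):
    """Estimate the q-quantile (0 < q <= 1) from cumulative bucket counts."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, (upper, count) in enumerate(zip(bounds, cumulative)):
        if count >= rank:
            if math.isinf(upper):
                return bounds[i - 1]  # fall back to the highest finite bound
            lower = bounds[i - 1] if i > 0 else 0.0
            below = cumulative[i - 1] if i > 0 else 0
            # Assume observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - below) / (count - below)
    return math.nan


instance_a = [10, 40, 80, 95, 99, 100, 100, 100]
instance_b = [0, 5, 20, 60, 90, 98, 100, 100]
merged = merge(instance_a, instance_b)

print(merged)
print(estimate_quantile(0.5, BOUNDS, merged))   # about 0.3 seconds
print(estimate_quantile(0.95, BOUNDS, merged))  # about 3.22 seconds
```

The accuracy of such an estimate depends on how well the bucket boundaries match the real distribution, which is why having an idea of the expected range matters when choosing a histogram.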
2.4 What to Monitor
All service types: online services, offline services, and batch jobs.
Key logic of each service: total executions, failure counts, retry counts.
Service quality metrics: request totals, error rates, response times.
System resources: utilization, saturation, and error counts.
Further Reading: recording rules https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#recording-rules
Feel free to leave comments or questions about related topics.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.