How to Build Effective Monitoring for Microservices: Logs, Tracing, and Metrics Explained

This article explains the three main monitoring approaches—log collection, distributed tracing, and metric gathering—in microservice architectures, outlines the layered monitoring model, lists key system, application, and user metrics, and reviews popular open‑source time‑series monitoring tools such as Prometheus, OpenTSDB, and InfluxDB.

dbaplus Community

Sep 16, 2019

Monitoring in Microservice Architecture

In a microservice system, a single user request traverses multiple services. When an error occurs, the failing service and the associated metric must be identified, which requires comprehensive monitoring of each service and its key indicators.

Monitoring Categories

Log monitoring (unstructured event records)

Distributed tracing (call‑chain tracking)

Metrics monitoring (numeric time‑series data)

Log Monitoring

Application code, runtime frameworks and business logic emit log entries that are typically collected centrally for later search and analysis. A common implementation is the ELK stack (Elasticsearch + Logstash + Kibana). Optional Beats agents run on each host to ship raw log files to Logstash, where they are parsed, filtered and enriched before being indexed in Elasticsearch. Kibana provides visual exploration of the indexed logs.

Typical data flow: Beats → Logstash → Elasticsearch → Kibana

Both the basic stack and extended variants (e.g., adding additional processing pipelines) are widely used for log‑based monitoring and debugging.
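Logs are easier to parse and filter downstream when each entry is already structured. The following is a minimal sketch, using only the Python standard library, of an application writing one JSON object per log line that a shipper could then forward. The field names (ts, level, service, message) and the logger name "orders" are assumptions chosen for illustration, not a schema required by the ELK stack.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            # "service" is a hypothetical field supplied via `extra=`.
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(stream, name="orders"):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# An in-memory buffer stands in for the log file a shipper would tail.
buffer = io.StringIO()
log = build_logger(buffer)
log.error("payment failed", extra={"service": "order-service"})
print(buffer.getvalue().strip())
```

In a real deployment the handler would typically write to a file or stdout, and the shipping agent would pick the lines up from there.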

Distributed Tracing

Tracing records the complete lifecycle of a request as it propagates through multiple services, which makes it possible to pinpoint failures or performance bottlenecks. Tools such as CAT (Central Application Tracking) are often adopted in medium‑to‑large projects, though they require additional instrumentation and infrastructure.
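The core idea behind call‑chain tracking can be sketched in a few lines: every service reuses the trace ID it received, creates its own span, and passes both IDs on to the next hop. This is an illustrative sketch only. The header names x-trace-id and x-span-id, the SPANS list standing in for a tracing backend, and the two service functions are all hypothetical and do not reflect CAT's API.

```python
import time
import uuid

SPANS = []  # stand-in for a tracing backend (assumption for this sketch)


def start_span(name, headers):
    """Open a span, reusing the caller's trace ID when one was propagated."""
    return {
        "trace_id": headers.get("x-trace-id") or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": headers.get("x-span-id"),  # None for the root span
        "name": name,
        "start": time.monotonic(),
    }


def finish_span(span):
    """Record the span's duration and hand it to the backend."""
    span["duration_s"] = time.monotonic() - span["start"]
    SPANS.append(span)


def outgoing_headers(span):
    """Headers a service attaches to its downstream calls."""
    return {"x-trace-id": span["trace_id"], "x-span-id": span["span_id"]}


def inventory_service(headers):
    span = start_span("inventory.reserve", headers)
    finish_span(span)


def order_service(headers):
    span = start_span("order.create", headers)
    inventory_service(outgoing_headers(span))  # simulated remote call
    finish_span(span)


order_service({})  # an incoming request without trace headers
for s in SPANS:
    print(s["trace_id"], s["name"], s["parent_id"])
```

Both spans share one trace ID, and the inventory span's parent is the order span, which is what lets a tracing UI rebuild the call tree.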

A simple fault‑tolerance pattern is to set an active timeout on inter‑service calls: if the downstream service does not respond within the configured threshold, the caller aborts the request to avoid cascading delays.
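As a rough illustration of the timeout pattern, the sketch below wraps a downstream call and returns a fallback if no result arrives in time. The functions slow_downstream and fast_downstream are hypothetical stand‑ins for real service calls, and the thresholds are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as CallTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, fallback=None):
    """Run fn in a worker thread; stop waiting after timeout_s seconds."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except CallTimeout:
        # cancel() only helps if the call has not started yet; a running
        # call keeps going in the background, the caller just stops waiting.
        future.cancel()
        return fallback


def slow_downstream():
    """Hypothetical downstream call that responds too slowly."""
    time.sleep(0.5)
    return "inventory: 3 left"


def fast_downstream():
    """Hypothetical downstream call that responds immediately."""
    return "inventory: 3 left"


print(call_with_timeout(fast_downstream, timeout_s=0.1))
print(call_with_timeout(slow_downstream, timeout_s=0.1, fallback="unavailable"))
```

A thread‑based wrapper only frees the caller. For real network calls it is usually better to rely on the client's own timeout, such as the timeout argument of urllib.request.urlopen, so that the underlying connection is released as well.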

Metrics Monitoring

Metrics are stored in time‑series databases (TSDB) as numeric values associated with timestamps. They support aggregation and trend analysis, and they are the primary source for alerting. Five fundamental metric types are commonly used:

Gauges – instantaneous values

Counters – monotonically increasing counts

Histograms – distribution of observed values

Meters – rate calculations (e.g., transactions per second)

Timers – duration measurements
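To make the five types concrete, here is a deliberately simplified sketch of each one in plain Python. The class and method names are assumptions for illustration and do not mirror any particular metrics library. A production implementation would also need thread safety and, for meters, a moving average rather than a lifetime rate.

```python
import time


class Counter:
    """Monotonically increasing count."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class Gauge:
    """Instantaneous value that can move up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Sorts observations into buckets by upper bound, plus an overflow bucket."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.buckets = [0] * (len(self.bounds) + 1)  # last slot = overflow
        self.count = 0
        self.total = 0.0

    def observe(self, value):
        self.count += 1
        self.total += value
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1
                return
        self.buckets[-1] += 1


class Meter:
    """Average events per second since the meter was created."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n=1):
        self.count += n

    def rate(self):
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


class Timer:
    """Context manager that records elapsed seconds into a histogram."""

    def __init__(self, histogram, clock=time.monotonic):
        self.histogram = histogram
        self._clock = clock

    def __enter__(self):
        self._t0 = self._clock()
        return self

    def __exit__(self, *exc):
        self.histogram.observe(self._clock() - self._t0)
        return False


requests_total = Counter()
in_flight = Gauge()
latency = Histogram(bounds=[0.05, 0.1, 0.5])  # seconds

requests_total.inc()
in_flight.set(3)
for sample in (0.02, 0.07, 0.3, 2.0):
    latency.observe(sample)
with Timer(latency):
    pass  # the timed work would go here
print(requests_total.value, in_flight.value, latency.buckets, latency.count)
```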

Monitoring Layers and Core Indicators

Monitoring is usually organized into three layers:

System layer – CPU, disk, memory, network (operations focus)

Application layer – service health, API status, internal error codes (development focus)

User layer – business‑level metrics such as conversion rate or revenue (product focus)

Typical key indicators across these layers include:

Latency – e.g., average HTTP response time of 100 ms

Request volume – throughput such as QPS (queries per second)

Error rate – proportion of failed calls over a time window
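The three indicators above reduce to simple arithmetic over the requests seen in a time window. The sketch below assumes a hypothetical Request record and treats HTTP 5xx responses as failures. Both are illustrative choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float   # seconds since the start of the window
    latency_ms: float  # response time in milliseconds
    status: int        # HTTP status code


def summarize(requests, window_s):
    """Return average latency, QPS and error rate for one time window."""
    if not requests:
        return {"avg_latency_ms": 0.0, "qps": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "avg_latency_ms": sum(r.latency_ms for r in requests) / len(requests),
        "qps": len(requests) / window_s,
        "error_rate": errors / len(requests),
    }


window = [
    Request(0.1, 80.0, 200),
    Request(0.4, 120.0, 200),
    Request(0.9, 100.0, 500),
    Request(1.5, 100.0, 200),
]
print(summarize(window, window_s=2.0))
```

For these four requests over a 2‑second window, the average latency is 100 ms, the throughput is 2 QPS, and the error rate is 25%. In practice, averages hide tail latency, so percentiles derived from a histogram are often tracked alongside them.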

Open‑Source Time‑Series Monitoring Solutions

Prometheus

Started at SoundCloud in 2012, Prometheus is an open‑source monitoring framework built around a TSDB. It primarily uses a pull model: the Prometheus server scrapes metrics from instrumented applications or from exporters. For workloads that cannot be scraped (e.g., short‑lived batch jobs), a Pushgateway can be used to receive pushed metrics, which Prometheus then pulls.
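In the pull model, an instrumented application exposes its current metric values as plain text on an HTTP endpoint that the server scrapes. The sketch below renders one counter family in that text format using only the standard library. The helper name render_counter and the sample metric are assumptions for illustration, and label‑value escaping is omitted. Real applications would normally use an official client library instead.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    Label values are assumed to need no escaping in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts, keyed by label set.
samples = {
    (("method", "GET"), ("code", "200")): 1027,
    (("method", "POST"), ("code", "500")): 3,
}
text = render_counter("http_requests_total", "Total HTTP requests.", samples)
print(text)
```

Each scrape reads the current totals, and the server derives rates from the difference between successive scrapes.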

Configuration can be static or driven by service‑discovery mechanisms (Kubernetes, Consul, etc.). Core components:

PromQL – a flexible query language for selecting and aggregating time‑series data

Alertmanager – handles alert routing, silencing and notification (email, Slack, webhook, etc.)

Web UI – basic graphing; most users pair Prometheus with Grafana for advanced dashboards

OpenTSDB

OpenTSDB, launched in 2010, is a distributed TSDB that stores metrics in HBase. It follows a push model: agents or applications push metric points to OpenTSDB’s HTTP API. The system provides a built‑in Web UI and integrates smoothly with Grafana for visualization. OpenTSDB does not include a native alerting component, so external alerting solutions must be added.
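With the push model, the client is responsible for assembling each data point. The sketch below builds the kind of JSON body that OpenTSDB's /api/put endpoint accepts (metric, timestamp, value, tags) without actually sending it. The metric name, tag and timestamp are made‑up sample values, and the host and port to post to depend on the deployment.

```python
import json
import time


def opentsdb_datapoint(metric, value, tags, timestamp=None):
    """Build one data point as a dict ready for JSON serialization."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    return {
        "metric": metric,
        # Seconds since the epoch; defaults to the current time.
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": dict(tags),
    }


# Several points can be batched into one request body as a JSON array.
body = json.dumps([
    opentsdb_datapoint("sys.cpu.user", 42.5, {"host": "web01"},
                       timestamp=1568592000),
])
print(body)
```

An agent would POST this body to the /api/put path of its OpenTSDB instance, typically batching points to reduce request overhead.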

InfluxDB

InfluxDB, open‑sourced in 2013, is another TSDB that accepts metrics via a push API (line protocol). It includes a Web UI for query and exploration, and its data can be visualized with Grafana. Unlike OpenTSDB, InfluxDB provides basic alerting rules, but many deployments rely on external alert managers for production‑grade notifications.
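Line protocol is a compact text format: a measurement name, optional comma‑separated tags, one or more fields, and an optional timestamp. The sketch below builds such a line under simplifying assumptions: all field values are written as floats, timestamps are in nanoseconds, and backslashes in names are not handled. The helper name to_line_protocol and the sample values are hypothetical.

```python
def _escape(text):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    # Measurement names escape commas and spaces only.
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):
        head += f",{_escape(key)}={_escape(tags[key])}"
    # Simplification: every field value is emitted as a float.
    body = ",".join(f"{_escape(k)}={float(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"
    return line


line = to_line_protocol(
    "cpu",
    {"host": "web 01", "region": "eu"},
    {"usage": 0.64},
    timestamp_ns=1568592000000000000,
)
print(line)
```

Tags are indexed and suited to low‑cardinality dimensions such as host or region, while fields carry the measured values.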

In summary, effective monitoring of microservice systems combines log collection, distributed tracing, and time‑series metric gathering. The three‑layer monitoring model (system, application, user) guides indicator selection, while mature open‑source TSDB solutions such as Prometheus, OpenTSDB and InfluxDB provide the foundation for scalable metric storage, querying and alerting.
