
How Observability Redefines Modern Monitoring: Metrics, Logs, Tracing, Events

Modern monitoring has evolved into observability, which covers metrics, logging, tracing, and events, each with its own storage requirements. This article explores the origins, key concepts, and design considerations behind building effective observability systems for today's complex internet services.

Efficient Ops

For DevOps, monitoring is a crucial component; previously we discussed large‑scale enterprise monitoring system design. This article examines monitoring from the perspective of system observability.

Monitoring and observability are engineering terms without strict definitions; they both aim to enhance engineers' understanding of online system behavior.

The purpose of monitoring/observability engineering is to help engineers discover, locate, and resolve problems, achieved through data collection, storage, analysis, and iteration.

1. Origin of Monitoring Needs

When software is delivered and deployed to production, the longest phase of its lifecycle begins, creating a natural need to know whether the production environment is operating normally.

Numerous monitoring tools have emerged, such as Ganglia, Zabbix, RRDtool, and Graphite, each providing answers at a different layer.

Monitoring evolves in two directions:

From black‑box to white‑box

From resource‑level to business‑level

Etsy publicly shared its monitoring practice in 2011, using the open‑source StatsD to collect, store, and analyze data uniformly across resources and business layers. Subsequent metrics‑based monitoring systems were heavily inspired by StatsD.
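As a rough illustration of how one format can carry both resource and business data, the sketch below builds StatsD-style lines of the form name:value|type and sends them over UDP. The metric names are hypothetical, and the host and port are assumptions (8125 is the conventional StatsD port); this is a minimal sketch, not Etsy's client code.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str) -> str:
    """Build one StatsD line: <name>:<value>|<type> (c=counter, g=gauge, ms=timer)."""
    return f"{name}:{value}|{metric_type}"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names: one resource-level, two business-level.
resource_metric = format_statsd("host.web01.cpu_percent", 73, "g")
business_metric = format_statsd("shop.checkout.completed", 1, "c")
timing_metric = format_statsd("shop.checkout.latency", 320, "ms")
```

Calling send_statsd(business_metric) would emit the counter to a local daemon if one is listening; because the transport is UDP, the application is not blocked when no daemon is present.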

2. Emergence of Observability

Twitter was among the first to introduce the term observability, publishing a series of posts describing its stack, including Zipkin, its open‑source tracing system modeled on Google's Dapper.

Observability expands the data focus beyond metrics to include logging, tracing, and events.

metrics

logging

tracing

events

A modern monitoring/observability system must be capable of storing all these data types properly.

3. Storage

Metrics

Metrics are numeric time‑series data, leading to the creation of time‑series databases (TSDB). TSDB evolution includes:

Data model: separating tags from metric names.

Data types: gauges, counters, timers, etc.

Index structures: tag‑centric inverted indexes are now mainstream.

Storage techniques: from RRDtool ring buffers to OpenTSDB encoding and the compression algorithms of Facebook's Gorilla, often combining multiple technologies.

Further details on TSDB research are covered in separate articles.
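The data model and index structure described above can be sketched in a few lines. The example below keeps tags separate from the metric name and maintains a tag‑centric inverted index from each (tag key, tag value) pair to the series that carry it. The class name TagIndex, the reserved "__name__" key, and the sample series are all assumptions made for illustration; real TSDBs add compression, persistence, and far more careful posting-list handling.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: (tag_key, tag_value) -> set of series ids."""

    def __init__(self):
        self._series = []                  # series id -> (metric, tags)
        self._ids = {}                     # canonical series key -> series id
        self._postings = defaultdict(set)  # (key, value) -> series ids

    def add_series(self, metric, tags):
        """Register a series once and index it under its name and every tag."""
        key = (metric, tuple(sorted(tags.items())))
        if key in self._ids:
            return self._ids[key]
        series_id = len(self._series)
        self._series.append((metric, dict(tags)))
        self._ids[key] = series_id
        # The metric name is indexed like any other tag (assumed convention).
        self._postings[("__name__", metric)].add(series_id)
        for tag_key, tag_value in tags.items():
            self._postings[(tag_key, tag_value)].add(series_id)
        return series_id

    def query(self, metric, **tags):
        """Intersect posting sets to find series matching all given tags."""
        result = set(self._postings.get(("__name__", metric), set()))
        for tag_key, tag_value in tags.items():
            result &= self._postings.get((tag_key, tag_value), set())
        return sorted(result)


index = TagIndex()
ok_web01 = index.add_series("http_requests", {"host": "web01", "status": "200"})
err_web01 = index.add_series("http_requests", {"host": "web01", "status": "500"})
err_web02 = index.add_series("http_requests", {"host": "web02", "status": "500"})
```

Because every tag value has its own posting set, a query such as index.query("http_requests", status="500") only intersects sets and never scans every series name, which is the main reason this layout scales better than encoding tags into the metric name.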

Logging

Logging is the most direct way for engineers to diagnose production issues. Its evolution includes:

Centralized storage and retrieval, enabling unified collection, storage, and search.

Log monitoring, which commonly takes two forms:

Keyword detection (e.g., error, fatal) to highlight important failures.

Deriving metrics from logs, such as extracting useful information from access logs.
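Both forms of log monitoring can be illustrated with a short sketch: one function flags lines that contain alarming keywords, the other turns access-log lines into per-status-class counts. The regular expressions assume a common-log-style layout and the sample lines are invented for the example; real log formats vary and would need their own patterns.

```python
import re
from collections import Counter

# Keyword detection: match whole words, case-insensitively.
KEYWORD_RE = re.compile(r"\b(error|fatal)\b", re.IGNORECASE)

# Assumed access-log layout: ... "METHOD /path PROTOCOL" STATUS SIZE
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def keyword_hits(lines):
    """Return the lines that mention an alarming keyword."""
    return [line for line in lines if KEYWORD_RE.search(line)]


def status_class_counts(lines):
    """Derive a simple metric from access logs: requests per status class."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        counts[match.group("status")[0] + "xx"] += 1
    return counts


app_log = [
    "2024-01-01 00:00:01 INFO user logged in",
    "2024-01-01 00:00:02 ERROR payment gateway timeout",
    "2024-01-01 00:00:03 FATAL out of memory",
]
access_log = [
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "POST /login HTTP/1.1" 500 87',
    '10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "GET /cart HTTP/1.1" 200 1024',
    "malformed line",
]
hits = keyword_hits(app_log)
counts = status_class_counts(access_log)
```

The counts produced here could be fed into the metrics pipeline as counters, so that a rise in 5xx responses becomes visible without anyone reading raw logs.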

Tracing

Distributed tracing has become essential for understanding and troubleshooting complex microservice architectures.

Open‑source back‑ends like OpenZipkin and CNCF Jaeger provide ready‑to‑use storage solutions.

Key considerations for tracing storage:

Sparse data: trace paths vary across business flows, leading to sparsity.

Multidimensional queries: solutions often build secondary indexes on top of stores such as HBase or Cassandra, and may add inverted indexes or Elasticsearch for more flexible search.
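The two considerations above can be made concrete with an in-memory sketch: spans are stored under their trace ID as the primary key, and a secondary index maps (field, value) pairs back to trace IDs so that queries by service or tag do not scan every trace. The Span fields and the TraceStore class are hypothetical and do not reflect the schema of Zipkin or Jaeger.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    duration_ms: float
    tags: dict = field(default_factory=dict)


class TraceStore:
    """Toy trace store: primary lookup by trace id plus a secondary index."""

    def __init__(self):
        self._spans_by_trace = defaultdict(list)  # trace_id -> spans
        self._index = defaultdict(set)            # (field, value) -> trace ids

    def add(self, span):
        self._spans_by_trace[span.trace_id].append(span)
        self._index[("service", span.service)].add(span.trace_id)
        self._index[("operation", span.operation)].add(span.trace_id)
        for key, value in span.tags.items():
            self._index[(key, value)].add(span.trace_id)

    def find_traces(self, **criteria):
        """Return trace ids matching every criterion.

        Criteria are matched per trace, not per span, so two criteria may be
        satisfied by different spans of the same trace.
        """
        matches = [self._index.get(item, set()) for item in criteria.items()]
        if not matches:
            return sorted(self._spans_by_trace)
        return sorted(set.intersection(*matches))

    def get_trace(self, trace_id):
        return list(self._spans_by_trace.get(trace_id, []))


store = TraceStore()
store.add(Span("t1", "s1", None, "gateway", "GET /checkout", 120.0))
store.add(Span("t1", "s2", "s1", "payment", "charge", 95.0, {"error": "true"}))
store.add(Span("t2", "s3", None, "gateway", "GET /home", 12.0))
```

Trace t2 never reaches the payment service, which is the sparsity mentioned above: each business flow touches only a subset of services, so most (trace, service) combinations are simply absent.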

Events

Events represent changes such as deployments, configuration updates, or DNS switches, and such changes often trigger failures.

Event handling focuses on:

Centralized storage: due to diverse event types, inverted indexes suit the unpredictable query dimensions.

Dashboard: visualizing event queries, as demonstrated by Etsy’s practice linking login failures to deployment events.
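A dashboard that overlays events on a metric is essentially answering one query: which changes happened shortly before the anomaly? The sketch below shows that lookup over an in-memory list. The Event fields, the window size, and the sample timestamps are assumptions for illustration; a production system would run the same query against its centralized event store.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: int              # seconds since epoch (assumed unit)
    kind: str                   # e.g. "deploy", "config_update", "dns_switch"
    attributes: dict = field(default_factory=dict)


def events_before(events, anomaly_ts, window_seconds=600):
    """Return events in the window leading up to the anomaly, newest first."""
    start = anomaly_ts - window_seconds
    hits = [e for e in events if start <= e.timestamp <= anomaly_ts]
    return sorted(hits, key=lambda e: e.timestamp, reverse=True)


# Hypothetical change history.
history = [
    Event(1000, "deploy", {"service": "login", "version": "1.4.2"}),
    Event(1500, "config_update", {"service": "login", "key": "session_ttl"}),
    Event(3000, "dns_switch", {"zone": "example.internal"}),
]

# Suppose login failures start spiking at t=1700.
suspects = events_before(history, anomaly_ts=1700)
wider = events_before(history, anomaly_ts=1700, window_seconds=900)
```

Returning the newest events first reflects a common troubleshooting heuristic: the most recent change before a failure is usually the first one worth checking.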

Conclusion

Modern monitoring or observability engineering involves collecting, storing, and analyzing various data types, each with distinct characteristics; there is no one‑size‑fits‑all storage solution. Typically, independent storage designs are built for each type and presented through a unified system.

Source: Adapted from JD Cloud Developer public account.
