Why Solid Monitoring Must Come Before Observability Projects (And How to Build It)

Before launching costly observability initiatives, ensure your monitoring is comprehensive and efficient, covering business, application, component, resource, network, and endpoint metrics, and that you have the data collection, storage, alerting, and event‑distribution capabilities to turn raw signals into actionable insights.

ITPUB

Aug 8, 2024

Many companies rush into observability projects without first establishing solid monitoring, which leads to poor results and low acceptance from the business. The article advises first verifying that monitoring is complete, since strengthening it offers a higher ROI than expanding straight to full observability.

Coverage Completeness

Monitoring should be divided into several categories, each requiring specific metrics and alerts.

Business Monitoring

Track business‑level indicators such as order volume; a sudden drop signals a problem that senior leadership will notice. These metrics often reside in relational or analytical databases, so the alert engine must be able to query OLTP/OLAP sources.
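
As a minimal sketch of an alert engine querying a relational source, the following compares the order count in the latest window with the average of the preceding windows. The orders table, its created_at column (Unix seconds), the window sizes, and the 50% drop ratio are all illustrative assumptions, and SQLite only stands in for whatever OLTP/OLAP store actually holds the data.

```python
import sqlite3


def order_drop_alert(conn, now_ts, window_s=300, baseline_windows=6, drop_ratio=0.5):
    """Return an alert dict if the latest window's order count fell below
    drop_ratio times the average of the preceding baseline windows, else None."""

    def count(start, end):
        row = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE created_at >= ? AND created_at < ?",
            (start, end),
        ).fetchone()
        return row[0]

    current = count(now_ts - window_s, now_ts)
    baseline_start = now_ts - window_s * (baseline_windows + 1)
    baseline_avg = count(baseline_start, now_ts - window_s) / baseline_windows
    if baseline_avg > 0 and current < drop_ratio * baseline_avg:
        return {"metric": "order_volume", "current": current, "baseline_avg": baseline_avg}
    return None


def demo_connection():
    """Hypothetical data set: 10 orders per 5-minute window for 30 minutes,
    then only 2 orders in the most recent window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at INTEGER)")
    rows = [(w * 300 + i * 30,) for w in range(6) for i in range(10)]
    rows += [(1810,), (1820,)]
    conn.executemany("INSERT INTO orders (created_at) VALUES (?)", rows)
    return conn
```

Calling order_drop_alert(demo_connection(), now_ts=2100) returns an alert for this data set. A real rule would probably also compare against the same time on previous days, because order volume is usually seasonal.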

Application Monitoring

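For web or RPC services, track the RED metrics (Rate of requests, Errors, Duration) and add Saturation (resource usage) to form the RED‑S model, which helps identify overload and capacity needs. The result closely matches the four golden signals from Google’s SRE book: latency, traffic, errors, and saturation.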
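
To make the RED‑S model concrete, here is a small sketch that reduces one window of request records to the four numbers. The RequestRecord shape, the choice of 5xx as the error condition, the nearest-rank p95, and the use of in-flight requests as the saturation signal are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class RequestRecord:
    duration_ms: float
    status: int  # HTTP-style status code; 5xx counts as an error here


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def red_s(records, window_s, in_flight, max_in_flight):
    """Summarise one time window into Rate, Errors, Duration and Saturation."""
    total = len(records)
    errors = sum(1 for r in records if r.status >= 500)
    durations = sorted(r.duration_ms for r in records)
    return {
        "rate_rps": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "p95_ms": nearest_rank(durations, 95) if durations else None,
        "saturation": in_flight / max_in_flight,
    }
```

In practice these values usually come from histograms and counters exported by the service rather than from raw records, but the arithmetic is the same.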

Component Monitoring

Monitor middleware, databases, distributed storage, and Kubernetes, as their health directly impacts applications. Understanding each component’s internals is essential; for example, MySQL health can be examined with SHOW GLOBAL STATUS and other status commands.
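
Most values returned by SHOW GLOBAL STATUS are counters that grow from server start, so they become useful only once two samples are compared. The sketch below assumes the two samples have already been fetched into dicts of name to string value. The status names used are standard MySQL counters, while max_connections is a system variable that would come from SHOW GLOBAL VARIABLES and is passed in here. The function name and the selection of indicators are illustrative.

```python
def mysql_health(prev, curr, interval_s, max_connections):
    """Derive a few health indicators from two SHOW GLOBAL STATUS samples
    taken interval_s seconds apart (values arrive as strings)."""

    def delta(key):
        return int(curr[key]) - int(prev[key])

    read_requests = delta("Innodb_buffer_pool_read_requests")
    disk_reads = delta("Innodb_buffer_pool_reads")
    return {
        # Statements executed per second over the interval
        "qps": delta("Questions") / interval_s,
        # New slow queries logged during the interval
        "slow_queries": delta("Slow_queries"),
        # Share of the connection limit currently in use (a gauge, not a delta)
        "connection_usage": int(curr["Threads_connected"]) / max_connections,
        # Fraction of logical reads served from memory rather than disk
        "buffer_pool_hit_ratio": (
            1 - disk_reads / read_requests if read_requests else None
        ),
    }
```

Knowing which variables are gauges (Threads_connected) and which are counters (Questions) is exactly the kind of component knowledge this section asks for.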

Resource Monitoring

Observe runtime environments (physical machines, VMs, and containers) by tracking CPU, memory, disk, and network, plus niche metrics such as NTP clock offset, conntrack table usage, and vmstat counters.
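
Niche metrics are often just a file read away. As one hedged example, conntrack table usage on Linux can be derived from nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter (present only when the conntrack module is loaded). The function names and the 75%/90% thresholds below are assumptions, not recommended values.

```python
from pathlib import Path


def conntrack_usage(base="/proc/sys/net/netfilter"):
    """Return the fraction of the connection-tracking table in use."""
    base = Path(base)
    count = int((base / "nf_conntrack_count").read_text().strip())
    limit = int((base / "nf_conntrack_max").read_text().strip())
    return count / limit


def conntrack_alert(usage, warn=0.75, crit=0.90):
    """Map a usage fraction to a severity level (thresholds are illustrative)."""
    if usage >= crit:
        return "critical"
    if usage >= warn:
        return "warning"
    return "ok"
```

A full table causes new connections to be dropped, which is why this otherwise obscure number is worth a dedicated alert on busy gateways and load balancers.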

Network Monitoring

Cover network devices, links, and external egress. Use tools such as pingmesh or eBPF to collect connectivity and quality data; internet‑facing services also need outbound and regional probing.
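
The idea behind mesh probing can be sketched in a few lines: every node probes every other node, and the results are reduced to unreachable and slow pairs. This is a simplified illustration, not how any particular pingmesh implementation works. In a real deployment an agent on each source node would run something like tcp_probe against its peers; here the probe is injected so the mesh logic can be exercised without a network. All names and the 50 ms threshold are assumptions.

```python
import socket
import time
from itertools import permutations


def tcp_probe(host, port, timeout=1.0):
    """Return TCP connect latency in milliseconds, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


def run_mesh(nodes, probe):
    """Probe every ordered (source, destination) pair of distinct nodes.
    probe(src, dst) must return a latency in ms or None."""
    return {(src, dst): probe(src, dst) for src, dst in permutations(nodes, 2)}


def summarize(results, slow_ms=50.0):
    """Split mesh results into unreachable pairs and pairs slower than slow_ms."""
    unreachable = sorted(pair for pair, rtt in results.items() if rtt is None)
    slow = sorted(
        pair for pair, rtt in results.items() if rtt is not None and rtt > slow_ms
    )
    return {"unreachable": unreachable, "slow": slow}
```

Probing ordered pairs matters because network faults are often asymmetric: a to c can fail while c to a still works.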

Endpoint Monitoring

Collect client‑side data from apps, web pages, H5, or mini‑programs via instrumentation or SDKs, measuring page load time, interaction latency, and error rates.
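
On the server side, the beacons sent by such SDKs are typically aggregated per page. The following sketch assumes a hypothetical beacon format with page, load_ms, and js_error fields; real SDKs define their own schemas.

```python
from collections import defaultdict
from statistics import median


def summarize_beacons(beacons):
    """Group client beacons by page and report median load time and error rate."""
    by_page = defaultdict(list)
    for beacon in beacons:
        by_page[beacon["page"]].append(beacon)

    report = {}
    for page, items in by_page.items():
        report[page] = {
            "views": len(items),
            "median_load_ms": median(b["load_ms"] for b in items),
            "error_rate": sum(1 for b in items if b["js_error"]) / len(items),
        }
    return report
```

Grouping by page (and, in practice, by app version, region, or network type) is what makes client-side data actionable, since averages across all clients hide localized regressions.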

Capability Completeness

Building a complete monitoring solution involves several technical layers.

Data Collection

Use agents and exporters such as Telegraf, Categraf, Grafana Agent, Datadog Agent, Filebeat, Fluent Bit, or iLogtail to gather metrics and logs. Existing data in MySQL, Oracle, ClickHouse, or PostgreSQL can be queried directly by the alert engine.
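
Whatever agent is chosen, the collected values end up in a wire format. As an illustration, this sketch renders gauge samples in the Prometheus text exposition format, which several of the collectors above can scrape or emit. The function names and the sample metric are made up for the example, and the sketch handles only gauges.

```python
def escape_label(value):
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric family.
    samples is a list of (labels_dict, value) tuples."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
        )
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

For example, render_gauge("disk_used_percent", "Disk usage percent", [({"host": "web-1", "mount": "/"}, 71.5)]) produces a HELP line, a TYPE line, and one labeled sample line.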

Data Storage

For metrics, VictoriaMetrics is recommended (Prometheus is also viable, but its built‑in storage runs on a single node). For logs, Elasticsearch is the default choice; large volumes may require ClickHouse, while cost‑sensitive setups can use Loki or OpenObserve with S3 back‑ends.

Alert Engine

Open‑source options include Grafana (strong visualization), Nightingale (good Prometheus compatibility), ElastAlert for Elasticsearch, and ClickVisual for ClickHouse alerts.
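
At their core, these engines evaluate rules on a schedule. The sketch below shows one common pattern, a threshold rule that must stay breached for a hold period before it fires, which filters out short spikes. It is a generic illustration under assumed names (ThresholdRule, for_s) and does not reproduce the rule model of any of the tools listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    name: str
    threshold: float
    for_s: float                          # how long the breach must last before firing
    breach_since: Optional[float] = None  # time the current breach started, if any

    def evaluate(self, value, now):
        """Return 'ok', 'pending' or 'firing' for the sample taken at time now."""
        if value <= self.threshold:
            self.breach_since = None      # breach ended, reset the timer
            return "ok"
        if self.breach_since is None:
            self.breach_since = now       # first sample above the threshold
        if now - self.breach_since >= self.for_s:
            return "firing"
        return "pending"
```

With ThresholdRule("cpu_high", threshold=0.9, for_s=120), samples above 0.9 at t=0 and t=60 are pending, and the rule fires at t=120 if the value is still above the threshold.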

Event Distribution

After alerts fire, handling steps such as deduplication, noise reduction, routing, on‑call scheduling, and escalation often rely on tools like PagerDuty or Opsgenie.
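
To show what these steps involve, here is a toy sketch of deduplication, routing, and time-based escalation. It does not use or imitate the PagerDuty or Opsgenie APIs. The alert shape, the team names, the 5-minute dedup window, and the 15-minute escalation step are all hypothetical.

```python
class EventRouter:
    def __init__(self, routes, default_team, dedup_window_s=300):
        self.routes = routes              # list of (label_key, label_value, team)
        self.default_team = default_team
        self.dedup_window_s = dedup_window_s
        self._last_sent = {}              # fingerprint -> time of last notification

    @staticmethod
    def fingerprint(alert):
        """Identify an alert by its name and full label set."""
        return (alert["name"], tuple(sorted(alert["labels"].items())))

    def route(self, alert):
        """Return the first team whose label matcher fits, else the default team."""
        for key, value, team in self.routes:
            if alert["labels"].get(key) == value:
                return team
        return self.default_team

    def handle(self, alert, now):
        """Return the team to notify, or None for a suppressed duplicate."""
        fp = self.fingerprint(alert)
        last = self._last_sent.get(fp)
        if last is not None and now - last < self.dedup_window_s:
            return None
        self._last_sent[fp] = now
        return self.route(alert)


def escalation_target(chain, fired_at, now, step_s=900):
    """For an unacknowledged alert, move one level up the chain every step_s seconds."""
    level = min(int((now - fired_at) // step_s), len(chain) - 1)
    return chain[level]
```

Hosted tools add much more on top (schedules, overrides, acknowledgement tracking), but these three functions cover the decisions any event-distribution layer has to make for each alert.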

Next Steps

Fill any missing data sources to achieve full coverage.

Integrate and correlate data across the monitoring stack.

Organize data per scenario to turn raw metrics into actionable insights.
