Mastering Observability: A Deep Dive into OpenTelemetry’s Architecture
This article explains OpenTelemetry’s purpose, three‑layer architecture (instrumentation, collector, backend), practical Go instrumentation code, and how the collector processes and exports telemetry to both open‑source and SaaS backends, helping developers avoid vendor lock‑in and achieve unified observability.
What is OpenTelemetry?
OpenTelemetry (OTel) is a set of APIs, SDKs, and tools that standardize generation, collection, and export of telemetry data—traces, metrics, and logs. It is not a backend UI; it provides a language‑agnostic instrumentation layer that lets applications emit telemetry without being tied to a specific observability vendor.
Hosted by the CNCF and supported by major cloud providers, OTel aims for “instrument once, run everywhere”.
Three‑layer architecture
Layer 1 – Instrumentation
Instrumentation lives in application code. Official SDKs exist for Go, Java, Python, Node.js, and other languages.
Two instrumentation approaches are available:
Auto‑instrumentation: Use a language agent or instrumentation library that wraps common libraries (HTTP servers, database drivers, gRPC clients) and creates spans with little or no change to application code. The Java agent needs no code changes at all; in Go you typically wrap handlers and clients with instrumentation packages such as otelhttp.
Manual instrumentation: Use the OTel API directly to create spans, add attributes, and record events for custom logic.
Example: manual instrumentation in Go
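```go
package orders

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// Package-level tracer; "my-app/orders" names the instrumentation scope.
var tracer = otel.Tracer("my-app/orders")

func ProcessOrder(ctx context.Context, orderID string) {
	// Start a new span named "ProcessOrder"; the returned ctx carries it,
	// so spans started from ctx become its children.
	ctx, span := tracer.Start(ctx, "ProcessOrder")
	defer span.End() // sets the end timestamp when the function returns

	// Record the order identifier as an attribute
	span.SetAttributes(attribute.String("order.id", orderID))

	// ... business logic such as DB queries or RPC calls, passing ctx along ...
}
```

The call to tracer.Start creates a span that records the operation name, start and end timestamps, and any attached attributes. Because the returned ctx carries the span, spans started from it (in this process, or in downstream services once the context is propagated) become its children, and the resulting tree of spans forms a complete trace. Note that otel.Tracer returns a no-op tracer until an SDK TracerProvider is registered with otel.SetTracerProvider.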
Layer 2 – OpenTelemetry Collector
The Collector is a high‑performance agent or gateway that receives telemetry from instrumented services, processes it, and forwards it to one or more backends.
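Receivers: Accept data via OTLP (the native protocol) and other formats such as Jaeger, Prometheus, or Fluentd (Fluent Forward).
Processors:
Batch: Group data to reduce network overhead.
Attributes: Add, update, or remove attributes so telemetry carries uniform metadata (e.g., pod name, host); Kubernetes and host metadata are usually added by the k8sattributes and resourcedetection processors.
Filter: Drop low‑value data such as health‑check traces.
Sampling: Reduce trace volume, either probabilistically (probabilistic_sampler) or after the whole trace has been seen (tail_sampling, e.g., keep every trace that contains an error).
Redaction: Remove sensitive fields (passwords, PII) before export.
Exporters: Send processed data to destinations such as Jaeger (for trace debugging, over OTLP), Prometheus, or commercial SaaS platforms (Datadog, New Relic, etc.).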
Deploying the Collector decouples applications from backends; changing the backend only requires updating the Collector configuration.
Layer 3 – Backend
The backend stores, indexes, and visualizes telemetry.
Open‑source stack:
Jaeger / Zipkin – distributed tracing UI.
Prometheus – metrics storage and alerting.
Grafana – dashboards that can query Jaeger, Prometheus, Loki, etc.
Loki – log aggregation.
SaaS platforms: Datadog, New Relic, Honeycomb, Dynatrace, Splunk, and others provide managed analysis and AIOps features.
Self‑hosted storage: ClickHouse (columnar analytics), Elasticsearch (search and log analytics), or other high‑performance stores for enterprises with custom requirements.
Because the Collector can export to multiple backends, teams can combine solutions to match budget and performance needs.
Practical starter workflow
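Add the OpenTelemetry SDK and an exporter to your primary service (for Go: go.opentelemetry.io/otel/sdk/trace and go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc), plus instrumentation packages for the libraries it uses (e.g., go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp).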
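Configure a local Collector instance with an OTLP receiver and an OTLP exporter pointing at Jaeger (current Jaeger releases ingest OTLP natively, and the dedicated Jaeger exporter has been removed from the Collector).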
Run the service and view the generated trace in the Jaeger UI to verify end‑to‑end visibility.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Ops Development & AI Practice
DevSecOps engineer sharing experiences and insights on AI, Web3, and Claude code development. Aims to help solve technical challenges, improve development efficiency, and grow through community interaction. Feel free to comment and discuss.
