Why Observability Is the Missing Piece for Day‑2 Success in Cloud‑Native and Serverless Systems

The article explains how observability—through logs, metrics, and traces—transforms the opaque, complex day‑2 operations of micro‑service, Kubernetes, and serverless environments into a deterministic, diagnosable system, highlighting OpenTelemetry, practical collection methods, and real‑world implementation challenges and benefits.

Tencent Cloud Middleware

Dec 9, 2021

What Observability Should Do

Observability aims to make a system’s internal state transparent, much like medical imaging lets doctors diagnose patients, by providing fine‑grained data such as logs, metrics, and request traces that reveal topology, performance bottlenecks, and failures.

Day‑2 Focus: Observability in Cloud‑Native and Serverless

While developers enjoy creative Day‑0/Day‑1 work, Day‑2—deployment, monitoring, maintenance, and iteration—often receives less attention. The article argues that robust observability is essential in this phase, especially for micro‑service architectures that may involve dozens or hundreds of services.

Foundations of Observability

Building on earlier work such as Google’s Dapper paper on distributed tracing, observability is commonly described in terms of three telemetry types:

Logs : Carry complete contextual information but can be costly to transmit and store.

Metrics : Provide abstracted statistical data with relatively fixed overhead, suitable for monitoring and alerting.

Traces : Describe request‑level topology across services; per‑request collection can be expensive.
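As a rough illustration of how the three signals differ, the sketch below records hypothetical requests using only the Python standard library. The `Telemetry` class, its field names, and the span layout are assumptions made for this example; they are not part of any real telemetry SDK.

```python
import json
import uuid
from collections import Counter


class Telemetry:
    """Toy in-memory holder for the three signal types (illustrative only)."""

    def __init__(self):
        # Metric: an aggregate keyed by (route, status). Its size depends on
        # the number of distinct keys, not on the number of requests.
        self.request_count = Counter()
        # Trace: one span per unit of work, linked by a shared trace_id.
        self.spans = []
        # Log: one record per event, carrying full context. Volume grows
        # with traffic, which is why logs are costly to ship and store.
        self.log_lines = []

    def record_request(self, route, status, duration_ms, trace_id=None):
        """Emit a metric update, a span, and a log line for one request."""
        trace_id = trace_id or uuid.uuid4().hex
        self.request_count[(route, status)] += 1
        self.spans.append(
            {"trace_id": trace_id, "name": route, "duration_ms": duration_ms}
        )
        self.log_lines.append(
            json.dumps(
                {"trace_id": trace_id, "route": route, "status": status},
                sort_keys=True,
            )
        )
        return trace_id
```

Sharing the same `trace_id` between the log line and the span is what lets a log record be correlated with the request it belongs to.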

OpenTelemetry Overview

OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.

Unlike backend solutions such as Jaeger or Prometheus, OpenTelemetry defines standard data formats and provides pluggable exporters, but does not include storage, query, or visualization components.
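The separation between producing telemetry and storing it can be sketched as a small exporter interface. This is a conceptual illustration in plain Python, not the OpenTelemetry API: `SpanRecord`, `Exporter`, `InMemoryExporter`, `LineExporter`, and `Pipeline` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class SpanRecord:
    """A backend-neutral record; stands in for a standard data format."""
    name: str
    trace_id: str
    duration_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)


class Exporter(Protocol):
    """Anything that can receive a batch of records."""
    def export(self, spans: List[SpanRecord]) -> None: ...


class InMemoryExporter:
    """Keeps records in a list, e.g. for tests."""
    def __init__(self) -> None:
        self.received: List[SpanRecord] = []

    def export(self, spans: List[SpanRecord]) -> None:
        self.received.extend(spans)


class LineExporter:
    """Formats records as text lines; a stand-in for sending to a backend."""
    def __init__(self) -> None:
        self.lines: List[str] = []

    def export(self, spans: List[SpanRecord]) -> None:
        for s in spans:
            self.lines.append(f"{s.trace_id} {s.name} {s.duration_ms:.1f}ms")


class Pipeline:
    """Fans records out to every configured exporter; stores nothing itself."""
    def __init__(self, exporters: List[Exporter]) -> None:
        self._exporters = list(exporters)

    def emit(self, spans: List[SpanRecord]) -> None:
        for exporter in self._exporters:
            exporter.export(spans)
```

Because the pipeline only depends on the `export` method, swapping one backend for another means changing the exporter list, not the instrumented code.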

Observability in Kubernetes

Kubernetes components are distributed and declaratively managed, making observability more challenging than in VM environments. Effective observability must gather data from both the application layer and the control‑plane components.

Logs : Collectors such as Fluentd or Logstash are commonly deployed as DaemonSets so that one instance runs on each node and forwards logs to back‑ends such as the Elastic Stack.

Metrics : Kubernetes exposes three APIs— metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io. The Metrics Server implements the core API, while Prometheus Adapter supports the custom and external APIs, enabling autoscaling based on these metrics.

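Traces : Service‑mesh solutions (e.g., Istio) can generate spans in the sidecar proxy without tracing libraries in application code, although applications still need to forward trace‑context headers so that spans join into a single trace; for languages like Java, agents can emit OpenTelemetry‑compatible traces.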
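To make the autoscaling point in the Metrics item concrete, the sketch below applies the scaling rule documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric value / target metric value), with no change when the ratio is within a tolerance (0.1 by default). The function name and the `min_replicas` and `max_replicas` defaults are assumptions for this example; a real HPA object declares its own bounds.

```python
import math


def desired_replicas(
    current_replicas,
    current_value,
    target_value,
    min_replicas=1,    # hypothetical default for this sketch
    max_replicas=10,   # hypothetical default for this sketch
    tolerance=0.1,
):
    """Return the replica count suggested by the HPA scaling rule."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")

    ratio = current_value / target_value
    # Skip scaling when the metric is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four replicas observing 200 units of a metric against a target of 100 would be scaled to eight, while a reading of 105 falls inside the tolerance band and leaves the count unchanged.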

Observability in Serverless

Serverless abstracts away infrastructure, which paradoxically reduces the visibility needed for troubleshooting. Nevertheless, observability still provides value by exposing topology, request context, performance bottlenecks, and optimization opportunities.

Collected telemetry can also feed autoscaling, for example through the Kubernetes Horizontal Pod Autoscaler (HPA), and AIOps use‑cases such as predictive scaling, which can help reduce cold‑start latency and improve reliability.
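The topology mentioned in this section can, in principle, be derived from trace data alone. The sketch below assumes each span is a dictionary with hypothetical `span_id`, `parent_id`, and `service` keys; real tracing back-ends use their own schemas.

```python
def service_edges(spans):
    """Return sorted (caller, callee) service pairs implied by span parentage."""
    by_id = {span["span_id"]: span for span in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        # An edge exists when a span's parent ran in a different service.
        if parent is not None and parent["service"] != span["service"]:
            edges.add((parent["service"], span["service"]))
    return sorted(edges)
```

Spans whose parent is missing from the input (for instance because it was not sampled) simply contribute no edge, so the result is a lower bound on the real call graph.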

Practical Implementation: Tencent Cloud TEM

TEM (Tencent Cloud Elastic Microservice), Tencent Cloud’s serverless application platform for microservices, demonstrates concrete observability practices:

Image Build Observability

During container image construction, TEM records each step’s success and duration, enabling developers to pinpoint slow stages.

#5 [1/9] FROM ccr.ccs.tencentyun.com/tsf_build/tem-buildkit-war-open-base:8.5-jre8@sha256:…
#5 resolve ccr.ccs.tencentyun.com/tsf_build/tem-buildkit-war-open-base:8.5-jre8 done
#5 DONE 0.0s
#15 importing cache manifest …
#15 DONE 0.8s
…
#19 [auth] tem-100011913960-dsxh/svc-test-war-firstdeploy-kgqkyiqs:pull,push token for ccr.ccs.tencentyun.com
#19 DONE 0.0s
#16 exporting to image
#16 pushing layers 5.5s done
#16 pushing manifest … 1.4s done
#16 DONE 7.1s
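A log in this shape can be ranked by step duration with a few lines of parsing. The sketch below assumes the `#<step> DONE <seconds>s` line format seen in the excerpt above; the function name `slowest_steps` is made up for this example, and the parser is not part of TEM.

```python
import re

# Matches completion lines such as "#16 DONE 7.1s".
DONE_RE = re.compile(r"^#(\d+) DONE (\d+(?:\.\d+)?)s\s*$")
# Matches any other step line such as "#16 exporting to image".
STEP_RE = re.compile(r"^#(\d+) (.+)$")


def slowest_steps(log_text, top=3):
    """Return up to `top` (step_id, description, seconds), slowest first."""
    names = {}
    durations = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        done = DONE_RE.match(line)
        if done:
            durations[done.group(1)] = float(done.group(2))
            continue
        step = STEP_RE.match(line)
        # Use the first line seen for a step as its description.
        if step and step.group(1) not in names:
            names[step.group(1)] = step.group(2)
    ranked = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    return [(sid, names.get(sid, ""), secs) for sid, secs in ranked[:top]]
```

Applied to the excerpt, the slowest entry is step 16 ("exporting to image") at 7.1 seconds, which matches what a reader would find by scanning the log manually.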

Application Deployment Observability

TEM surfaces native Kubernetes logs and its own scheduling information, helping users diagnose issues such as missing images, quota limits, or invalid parameters. This applies across the release strategies it supports:

Canary Release : Small batch validation of a new version.

Batch Release : Rolling updates with optional manual or automatic triggers.

In‑Place Upgrade : Rolling updates that preserve instance IDs and IPs.

Integrated Cloud‑Product Observability

TEM connects with other Tencent Cloud services to provide a unified observability stack:

Logs : Tencent Cloud CLS offers a one‑stop log collection, storage, and analysis solution.

Metrics : Integrated Cloud Monitoring and APM deliver comprehensive metrics, including JVM and request‑level data.

Traces : Java‑agent based, non‑intrusive tracing presents full request lifecycles for root‑cause analysis.

Conclusion

Micro‑service, container, and cloud‑native technologies bring powerful capabilities but also increase system complexity. Focusing on Day‑2 observability—collecting, standardizing, and visualizing logs, metrics, and traces—enables reliable operation, faster debugging, and better resource utilization in both Kubernetes and serverless environments.
