Mastering Observability in Alibaba Cloud Service Mesh ASM: Logs, Metrics, and Tracing

This guide explains how Alibaba Cloud Service Mesh ASM enables comprehensive observability for cloud‑native applications by configuring telemetry for logs, metrics, and distributed tracing, offering best‑practice recommendations, YAML examples, and integration with Prometheus, ARMS, and external tracing tools.

Alibaba Cloud Native

Jul 28, 2023

Observability Overview

As application systems grow more complex, keeping them running reliably becomes harder, and individual components can degrade without an obvious failure. Observability, the ability to understand a system's internal state from its external outputs, helps teams detect failures, reduce mean time to recovery (MTTR), and keep services reliable.

ASM’s Unified Observability Model

Alibaba Cloud Service Mesh (ASM) provides a standardized way to generate and collect converged observability data, supporting cloud‑native applications. It leverages the Service Mesh data‑plane proxy (Envoy) to capture logs, metrics, and traces.

Built‑in Best Practices for Telemetry CRD

Only one Telemetry object named default is allowed in the istio-system namespace.

Each namespace may have a single Telemetry with an empty selector and name default.

Use workload selectors to create workload‑specific overrides.

If two Telemetry objects select the same workload, the resulting behavior is undefined, so avoid overlapping selectors.

When the global Telemetry lacks metric configuration, metrics are disabled by default.

Initially ASM enables only SERVER‑side metrics to limit storage cost; CLIENT‑side metrics must be enabled manually if needed.
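
The scoping rules above can be sketched with two resources. This is a minimal illustration, assuming the open-source Istio `telemetry.istio.io/v1alpha1` API version; the `demo` namespace, the `app: reviews` label, and the resource name `reviews-tracing` are hypothetical, and the API version served by a given ASM instance may differ.

```yaml
# Mesh-wide defaults: the single Telemetry named "default" in the root namespace.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: default
  namespace: istio-system
spec:
  accessLogging:
  - providers:
    - name: envoy
---
# Workload-specific override: applies only to pods labeled app=reviews
# in the hypothetical "demo" namespace.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: reviews-tracing        # hypothetical name
  namespace: demo              # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: reviews             # hypothetical workload label
  tracing:
  - randomSamplingPercentage: 10
```

Keeping exactly one selector-based resource per workload avoids the undefined behavior noted above.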

Log Collection

Aggregating service logs to a central system simplifies management and search. ASM offers log filtering and formatting. Services should write logs to stdout / stderr, and a log agent forwards them.

Log format rule (example):

```yaml
envoyFileAccessLog:
  logFormat:
    text: '{"bytes_received":"%BYTES_RECEIVED%","bytes_sent":"%BYTES_SENT%","duration":"%DURATION%","method":"%REQ(:METHOD)%","path":"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%","response_code":"%RESPONSE_CODE%","user_agent":"%REQ(USER-AGENT)%","x_forwarded_for":"%REQ(X-FORWARDED-FOR)%","app_service_name":"%UPSTREAM_CLUSTER%"}'
  path: /dev/stdout
```

Log filtering example (only log responses with status code ≥ 400):

```yaml
accessLogging:
- disabled: false
  filter:
    expression: response.code >= 400
  providers:
  - name: envoy
```
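
To show how the format rule and the filter rule fit together, the sketch below defines a log provider and then references it by name from a Telemetry resource. It assumes the open-source Istio layout, where providers are declared under `meshConfig.extensionProviders`; ASM may expose the same settings through its console instead. The provider name `envoy-json` and the `demo` namespace are hypothetical.

```yaml
# Fragment of the mesh configuration (open-source Istio layout, an assumption here).
meshConfig:
  extensionProviders:
  - name: envoy-json           # hypothetical provider name
    envoyFileAccessLog:
      path: /dev/stdout
      logFormat:
        # "labels" emits one JSON object per request with these keys.
        labels:
          method: "%REQ(:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          response_code: "%RESPONSE_CODE%"
          duration: "%DURATION%"
---
# Namespace-level Telemetry that uses the provider and logs only error responses.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: default
  namespace: demo              # hypothetical namespace
spec:
  accessLogging:
  - providers:
    - name: envoy-json
    filter:
      expression: response.code >= 400
```

Filtering at the proxy keeps successful requests out of the log pipeline, which reduces storage and indexing cost for high-traffic services.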

Control‑Plane Log and Alerting

ASM can collect control‑plane logs (e.g., configuration push failures) and generate alerts. Enabling these alerts helps detect misconfigurations that could render sidecar proxies or gateways unavailable after a restart.

Metrics Collection

Metrics are a core observability dimension. Istio exposes metrics in Prometheus format, and each Envoy sidecar emits a large number of them. ASM provides a UI to define metric generation rules via the Telemetry CRD.

Example metric overrides (enabling SERVER‑side metrics, disabling CLIENT‑side by default):

```yaml
metrics:
- overrides:
  # Turn off every metric reported by the client-side (outbound) proxy.
  - disabled: true
    match:
      metric: ALL_METRICS
      mode: CLIENT
  # Keep every metric reported by the server-side (inbound) proxy.
  - disabled: false
    match:
      metric: ALL_METRICS
      mode: SERVER
  # providers belongs to the same list item as overrides.
  providers:
  - name: prometheus
```

First‑time activation only enables SERVER‑side metrics to control cost; users can enable CLIENT‑side metrics as needed. Certain SERVER‑side metrics (e.g., REQUEST_COUNT, TCP_SENT_BYTES) are required for mesh topology visualisation.
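
As a sketch of selectively enabling CLIENT-side metrics, the resource below turns on only request count and request duration for one namespace rather than all metrics mesh-wide. It assumes that namespace-level overrides are merged on top of the mesh-wide defaults, as in open-source Istio; the `demo` namespace is hypothetical.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: default
  namespace: demo              # hypothetical namespace
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    # Re-enable only the client-side metrics that are actually needed.
    - match:
        metric: REQUEST_COUNT
        mode: CLIENT
      disabled: false
    - match:
        metric: REQUEST_DURATION
        mode: CLIENT
      disabled: false
```

Enabling individual metrics per namespace keeps the additional Prometheus series, and therefore storage cost, limited to where client-side data is useful.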

Service Level Objectives (SLO) and Indicators (SLI)

ASM supports SLO definition through a UI that auto‑generates Prometheus rules. Typical SLI types include availability (HTTP 2xx/3xx) and latency (custom thresholds). Example SLOs:

Average QPS > 100k per minute

99% of requests have latency < 500 ms

99% of bandwidth > 200 MB/s

Generated alerts appear in Alertmanager when SLO thresholds are breached.
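
For orientation, the following is a hand-written sketch of Prometheus rules for an availability SLI and a latency SLI. It is not the output that ASM generates. It assumes the standard Istio metric names `istio_requests_total` and `istio_request_duration_milliseconds_bucket`; the service name `productpage`, the rule names, and the 99% and 500 ms thresholds are illustrative.

```yaml
groups:
- name: asm-slo-sketch         # hypothetical group name
  rules:
  # Availability SLI: share of server-reported requests answered with 2xx/3xx.
  - record: slo:availability:ratio_rate5m
    expr: |
      sum(rate(istio_requests_total{reporter="destination",destination_service_name="productpage",response_code=~"2..|3.."}[5m]))
      /
      sum(rate(istio_requests_total{reporter="destination",destination_service_name="productpage"}[5m]))
  # Alert when availability stays below a 99% objective for 5 minutes.
  - alert: AvailabilityBelowObjective
    expr: slo:availability:ratio_rate5m < 0.99
    for: 5m
    labels:
      severity: warning
  # Latency SLI: alert when the 99th percentile exceeds 500 ms.
  - alert: LatencyP99AboveObjective
    expr: |
      histogram_quantile(0.99,
        sum by (le) (rate(istio_request_duration_milliseconds_bucket{reporter="destination",destination_service_name="productpage"}[5m]))
      ) > 500
    for: 5m
    labels:
      severity: warning
```

Both rules read `reporter="destination"` series, so they depend on the SERVER-side metrics being enabled.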

Distributed Tracing

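Tracing provides end-to-end visibility of request flows. ASM integrates with Jaeger or Zipkin via the Telemetry CRD. Sidecars generate spans automatically, but they cannot correlate an inbound request with the outbound calls it triggers. Applications must therefore copy the tracing headers (such as x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, and x-b3-sampled) from each incoming request onto the outgoing requests it makes, so that the spans join into a single trace.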

Example tracing configuration:

```yaml
tracing:
- customTags:
    mytag1:
      literal:
        value: fixedvalue
    mytag2:
      header:
        name: myheader1
        defaultValue: value1
    mytag3:
      environment:
        name: myenv1
        defaultValue: value1
  providers:
  - name: zipkin
  randomSamplingPercentage: 90
```

Collected trace data can be sent to Alibaba Cloud’s managed tracing service or to self‑hosted solutions like Zipkin or Jaeger.
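
For a self-hosted backend, the provider name used in the tracing rule has to resolve to a collector address. The fragment below is a sketch of how that is declared in open-source Istio mesh configuration; ASM may set the equivalent through its console. The service address and port are assumptions for a Zipkin collector deployed in `istio-system`.

```yaml
# Fragment of the mesh configuration (open-source Istio layout, an assumption here).
meshConfig:
  extensionProviders:
  - name: zipkin               # must match providers[].name in the tracing rule
    zipkin:
      service: zipkin.istio-system.svc.cluster.local   # assumed collector service
      port: 9411                                       # assumed collector port
```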

Mesh Topology Visualization

ASM includes a built‑in mesh topology view that visualises services and their configurations, relying on the metrics described above.

Conclusion

Observability is essential for cloud‑native applications. By using ASM’s unified telemetry model—covering logs, metrics, and tracing—teams can achieve non‑intrusive, standardized data collection, reduce operational overhead, and improve reliability and performance of their service mesh deployments.
