How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)
This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.
Background
Effective observability is essential for building efficient, stable distributed applications, and it becomes even more critical for LLM‑driven services. Over time, observability logic has moved from manual code instrumentation to framework‑level support and finally to the infrastructure layer provided by service meshes.
ASM‑Based LLM Observability
Alibaba Cloud Service Mesh (ASM) now offers infrastructure‑level LLM traffic management and observability without requiring a specific language SDK or changes to application call patterns. By configuring ASM, users can obtain transparent traffic routing and detailed observability data, which is crucial for both service stability and cost optimization.
Observability Features Provided by ASM
ASM’s observability consists of three parts: access logs, monitoring metrics, and tracing. The default log and metric capabilities do not expose LLM‑specific information (e.g., model name, token counts). ASM therefore enhances these two areas.
1. Enhanced Access Logs
ASM allows custom access‑log formats that can include the following fields:
request_model FILTER_STATE(wasm.asm.llmproxy.request_model:PLAIN)
request_prompt_tokens FILTER_STATE(wasm.asm.llmproxy.request_prompt_tokens:PLAIN)
request_completion_tokens FILTER_STATE(wasm.asm.llmproxy.request_completion_tokens:PLAIN)
These fields represent the model used for the request, the number of input (prompt) tokens, and the number of output (completion) tokens. Example log entries after formatting:
{
  "duration": "7640",
  "response_code": "200",
  "authority_for": "dashscope.aliyuncs.com",
  "request_model": "qwen-1.8b-chat",
  "request_prompt_tokens": "3",
  "request_completion_tokens": "55"
}
{
  "duration": "2759",
  "response_code": "200",
  "authority_for": "dashscope.aliyuncs.com",
  "request_model": "qwen-turbo",
  "request_prompt_tokens": "11",
  "request_completion_tokens": "90"
}
These logs can be collected by Alibaba Cloud Log Service for alerting and dashboarding.
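As a rough illustration of what these fields make possible, the following Python sketch sums token usage per model from access-log lines. It assumes the sidecar writes one JSON object per line and that token counts are logged as strings, as in the sample entries; the function names `to_int` and `summarize_tokens` are hypothetical helpers, not part of ASM.

```python
import json
from collections import defaultdict


def to_int(value):
    """Convert a logged token count to int; treat missing or '-' values as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def summarize_tokens(log_lines):
    """Aggregate request and token counts per model from JSON access-log lines."""
    totals = defaultdict(
        lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. proxy startup messages)
        if not isinstance(entry, dict):
            continue
        model = entry.get("request_model")
        if not model or model == "-":
            continue  # not an LLM request, or the field was not populated
        stats = totals[model]
        stats["requests"] += 1
        stats["prompt_tokens"] += to_int(entry.get("request_prompt_tokens"))
        stats["completion_tokens"] += to_int(entry.get("request_completion_tokens"))
    return dict(totals)


if __name__ == "__main__":
    # Sample lines reduced to the LLM-specific fields from the entries above.
    sample = [
        '{"request_model": "qwen-1.8b-chat", "request_prompt_tokens": "3", "request_completion_tokens": "55"}',
        '{"request_model": "qwen-turbo", "request_prompt_tokens": "11", "request_completion_tokens": "90"}',
    ]
    for model, stats in summarize_tokens(sample).items():
        print(model, stats)
```

In practice the input could be the saved output of kubectl logs for the istio-proxy container; a log service query would normally replace this kind of ad hoc script.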
2. New Prometheus Metrics for Token Consumption
ASM adds two metrics:
asm_llm_proxy_prompt_tokens: number of input (prompt) tokens.
asm_llm_proxy_completion_tokens: number of output (completion) tokens.
Both metrics carry four default dimensions:
llmproxy_source_workload
llmproxy_source_workload_namespace
llmproxy_destination_service
llmproxy_model
To enable them, create a ConfigMap that defines the tag extraction rules and patch the workload to use the ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: asm-llm-proxy-bootstrap-config
data:
  custom_bootstrap.json: |
    "stats_config": {
      "stats_tags":[
        {"tag_name":"llmproxy_source_workload","regex":"(\\|llmproxy_source_workload=([^|]*))"},
        {"tag_name":"llmproxy_source_workload_namespace","regex":"(\\|llmproxy_source_workload_namespace=([^|]*))"},
        {"tag_name":"llmproxy_destination_service","regex":"(\\|llmproxy_destination_service=([^|]*))"},
        {"tag_name":"llmproxy_model","regex":"(\\|llmproxy_model=([^|]*))"}
      ]
    }
Apply the ConfigMap and patch the deployment:
kubectl apply -f asm-llm-proxy-bootstrap-config.yaml
kubectl patch deployment sleep -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/bootstrapOverride":"asm-llm-proxy-bootstrap-config"}}}}}'
Query the metrics from the sidecar:
kubectl exec deployments/sleep -it -c istio-proxy -- curl localhost:15090/stats/prometheus | grep llmproxy
Sample output shows token counts per model:
asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-1.8b-chat"} 3
asm_llm_proxy_completion_tokens{llmproxy_source_workload="sleep",llmproxy_source_workload_namespace="default",llmproxy_destination_service="dashscope.aliyuncs.com",llmproxy_model="qwen-turbo"} 85
3. Adding Model Dimension to Native Istio Metrics
Native Istio metrics (e.g., istio_requests_total) lack LLM‑specific dimensions. ASM allows adding a custom dimension “model” to any metric. Using the REQUEST_COUNT metric as an example, the UI steps are:
Open the observability configuration page.
Select the metric (REQUEST_COUNT) and edit its dimensions.
Add a new dimension named model and set its value to filter_state["wasm.asm.llmproxy.request_model"].
After applying the change, the model appears in the Istio request metric:
istio_requests_total{... ,model="qwen-1.8b-chat"} 1
istio_requests_total{... ,model="qwen-turbo"} 1
This enables analysis such as model‑wise success rates or average latency.
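To make the model‑wise success rate concrete, here is a minimal Python sketch that computes it from Prometheus text‑format samples of istio_requests_total. It assumes each series carries the standard Istio response_code label alongside the custom model dimension and counts any 2xx code as a success. The sample values in the demo are invented for illustration, and `success_rate_by_model` is a hypothetical helper; a Prometheus query would normally do this job.

```python
import re
from collections import defaultdict

# Matches lines of the form: metric_name{label="value",...} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def success_rate_by_model(metric_lines):
    """Return {model: fraction of requests with a 2xx response code}."""
    counts = defaultdict(lambda: [0.0, 0.0])  # model -> [successful, total]
    for line in metric_lines:
        match = SAMPLE_RE.match(line.strip())
        if not match or match.group("name") != "istio_requests_total":
            continue  # ignores comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        model = labels.get("model")
        if not model:
            continue  # series without the custom model dimension
        value = float(match.group("value"))
        counts[model][1] += value
        if labels.get("response_code", "").startswith("2"):
            counts[model][0] += value
    return {m: ok / total for m, (ok, total) in counts.items() if total > 0}


if __name__ == "__main__":
    # Invented sample series; real output has many more labels.
    sample = [
        'istio_requests_total{response_code="200",model="qwen-turbo"} 3',
        'istio_requests_total{response_code="500",model="qwen-turbo"} 1',
        'istio_requests_total{response_code="200",model="qwen-1.8b-chat"} 2',
    ]
    for model, rate in success_rate_by_model(sample).items():
        print(f"{model}: {rate:.2%}")
```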
Testing Commands
Send LLM requests via the sidecar:
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --data '{"messages":[{"role":"user","content":"Please introduce yourself"}]}'
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --header 'user-type: subscriber' \
  --data '{"messages":[{"role":"user","content":"Please introduce yourself"}]}'
View the last two lines of the access log:
kubectl logs deployments/sleep -c istio-proxy | tail -2
Query Prometheus metrics for token usage or request counts as shown above.
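When checking the result of these test requests, it can help to confirm that both token metrics are exported and carry all four dimensions. The Python sketch below does this over the text returned by the sidecar's /stats/prometheus endpoint. It assumes that output has been saved to a string, and `check_token_metrics` is a hypothetical helper written for this illustration.

```python
import re

EXPECTED_METRICS = (
    "asm_llm_proxy_prompt_tokens",
    "asm_llm_proxy_completion_tokens",
)
EXPECTED_DIMENSIONS = {
    "llmproxy_source_workload",
    "llmproxy_source_workload_namespace",
    "llmproxy_destination_service",
    "llmproxy_model",
}
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def check_token_metrics(stats_text):
    """Return {metric: [problems]}; an empty list means the metric looks fine."""
    seen = {name: False for name in EXPECTED_METRICS}
    missing = {name: set() for name in EXPECTED_METRICS}
    for line in stats_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name is everything before the label block or the value.
        name = line.split("{", 1)[0].split()[0]
        if name not in seen:
            continue
        seen[name] = True
        labels = {key for key, _ in LABEL_RE.findall(line[len(name):])}
        missing[name] |= EXPECTED_DIMENSIONS - labels
    report = {}
    for name in EXPECTED_METRICS:
        if not seen[name]:
            report[name] = ["metric not found"]
        else:
            report[name] = [f"missing dimension: {d}" for d in sorted(missing[name])]
    return report


if __name__ == "__main__":
    sample = (
        'asm_llm_proxy_prompt_tokens{llmproxy_source_workload="sleep",'
        'llmproxy_source_workload_namespace="default",'
        'llmproxy_destination_service="dashscope.aliyuncs.com",'
        'llmproxy_model="qwen-1.8b-chat"} 3\n'
    )
    for metric, problems in check_token_metrics(sample).items():
        print(metric, "OK" if not problems else problems)
```

A "metric not found" result after sending traffic may indicate that the bootstrap override annotation was not applied or that the pod was not restarted with the new configuration.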
Conclusion
The article demonstrates how ASM extends its existing HTTP/TCP observability stack with LLM‑specific logs and metrics, providing fine‑grained insight into model usage and token consumption. These capabilities form the foundation for more advanced features such as LLM request caching and token‑based rate limiting.