Managing LLM Traffic in Alibaba Service Mesh (ASM): Routing, Observability, and Security
This article explains how to use Alibaba Service Mesh (ASM) to register large language model (LLM) providers, configure LLMProvider and LLMRoute resources, and implement traffic routing, observability, and security for LLM services through step‑by‑step Kubernetes manifests and curl tests.
The rapid rise of large language models (LLMs) has transformed how applications obtain information and process text data. ASM treats HTTP as a first‑class citizen and extends the HTTP request protocol to support LLM‑specific parameters, enabling gray‑scale deployment, weighted routing, and rich observability.
This series looks at LLM traffic from three perspectives—traffic routing, observability, and security. This first part focuses on traffic routing capabilities.
Prerequisites
Cluster added to an ASM instance (v1.21.6.88 or later)
Sidecar injection enabled
Model service (DashScope) API key obtained
Moonshot API key obtained for the second example
Step 1: Create a test application (sleep)
Apply the following Kubernetes manifests using the ACK cluster kubeconfig:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
    service: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/curl:asm-sleep
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true
Step 2: Create an LLMProvider for DashScope
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen-1.8b-chat
        stream: false
        apiKey: ${DASHSCOPE_API_KEY}

After creation, the sidecar rewrites HTTP requests from the sleep pod into the OpenAI-compatible format, adds the API key, and upgrades the connection to HTTPS.
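The apiKey field uses a shell-style placeholder. One way to render it before applying the manifest is a plain sed substitution; the key value below and the filename idea are illustrative only, and any templating step (envsubst, Helm, etc.) works just as well:

```shell
# Render the ${DASHSCOPE_API_KEY} placeholder with the real key before
# `kubectl apply`. The key below is a stand-in, not a real credential.
export DASHSCOPE_API_KEY="sk-test-123"
rendered=$(printf 'apiKey: ${DASHSCOPE_API_KEY}\n' \
  | sed "s/\${DASHSCOPE_API_KEY}/${DASHSCOPE_API_KEY}/")
echo "$rendered"
```

In practice the whole provider manifest would be piped through the same substitution, e.g. `sed "s/\${DASHSCOPE_API_KEY}/$DASHSCOPE_API_KEY/" provider.yaml | kubectl apply -f -` (where provider.yaml is a hypothetical file holding the manifest above).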
Test the provider with:
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"role": "user", "content": "Please introduce yourself"}]
  }'

Step 3: Create an LLMRoute for subscriber users
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: dashscope-route
spec:
  host: dashscope.aliyuncs.com
  rules:
  - name: vip-route
    matches:
    - headers:
        user-type:
          exact: subscriber
    backendRefs:
    - providerHost: dashscope.aliyuncs.com
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com

Requests carrying the header user-type: subscriber match the vip-route rule; all other traffic falls through to the default rule. Update the LLMProvider to add a route-specific configuration for the subscriber route:
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen-1.8b-chat
        stream: false
        apiKey: ${DASHSCOPE_API_KEY}
    routeSpecificConfigs:
      vip-route:
        openAIConfig:
          model: qwen-turbo
          stream: false
          apiKey: ${DASHSCOPE_API_KEY}

Testing with the subscriber header shows that the qwen-turbo model is used, while requests without the header still use qwen-1.8b-chat.
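The subscriber path can be exercised with the same curl call as in Step 2 plus the matching header. This command is a sketch: it assumes the sleep pod from Step 1 and a live cluster.

```shell
# Same request as in Step 2, but with the user-type header set so the
# vip-route rule matches and the route-specific qwen-turbo config applies.
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --header 'user-type: subscriber' \
  --data '{
    "messages": [{"role": "user", "content": "Please introduce yourself"}]
  }'
```

The model field in the response body should read qwen-turbo; dropping the header falls back to qwen-1.8b-chat.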
Step 4: Weighted traffic splitting between providers
Define a placeholder service demo-llm-server (its selector matches no pods; it exists only to give clients a stable host name) and route 50% of traffic to DashScope and 50% to Moonshot:
apiVersion: v1
kind: Service
metadata:
  name: demo-llm-server
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
  type: ClusterIP
---
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: moonshot
spec:
  host: api.moonshot.cn
  path: /v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: moonshot-v1-8k
        stream: false
        apiKey: ${MOONSHOT_API_KEY}
---
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: demo-llm-server
  namespace: default
spec:
  host: demo-llm-server
  rules:
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com
      weight: 50
    - providerHost: api.moonshot.cn
      weight: 50
    name: migrate-rule

Repeated curl tests against demo-llm-server show a roughly equal distribution of responses from both providers.
Conclusion
The article demonstrates how ASM can decouple applications from LLM providers using LLMProvider and LLMRoute resources, enabling dynamic routing, gray‑scale deployments, and per‑user model selection while preserving observability and security. These capabilities work on sidecars as well as on ASM ingress and egress gateways.