
Managing LLM Traffic in Alibaba Service Mesh (ASM): Routing, Observability, and Security

This article explains how to use Alibaba Service Mesh (ASM) to register large language model (LLM) providers, configure LLMProvider and LLMRoute resources, and implement traffic routing, observability, and security for LLM services through step‑by‑step Kubernetes manifests and curl tests.

Alibaba Cloud Infrastructure

The rapid rise of large language models (LLMs) has transformed how applications obtain information and process text data. ASM treats HTTP as a first‑class citizen and extends the HTTP request protocol to support LLM‑specific parameters, enabling gray‑scale deployment, weighted routing, and rich observability.

This is the first article in a series covering three perspectives: traffic routing, observability, and security. This part focuses on traffic routing capabilities.

Prerequisites

Cluster added to an ASM instance (v1.21.6.88 or later)

Sidecar injection enabled

Model service (DashScope) API key obtained

Moonshot API key obtained for the second example

Step 1: Create a test application (sleep)

Apply the following Kubernetes manifests using the ACK cluster kubeconfig:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
    service: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/curl:asm-sleep
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true
---

Step 2: Create an LLMProvider for DashScope

apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen-1.8b-chat
        stream: false
        apiKey: ${DASHSCOPE_API_KEY}

After creation, the sidecar rewrites HTTP requests from the sleep pod to the OpenAI‑compatible format, adds the API key, and upgrades to HTTPS.
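Conceptually, the rewrite the sidecar performs can be sketched as follows. This is an illustrative Python model, not ASM source code: it merges the LLMProvider's defaultConfig into the request body and injects the API key as a bearer token, so the application itself never holds the credential. All names here (`rewrite_request`, the config dict shape) are hypothetical.

```python
import json

def rewrite_request(body: dict, provider_config: dict) -> tuple[dict, dict]:
    """Merge provider defaults into the request and build auth headers."""
    rewritten = dict(body)
    # Fill in model/stream from the provider config when the caller omits them.
    rewritten.setdefault("model", provider_config["model"])
    rewritten.setdefault("stream", provider_config["stream"])
    headers = {
        "Content-Type": "application/json",
        # The sidecar injects the API key, so applications never store it.
        "Authorization": f"Bearer {provider_config['apiKey']}",
    }
    return rewritten, headers

# Mirrors the defaultConfig in the LLMProvider above (placeholder key).
config = {"model": "qwen-1.8b-chat", "stream": False, "apiKey": "sk-example"}
body = {"messages": [{"role": "user", "content": "Please introduce yourself"}]}
new_body, headers = rewrite_request(body, config)
print(json.dumps(new_body, ensure_ascii=False))
```

The application sends only the messages; everything provider-specific arrives via configuration, which is what lets the provider be swapped without code changes.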

Test the provider with:

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"role": "user", "content": "请介绍你自己"}]
  }'

Step 3: Create an LLMRoute for subscriber users

apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: dashscope-route
spec:
  host: dashscope.aliyuncs.com
  rules:
  - name: vip-route
    matches:
    - headers:
        user-type:
          exact: subscriber
    backendRefs:
    - providerHost: dashscope.aliyuncs.com
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com

Update the LLMProvider to add a route‑specific configuration for the subscriber route:

apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen-1.8b-chat
        stream: false
        apiKey: ${DASHSCOPE_API_KEY}
    routeSpecificConfigs:
      vip-route:
        openAIConfig:
          model: qwen-turbo
          stream: false
          apiKey: ${DASHSCOPE_API_KEY}

Repeating the curl test with an added --header 'user-type: subscriber' returns responses from the qwen-turbo model, while requests without the header continue to use qwen-1.8b-chat.
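The rule-selection logic above can be sketched in a few lines of Python. This is an illustrative model of exact-header matching with a routeSpecificConfigs lookup and a defaultConfig fallback, not ASM internals; the dict shapes simply mirror the manifests above.

```python
# Route rules in evaluation order; the last rule is the catch-all default.
rules = [
    {"name": "vip-route", "match": {"user-type": "subscriber"}},
    {"name": None, "match": {}},
]
configs = {
    "default": {"model": "qwen-1.8b-chat"},
    "routeSpecific": {"vip-route": {"model": "qwen-turbo"}},
}

def select_model(headers: dict) -> str:
    """Return the model for the first rule whose header matches all pass."""
    for rule in rules:
        if all(headers.get(k) == v for k, v in rule["match"].items()):
            route_cfg = configs["routeSpecific"].get(rule["name"])
            return (route_cfg or configs["default"])["model"]
    return configs["default"]["model"]

print(select_model({"user-type": "subscriber"}))  # qwen-turbo
print(select_model({}))                           # qwen-1.8b-chat
```

Because the rule name links the LLMRoute to the routeSpecificConfigs entry, per-user model selection needs no change in the calling application.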

Step 4: Weighted traffic splitting between providers

Define a generic service demo-llm-server and route 50% of traffic to DashScope and 50% to Moonshot:

apiVersion: v1
kind: Service
metadata:
  name: demo-llm-server
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
  type: ClusterIP
---
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: moonshot
spec:
  host: api.moonshot.cn
  path: /v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: moonshot-v1-8k
        stream: false
        apiKey: ${MOONSHOT_API_KEY}
---
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: demo-llm-server
  namespace: default
spec:
  host: demo-llm-server
  rules:
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com
      weight: 50
    - providerHost: api.moonshot.cn
      weight: 50
    name: migrate-rule

Repeated curl tests against demo-llm-server show roughly equal distribution of responses from both providers.
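The weighted split in migrate-rule behaves like per-request weighted random selection. The following is a simulation sketch (hypothetical names, not the proxy's actual algorithm) showing that a 50/50 weight assignment converges to roughly equal distribution over many requests:

```python
import random

# providerHost/weight pairs, mirroring the backendRefs in migrate-rule.
backends = [
    ("dashscope.aliyuncs.com", 50),
    ("api.moonshot.cn", 50),
]

def pick_backend(rng: random.Random) -> str:
    """Choose a backend host in proportion to its weight."""
    total = sum(w for _, w in backends)
    point = rng.uniform(0, total)
    for host, weight in backends:
        point -= weight
        if point <= 0:
            return host
    return backends[-1][0]

rng = random.Random(42)  # fixed seed for reproducibility
counts = {host: 0 for host, _ in backends}
for _ in range(10_000):
    counts[pick_backend(rng)] += 1
print(counts)  # both counts land near 5,000
```

Shifting the weights (e.g. 90/10, then 50/50, then 10/90) is the usual pattern for gradually migrating traffic from one provider to another without touching the application.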

Conclusion

The article demonstrates how ASM can decouple applications from LLM providers using LLMProvider and LLMRoute resources, enabling dynamic routing, gray‑scale deployments, and per‑user model selection while preserving observability and security. These capabilities work on sidecars as well as on ASM ingress and egress gateways.
