
Build a Sidecarless AI Application with Alibaba Cloud Service Mesh ASM – Step‑by‑Step Guide

This guide walks you through creating a sidecarless AI demo on Alibaba Cloud Service Mesh ASM, covering environment setup, multi‑model serving with KServe, PVC storage, InferenceService configuration, business service deployment, gateway and waypoint creation, traffic routing rules, and OIDC single sign‑on integration.


Prerequisites

You need an ACK cluster, an ASM instance (version 1.18.0.131 or later), and the istioctl command-line tool. Ensure Ambient Mesh mode is enabled on the ASM instance and that the ACK cluster has been added to the instance.
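Before starting, it helps to sanity-check the local tooling and cluster connectivity. A minimal sketch, assuming kubectl and istioctl are on your PATH and your kubeconfig already targets the ACK cluster:

```shell
# Verify the client tools are installed (client-side only, no cluster needed).
kubectl version --client
istioctl version --remote=false

# Confirm the ACK cluster is reachable and its nodes are Ready.
kubectl get nodes
```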

1. Enable Multi‑Model Inference Service

Create a global namespace modelmesh-serving in ASM. Use kubectl to connect to the ASM control plane and apply the following configuration to enable the multi‑model feature:

apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMKServeConfig
metadata:
  name: default
spec:
  enabled: true
  multiModel: true
  tag: v0.11.0

Apply it with kubectl apply -f asmkserveconfig.yaml. A modelmesh-serving namespace with the necessary runtime workloads will appear.
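You can confirm the feature took effect by listing the workloads ASM creates in the new namespace (exact pod names depend on the ASM and KServe versions, so treat this as a sketch):

```shell
# The controller and serving-runtime pods should eventually reach Running.
kubectl get pods -n modelmesh-serving

# The ServingRuntime resources backing multi-model serving should also exist.
kubectl get servingruntimes -n modelmesh-serving
```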

2. Prepare Model Files and Declare Inference Services

Download a TensorFlow model from TensorFlow Hub and a PyTorch model (converted to ONNX) from the official tutorial. Organize them as:

$ ls -R
pytorch    tensorflow
./pytorch:
style-transfer
./pytorch/style-transfer:
candy.onnx
./tensorflow:
style-transfer
./tensorflow/style-transfer:
saved_model.pb variables
./tensorflow/style-transfer/variables:
variables.data-00000-of-00002 variables.data-00001-of-00002 variables.index

Create a PVC (e.g., my-models-pvc) using a storage class, then copy the model files into the PVC via a temporary pod:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-access
  namespace: modelmesh-serving
spec:
  containers:
  - name: main
    image: ubuntu
    command: ["/bin/sh", "-ec", "sleep 10000"]
    volumeMounts:
    - name: my-pvc
      mountPath: "/mnt/models"
  volumes:
  - name: my-pvc
    persistentVolumeClaim:
      claimName: my-models-pvc

Copy files with:

kubectl cp -n modelmesh-serving tensorflow pvc-access:/mnt/models/
kubectl cp -n modelmesh-serving pytorch pvc-access:/mnt/models/

Verify the copy:

kubectl exec -n modelmesh-serving pvc-access -- ls /mnt/models

Define two InferenceService resources (one for TensorFlow, one for ONNX) in isvc.yaml:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: tf-style-transfer
  namespace: modelmesh-serving
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: tensorflow/style-transfer/
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: pt-style-transfer
  namespace: modelmesh-serving
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: pytorch/style-transfer/

Apply with kubectl apply -f isvc.yaml. Both services become ready, and the appropriate runtimes (Triton for TensorFlow, OVMS for ONNX) are launched.
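To check readiness, you can poll the InferenceService resources; the `kubectl wait` timeout below is an arbitrary choice:

```shell
# Both services should report READY=True once their runtimes have loaded the models.
kubectl get inferenceservice -n modelmesh-serving

# Or block until they are ready (300s is an illustrative timeout).
kubectl wait --for=condition=Ready inferenceservice/tf-style-transfer \
  -n modelmesh-serving --timeout=300s
kubectl wait --for=condition=Ready inferenceservice/pt-style-transfer \
  -n modelmesh-serving --timeout=300s
```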

3. Deploy Business Services

Create a namespace apsara-demo and apply ai-apps.yaml which defines service accounts, deployments for the AI backend and the two style‑transfer workloads, and corresponding services:

kubectl create namespace apsara-demo
kubectl apply -f ai-apps.yaml

The deployments use images from Alibaba Cloud Container Registry and expose port 8000.
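After applying the manifest, verify that everything in the namespace came up (the article does not list the individual deployment names, so this just inspects the whole namespace):

```shell
# Service accounts, deployments, and services from ai-apps.yaml should all be present,
# and each deployment should show its desired replica count as available.
kubectl get sa,deploy,svc -n apsara-demo
kubectl get pods -n apsara-demo
```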

4. Set Up ASM Gateway, Waypoint, and Traffic Rules

Create two ASM ingress gateways (one LoadBalancer on port 80, one ClusterIP on port 8008). Enable Ambient Mesh mode for the apsara-demo namespace via the ASM console.

Deploy a waypoint proxy for the apsara-demo namespace:

istioctl x waypoint apply --service-account style-transfer -n apsara-demo

Verify the waypoint pod appears.
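A sketch of that verification, assuming the istioctl version used above (the waypoint pod's generated name typically contains the service account it fronts):

```shell
# List waypoint proxies managed by istioctl in the namespace.
istioctl x waypoint list -n apsara-demo

# The waypoint also appears as a regular pod alongside the application workloads.
kubectl get pods -n apsara-demo
```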

Model‑mesh Routing (modelsvc-routing.yaml)

Define a Gateway, VirtualService, DestinationRule, and a JSON‑to‑gRPC transcoder to route requests to the correct runtime based on the x-model-format-* headers:

# modelsvc-routing.yaml (excerpt)
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: grpc-gateway
  namespace: modelmesh-serving
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts: ['*']
    port:
      name: grpc
      number: 8008
      protocol: GRPC
  - hosts: ['*']
    port:
      name: http
      number: 80
      protocol: HTTP
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vs-modelmesh-serving-service
  namespace: modelmesh-serving
spec:
  gateways: [grpc-gateway]
  hosts: ['*']
  http:
  - headerToDynamicSubsetKey:
    - header: x-model-format-tensorflow
      key: model.format.tensorflow
    - header: x-model-format-pytorch
      key: model.format.pytorch
    match:
    - port: 8008
    name: default
    route:
    - destination:
        host: modelmesh-serving
        port:
          number: 8033
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: dr-modelmesh-serving-service
  namespace: modelmesh-serving
spec:
  host: modelmesh-serving
  trafficPolicy:
    loadBalancer:
      dynamicSubset:
        subsetSelectors:
        - keys: [model.format.tensorflow]
        - keys: [model.format.pytorch]
---
apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMGrpcJsonTranscoder
metadata:
  name: grpcjsontranscoder-for-kservepredictv2
  namespace: istio-system
spec:
  builtinProtoDescriptor: kserve_predict_v2
  isGateway: true
  portNumber: 8008
  workloadSelector:
    labels:
      istio: ingressgateway

Apply with kubectl apply -f modelsvc-routing.yaml.
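Once the rules are in place, you can exercise the transcoder from outside the cluster. The request below follows the KServe v2 REST inference protocol; the gateway address, tensor name, shape, and data are placeholders you must adapt to your models:

```shell
# ASM_GW is the address of the gateway listening on port 8008 (placeholder).
ASM_GW=1.2.3.4

# The x-model-format-tensorflow header drives the dynamic-subset routing
# defined above, steering the request to the TensorFlow runtime. The JSON
# body is transcoded to gRPC by the ASMGrpcJsonTranscoder at the gateway.
curl -s "http://${ASM_GW}:8008/v2/models/tf-style-transfer/infer" \
  -H 'Content-Type: application/json' \
  -H 'x-model-format-tensorflow: true' \
  -d '{"inputs":[{"name":"input_1","shape":[1,224,224,3],"datatype":"FP32","data":[]}]}'
```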

Application Routing (app-routing.yaml)

Define a gateway for the AI app, a virtual service routing to the backend, and a virtual service that splits traffic between the TensorFlow and PyTorch style‑transfer workloads based on the user_class JWT claim:

# app-routing.yaml (excerpt)
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: ai-app-gateway
  namespace: apsara-demo
spec:
  selector:
    istio: api-ingressgateway
  servers:
  - hosts: ['*']
    port:
      name: http
      number: 8000
      protocol: HTTP
  - hosts: ['*']
    port:
      name: http-80
      number: 80
      protocol: HTTP
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ai-app-vs
  namespace: apsara-demo
spec:
  gateways: [ai-app-gateway]
  hosts: ['*']
  http:
  - route:
    - destination:
        host: ai-backend-svc
        port:
          number: 8000
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: style-transfer-vs
  namespace: apsara-demo
spec:
  hosts: [style-transfer.apsara-demo.svc.cluster.local]
  http:
  - match:
    - headers:
        user_class:
          exact: premium
    route:
    - destination:
        host: style-transfer.apsara-demo.svc.cluster.local
        port:
          number: 8000
        subset: tensorflow
  - route:
    - destination:
        host: style-transfer.apsara-demo.svc.cluster.local
        port:
          number: 8000
        subset: pytorch
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: style-transfer-dr
  namespace: apsara-demo
spec:
  host: style-transfer.apsara-demo.svc.cluster.local
  subsets:
  - name: tensorflow
    labels:
      model-format: tensorflow
  - name: pytorch
    labels:
      model-format: pytorch

Apply with kubectl apply -f app-routing.yaml.
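In production the user_class value arrives as a JWT claim via the OIDC integration below, but for a quick check you can send the header directly from inside the mesh, where the waypoint applies style-transfer-vs (the `deploy/ai-backend` name is illustrative; substitute whatever ai-apps.yaml actually created):

```shell
# Requests carrying user_class: premium should land on the tensorflow subset;
# requests without it should fall through to the pytorch subset.
kubectl exec -n apsara-demo deploy/ai-backend -- \
  curl -s -H 'user_class: premium' \
  http://style-transfer.apsara-demo.svc.cluster.local:8000/
```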

5. Integrate OIDC Single Sign‑On

Link the ASM ingress gateway with an Alibaba Cloud IDaaS OIDC application. In the IDaaS console, add a custom user attribute user_type and map it to the user_class claim that the OIDC application returns after login, then enable the integration in the ASM console.

Result

After completing the steps, the demo AI application is accessible at http://{ASM‑gateway‑address}/home. The sidecarless mesh provides dynamic subset routing, JSON‑to‑gRPC transcoding, and user‑based traffic splitting without requiring sidecar injection, demonstrating how ASM can simplify AI service deployment and management.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.