
How to Safely Deploy AI Inference Models Across Multi‑Cluster Environments with ACK One Fleet

This article explains why AI inference services require multi‑cluster gray‑release, outlines the risks of traditional updates, and details how ACK One Fleet combined with Kruise Rollout provides a controlled, observable, and rollback‑capable solution for deploying large AI models across hybrid cloud clusters.

Alibaba Cloud Infrastructure

In the era of large AI models, inference services have become critical business workloads. Updating a model, however, can cause severe stability issues such as cold-start latency spikes, QPS drops, and difficult rollbacks, especially when services span multiple regions and hybrid-cloud clusters.

Why Gray Release Is Mandatory for AI Inference

Extreme sensitivity to stability: models load slowly, cold starts are expensive, and a bad release can cause massive QPS loss before it is detected.

Multi‑cluster deployment is the norm: resources are unevenly distributed across regions, so a single inference service is often deployed to many clusters.

Traditional manual updates are error‑prone: running kubectl apply repeatedly per cluster, maintaining separate scripts and approval flows, and pinpointing which cluster failed are all tedious and risky.
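As a concrete illustration of that traditional approach, the manual update often amounts to a loop like the one below (the cluster names, kubeconfig paths, and manifest filename are hypothetical): every cluster is touched one by one, and a failure partway through leaves the fleet in a mixed state with no built-in rollback.

```shell
# Hypothetical per-cluster manual update: one kubectl apply per cluster,
# each against its own kubeconfig. If the third apply fails, two clusters
# already run the new model while the rest still serve the old one.
for cluster in beijing-prod shanghai-prod hangzhou-hybrid; do
  kubectl --kubeconfig "$HOME/.kube/${cluster}.config" \
    apply -f qwen-inference-v2.yaml \
    || echo "update failed on ${cluster}"
done
```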

ACK One Fleet + Kruise Rollout Solution

ACK One Fleet provides fleet‑level scheduling and management of AI workloads, while Kruise Rollout offers fine‑grained progressive delivery inside each cluster. Together they enable:

Intelligent distribution of models to all regions.

Dynamic resource‑aware scheduling, elastic node pools, cluster‑level priority, and multi‑cluster HPA.

Unified rollout policies that can be approved centrally via the kubectl amc plugin.

Key Features of Kruise Rollout

Rich rollout strategies for Deployments, CloneSets, StatefulSets, DaemonSets, etc.

Canary, blue‑green, and multi‑batch updates.

Fine‑grained traffic routing, A/B testing, and end‑to‑end gray release.

Support for various ingress controllers and Gateway API, with extensible Lua scripts.
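As a sketch of the traffic-routing capability (the Service and Ingress names below are hypothetical, and the exact fields should be verified against the Kruise Rollout version in use), a canary step can shift a slice of live traffic to the new version alongside a slice of replicas:

```yaml
# Hypothetical fragment: send 10% of traffic through an NGINX ingress to
# the canary pods while only 10% of replicas run the new model version.
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: qwen-inference-rollout
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: qwen-inference
  strategy:
    canary:
      steps:
      - traffic: 10%
        replicas: 10%
      trafficRoutings:
      - service: qwen-inference-svc        # hypothetical Service name
        ingress:
          classType: nginx                 # NGINX ingress controller
          name: qwen-inference-ingress     # hypothetical Ingress name
```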

Deployment Workflow

Define a Rollout in the fleet and propagate it to sub‑clusters. The global scheduler places the inference service according to the propagation policy, then each sub‑cluster’s Kruise Rollout controller updates pods in batches (e.g., 10% → pause → 50% → pause → 100%).

Use kubectl amc for unified approval. Administrators can approve all clusters with a single command:

kubectl amc rollout approve rollouts/qwen-inference-rollout -M

or approve a specific cluster with -m ${clusterid}.

Example: Rolling Out a New Qwen Model

The following YAML defines a three‑step canary rollout and a propagation policy that distributes the rollout to Beijing and Shanghai clusters.

apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: qwen-inference-rollout
  namespace: demo
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: qwen-inference
  strategy:
    canary:
      enableExtraWorkloadForCanary: false
      steps:
      - replicas: 10%
      - replicas: 50%
      - replicas: 100%
---
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: qwen-inference-rollout-pp
  namespace: demo
spec:
  preserveResourcesOnDeletion: false
  resourceSelectors:
  - apiVersion: rollouts.kruise.io/v1beta1
    kind: Rollout
    name: qwen-inference-rollout
  placement:
    replicaScheduling:
      replicaSchedulingType: Duplicated
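Putting the pieces together, and assuming both manifests are saved in one file (the filename is hypothetical), the whole rollout is driven from the Fleet instance: apply once, then approve batches with the kubectl amc command shown earlier.

```shell
# Apply the Rollout and PropagationPolicy to the Fleet (hub) cluster;
# the PropagationPolicy distributes the Rollout to the Beijing and
# Shanghai sub-clusters, where each batch starts with 10% of replicas.
kubectl apply -f qwen-inference-rollout.yaml

# After verifying the 10% batch, approve the next step in every cluster:
kubectl amc rollout approve rollouts/qwen-inference-rollout -M
```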

Advantages

Risk isolation: new versions affect only the defined batch and traffic slice, with optional manual approval before each step.

Reduced complexity: Fleet handles global scheduling; Kruise Rollout handles per‑cluster updates.

Operational simplicity: kubectl amc provides a single point to view and control rollout status across all clusters.

Consistent multi‑cluster policy: the same Rollout definition is applied uniformly everywhere.

By integrating ACK One Fleet’s global orchestration with Kruise Rollout’s precise per‑cluster delivery, enterprises can safely and efficiently roll out AI inference models across complex, multi‑region, hybrid‑cloud environments.
