
How to Achieve Multi‑Region Serverless GPU Scheduling with ACK One Registered Clusters

This guide explains how Alibaba Cloud's ACK One registered clusters can provide multi‑region, serverless GPU compute for AI workloads by using Kubernetes‑compatible labels, the ack‑co‑scheduler, and ResourcePolicy objects to dynamically allocate resources across regions, with step‑by‑step configuration examples.

Alibaba Cloud Native

Enterprises modernizing their infrastructure run into the limits of traditional on-premises data centers (IDCs), which cannot scale capacity on demand. Alibaba Cloud's ACK One registered clusters address these constraints with onboarding in minutes, full Kubernetes compatibility, and serverless compute.

In the AI era, model sizes have grown to hundreds of billions of parameters, and the compute required for training and inference has grown exponentially with them. ACK One serverless compute handles ordinary workloads well, but it struggles at AI scale for two reasons: the GPU models on offer differ from region to region, and GPU inventory fluctuates.

Solution Overview

Alibaba Cloud introduces a multi‑region serverless compute scheduling solution for ACK One registered clusters. The core idea is to provide “unlimited” compute by allocating GPU resources across regions, enabling low‑latency, high‑throughput AI inference at scale.

Getting Started

1. Log in to the Alibaba Cloud Container Service console and activate the service.

2. Log in to the Container Compute Service (ACS) console and activate ACS.

3. Create an ACK One registered cluster and connect it to your on-premises or third-party Kubernetes cluster (Kubernetes 1.24 or later is recommended). See the official guide for details.

4. Install the ACK Virtual Node component, as described in the "ACK One registered cluster using Serverless compute" documentation.

Specifying a Target Region

To schedule serverless compute to a specific region, add the label alibabacloud.com/serverless-region-id: <RegionID> to the Pod template of the workload (spec.template.metadata.labels). If the label is omitted, the default region is used. Example Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-specified-region
  name: nginx-gpu-deployment-specified-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-specified-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # replace with actual model, e.g., T4
        alibabacloud.com/serverless-region-id: <RegionID>  # omit to use default region
        app: nginx-gpu-specified-region
    spec:
      containers:
      - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
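
When the same workload has to target different regions, the label set above can be generated rather than edited by hand. The following Python sketch is illustrative only: the helper name acs_gpu_pod_labels and the region ID cn-hangzhou used in the demo are assumptions, while the label keys are the ones that appear in the manifest above.

```python
from typing import Dict, Optional


def acs_gpu_pod_labels(
    app: str, gpu_model: str, region_id: Optional[str] = None
) -> Dict[str, str]:
    """Build Pod template labels for a serverless GPU workload.

    Hypothetical helper: it only assembles the labels shown in the
    manifest above and does not talk to any cluster.
    """
    labels = {
        "alibabacloud.com/acs": "true",
        "alibabacloud.com/compute-class": "gpu",
        "alibabacloud.com/compute-qos": "default",
        "alibabacloud.com/gpu-model-series": gpu_model,
        "app": app,
    }
    # Leaving the region label out means the default region is used.
    if region_id:
        labels["alibabacloud.com/serverless-region-id"] = region_id
    return labels


if __name__ == "__main__":
    # "cn-hangzhou" is only an example value; use the region you need.
    demo = acs_gpu_pod_labels(
        "nginx-gpu-specified-region", "example-model", "cn-hangzhou"
    )
    for key, value in sorted(demo.items()):
        print(f"{key}: {value}")
```

The returned dictionary maps directly onto spec.template.metadata.labels; calling the helper without region_id produces the default-region variant.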

Dynamic Scheduling with ack-co-scheduler

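A static region label pins the workload to one region and offers no fallback when that region's GPU inventory runs out. The ack-co-scheduler supports a ResourcePolicy object that lists regions in priority order, so Pods automatically fall back to another region when the preferred one has no GPU capacity.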

ResourcePolicy Example

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: multi-vk-gpu-resourcepolicy
  namespace: default
spec:
  selector:
    app: nginx-gpu-resourcepolicy  # Pods with this label follow the policy
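  units:
  # Unit 1: preferred region, tried first
  - resource: acs
    nodeSelector:
      topology.kubernetes.io/region: <RegionID>  # ID of the preferred region
      type: virtual-kubelet
    podLabels:
      alibabacloud.com/serverless-region-id: <RegionID>  # same region as this unit's nodeSelector
      alibabacloud.com/compute-class: gpu
      alibabacloud.com/compute-qos: default
      alibabacloud.com/gpu-model-series: example-model  # replace with a model available in this region
  # Unit 2: fallback region, used when unit 1 lacks GPU capacity
  - resource: acs
    nodeSelector:
      topology.kubernetes.io/region: <RegionID>  # ID of the fallback region (different from unit 1)
      type: virtual-kubelet
    podLabels:
      alibabacloud.com/serverless-region-id: <RegionID>  # same region as this unit's nodeSelector
      alibabacloud.com/compute-class: gpu
      alibabacloud.com/compute-qos: default
      alibabacloud.com/gpu-model-series: example-model  # replace with a model available in this region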

Key fields:

spec.selector: applies the policy to Pods labeled app=nginx-gpu-resourcepolicy.

spec.units: defines a prioritized list of resource units. The scheduler tries the first unit (the preferred region); if capacity there is insufficient, it falls back to the next unit.

resource: acs: indicates that the unit uses Alibaba Cloud Container Compute Service (ACS) serverless compute.

nodeSelector: selects the virtual node that represents a specific region.

podLabels: labels added to the Pod so that the underlying serverless runtime knows which region and GPU class to use.
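
The priority-with-fallback behavior of spec.units can be pictured with a small simulation. This is a conceptual Python sketch, not the ack-co-scheduler's implementation: the Unit class, the pick_unit function, the region names, and the free-GPU inventory map are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Unit:
    """One entry of a prioritized units list (hypothetical model)."""
    region_id: str
    gpu_model: str


def pick_unit(
    units: List[Unit], free_gpus: Dict[str, int], requested: int = 1
) -> Optional[Unit]:
    """Return the first unit whose region has enough free GPUs.

    Units are checked in list order, so earlier entries take priority.
    Returns None when no unit can satisfy the request.
    """
    for unit in units:
        if free_gpus.get(unit.region_id, 0) >= requested:
            return unit
    return None


if __name__ == "__main__":
    units = [
        Unit("region-a", "example-model"),  # preferred
        Unit("region-b", "example-model"),  # fallback
    ]
    # Made-up inventory: the preferred region is sold out.
    inventory = {"region-a": 0, "region-b": 4}
    chosen = pick_unit(units, inventory)
    print(chosen.region_id if chosen else "no capacity in any unit")
```

The order of the list is the only priority signal in this sketch, which mirrors how the first unit in the policy acts as the preferred region.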

Deployment Using the Scheduler

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-resourcepolicy
  name: nginx-gpu-deployment-resourcepolicy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-resourcepolicy
  template:
    metadata:
      labels:
        app: nginx-gpu-resourcepolicy
    spec:
      schedulerName: ack-co-scheduler
      containers:
      - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"

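The Pod template in this Deployment carries only the app label and schedulerName: ack-co-scheduler; the region and GPU labels come from the podLabels of whichever unit the scheduler selects. The scheduler first tries the region of the first unit. If GPU capacity is unavailable there, it automatically places the Pod in the region of the next unit in the units list.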

Additional Considerations

GPU model differences: Available GPU types vary by region, so choose a model that exists in the target region (e.g., T4, V100).

Inventory volatility: GPU capacity can fluctuate; in extreme cases, resources may be temporarily unavailable.
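
Because GPU models vary by region, it can help to build the units list only from regions that actually offer the chosen model. The Python sketch below assumes a hand-maintained catalogue (models_by_region) with made-up region names; the function build_units is hypothetical, and the dictionary layout simply mirrors the ResourcePolicy fields shown earlier.

```python
from typing import Dict, List, Set


def build_units(
    preferred_regions: List[str],
    gpu_model: str,
    models_by_region: Dict[str, Set[str]],
) -> List[dict]:
    """Build ResourcePolicy-style units, skipping regions without the model.

    The order of preferred_regions is preserved, so it doubles as the
    fallback priority.
    """
    units = []
    for region_id in preferred_regions:
        if gpu_model not in models_by_region.get(region_id, set()):
            continue  # the model is not offered in this region
        units.append({
            "resource": "acs",
            "nodeSelector": {
                "topology.kubernetes.io/region": region_id,
                "type": "virtual-kubelet",
            },
            "podLabels": {
                "alibabacloud.com/serverless-region-id": region_id,
                "alibabacloud.com/compute-class": "gpu",
                "alibabacloud.com/compute-qos": "default",
                "alibabacloud.com/gpu-model-series": gpu_model,
            },
        })
    return units


if __name__ == "__main__":
    # Made-up catalogue; check the real per-region offering before use.
    catalogue = {
        "region-a": {"T4"},
        "region-b": {"V100"},
        "region-c": {"T4", "V100"},
    }
    for unit in build_units(["region-a", "region-b", "region-c"], "T4", catalogue):
        print(unit["podLabels"]["alibabacloud.com/serverless-region-id"])
```

A catalogue check like this only addresses model differences; inventory volatility still has to be absorbed at scheduling time by the fallback order.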

By leveraging ACK One registered clusters, serverless compute, and the ack‑co‑scheduler, enterprises can achieve elastic, cross‑region AI inference with minimal operational overhead.
