How to Achieve Multi‑Region Serverless GPU Scheduling with ACK One Registered Clusters
This guide explains how Alibaba Cloud's ACK One registered clusters can provide multi‑region, serverless GPU compute for AI workloads by using Kubernetes‑compatible labels, the ack‑co‑scheduler, and ResourcePolicy objects to dynamically allocate resources across regions, with step‑by‑step configuration examples.
Enterprises modernizing their infrastructure run into the limits of traditional on‑premises data centers (IDCs), which lack elasticity and cannot scale dynamically. Alibaba Cloud's ACK One registered clusters address these constraints with minute‑level onboarding, full Kubernetes compatibility, and serverless compute.
In the AI era, model sizes have grown to hundreds of billions of parameters, driving exponential growth in training and inference compute requirements. ACK One serverless compute handles ordinary workloads well, but AI‑scale demand is harder to satisfy because the available GPU models differ across regions and GPU inventory fluctuates.
Solution Overview
Alibaba Cloud introduces a multi‑region serverless compute scheduling solution for ACK One registered clusters. The core idea is to provide “unlimited” compute by allocating GPU resources across regions, enabling low‑latency, high‑throughput AI inference at scale.
Getting Started
Log in to the Alibaba Cloud Container Service console and enable the container service.
Log in to the Container Compute Service console and enable ACS.
Create an ACK One registered cluster and connect it to an on‑premises or third‑party Kubernetes cluster (Kubernetes 1.24+ recommended). See the official guide for details.
Install the ACK Virtual Node component. Refer to the “ACK One registered cluster using Serverless compute” documentation.
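Once the ACK Virtual Node component is installed, it can help to confirm which regions have a virtual node. The sketch below is one possible check in Python. It assumes the node list comes from `kubectl get nodes -o json` and that virtual nodes carry the `type: virtual-kubelet` and `topology.kubernetes.io/region` labels used by the nodeSelector examples later in this guide. The function name and the sample data are hypothetical.

```python
import json
from collections import defaultdict

REGION_LABEL = "topology.kubernetes.io/region"


def virtual_nodes_by_region(node_list_json):
    """Group virtual-kubelet node names by their region label."""
    regions = defaultdict(list)
    for node in json.loads(node_list_json).get("items", []):
        labels = node["metadata"].get("labels", {})
        if labels.get("type") != "virtual-kubelet":
            continue  # skip ordinary (non-virtual) nodes
        region = labels.get(REGION_LABEL, "unknown")
        regions[region].append(node["metadata"]["name"])
    return dict(regions)


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
sample_nodes = json.dumps({"items": [
    {"metadata": {"name": "worker-1", "labels": {}}},
    {"metadata": {"name": "vk-a", "labels": {
        "type": "virtual-kubelet", REGION_LABEL: "region-a"}}},
    {"metadata": {"name": "vk-b", "labels": {
        "type": "virtual-kubelet", REGION_LABEL: "region-b"}}},
]})

print(virtual_nodes_by_region(sample_nodes))
```

In practice the JSON string would be the captured output of the kubectl command rather than the inline sample.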
Specifying a Target Region
To schedule serverless compute to a specific region, add the label alibabacloud.com/serverless-region-id: <RegionID> to the Pod template of the workload (spec.template.metadata.labels). Example deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-specified-region
  name: nginx-gpu-deployment-specified-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-specified-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model # replace with actual model, e.g., T4
        alibabacloud.com/serverless-region-id: <RegionID> # omit to use default region
        app: nginx-gpu-specified-region
    spec:
      containers:
      - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
Dynamic Scheduling with ack-co-scheduler
A static region label is inflexible: it ties the workload to one region regardless of that region's GPU inventory. With the ack-co-scheduler, a ResourcePolicy can automatically fall back to other regions when the preferred region runs out of GPU capacity.
ResourcePolicy Example
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: multi-vk-gpu-resourcepolicy
  namespace: default
spec:
  selector:
    app: nginx-gpu-resourcepolicy # Pods with this label follow the policy
  units:
  - resource: acs
    nodeSelector:
      topology.kubernetes.io/region: <RegionID> # preferred region
      type: virtual-kubelet
    podLabels:
      alibabacloud.com/serverless-region-id: <RegionID> # preferred region
      alibabacloud.com/compute-class: gpu
      alibabacloud.com/compute-qos: default
      alibabacloud.com/gpu-model-series: example-model
  - resource: acs
    nodeSelector:
      topology.kubernetes.io/region: <RegionID> # fallback region, different from the first unit
      type: virtual-kubelet
    podLabels:
      alibabacloud.com/serverless-region-id: <RegionID> # fallback region
      alibabacloud.com/compute-class: gpu
      alibabacloud.com/compute-qos: default
      alibabacloud.com/gpu-model-series: example-model
Key fields:
spec.selector: selects the Pods labeled app=nginx-gpu-resourcepolicy; the policy applies to these Pods.
spec.units: defines a prioritized list of resource units. The scheduler tries the first unit (the preferred region) and falls back to the next unit if capacity there is insufficient.
resource (acs): indicates that the unit uses ACS serverless compute.
nodeSelector: selects the virtual node in a specific region.
podLabels: labels added to the Pod so that the underlying serverless runtime knows which region and GPU class to use.
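The fallback behavior of the units list can be pictured with a small simulation. This is a conceptual sketch only, not the ack-co-scheduler implementation: the Unit class, pick_unit, apply_unit, the region names, and the free-GPU inventory are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Simplified, hypothetical model of one entry in spec.units."""
    region: str
    pod_labels: dict = field(default_factory=dict)


def pick_unit(units, free_gpus, requested=1):
    """Return the first unit whose region has enough free GPUs, else None."""
    for unit in units:  # list order encodes priority
        if free_gpus.get(unit.region, 0) >= requested:
            return unit
    return None


def apply_unit(pod_labels, unit):
    """Return a copy of the Pod labels with the unit's podLabels merged in."""
    merged = dict(pod_labels)
    merged.update(unit.pod_labels)
    return merged


units = [
    Unit("region-a", {"alibabacloud.com/serverless-region-id": "region-a"}),
    Unit("region-b", {"alibabacloud.com/serverless-region-id": "region-b"}),
]

# Hypothetical inventory: the preferred region has no free GPUs.
free_gpus = {"region-a": 0, "region-b": 4}

chosen = pick_unit(units, free_gpus)
labels = apply_unit({"app": "nginx-gpu-resourcepolicy"}, chosen)
print(chosen.region, labels)
```

The ordering of the list is the only priority signal in this model, which mirrors how the first unit is described as the preferred region.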
Deployment Using the Scheduler
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-resourcepolicy
  name: nginx-gpu-deployment-resourcepolicy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-resourcepolicy
  template:
    metadata:
      labels:
        app: nginx-gpu-resourcepolicy
    spec:
      schedulerName: ack-co-scheduler
      containers:
      - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
This Deployment first attempts to run in the region of the first unit in the policy; if GPU capacity is unavailable there, the scheduler automatically redirects the Pod to the next region defined in the units list.
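To see which region a Pod actually landed in, one option is to read back the region label that the policy adds. The sketch below assumes the Pod list comes from `kubectl get pods -l app=nginx-gpu-resourcepolicy -o json` and that the podLabels from the matching unit are visible on the Pod object. The function name and the sample data are hypothetical.

```python
import json

REGION_LABEL = "alibabacloud.com/serverless-region-id"


def pod_regions(pod_list_json):
    """Map each Pod name to its region label value (None if the label is absent)."""
    items = json.loads(pod_list_json).get("items", [])
    return {
        pod["metadata"]["name"]: pod["metadata"].get("labels", {}).get(REGION_LABEL)
        for pod in items
    }


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample_pods = json.dumps({"items": [
    {"metadata": {"name": "nginx-gpu-1", "labels": {
        "app": "nginx-gpu-resourcepolicy", REGION_LABEL: "region-b"}}},
    {"metadata": {"name": "nginx-gpu-2", "labels": {
        "app": "nginx-gpu-resourcepolicy"}}},
]})

print(pod_regions(sample_pods))
```

A value of None here would suggest the Pod has not been matched to a unit, or that the label is not surfaced on the Pod in that environment.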
Additional Considerations
GPU model differences: Available GPU types vary by region, so choose a model that is offered in every target region of the policy (e.g., T4, V100).
Inventory volatility: GPU capacity can fluctuate; in extreme cases, resources may be temporarily unavailable in all configured regions.
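One way to account for GPU model differences is to generate the units list only from regions that offer the requested model. The following is a minimal sketch under stated assumptions: the catalog of models per region, the region names, the model names, and the build_units helper are hypothetical, and the unit fields simply mirror the ResourcePolicy example in this article.

```python
def build_units(preferred_regions, gpu_model, catalog):
    """Build a prioritized spec.units list, skipping regions without the model."""
    units = []
    for region in preferred_regions:  # input order is the priority order
        if gpu_model not in catalog.get(region, set()):
            continue  # this region does not offer the requested GPU model
        units.append({
            "resource": "acs",
            "nodeSelector": {
                "topology.kubernetes.io/region": region,
                "type": "virtual-kubelet",
            },
            "podLabels": {
                "alibabacloud.com/serverless-region-id": region,
                "alibabacloud.com/compute-class": "gpu",
                "alibabacloud.com/compute-qos": "default",
                "alibabacloud.com/gpu-model-series": gpu_model,
            },
        })
    return units


# Hypothetical catalog of GPU models offered per region.
catalog = {
    "region-a": {"model-x"},
    "region-b": {"model-x", "model-y"},
    "region-c": {"model-y"},
}

units_for_y = build_units(["region-a", "region-b", "region-c"], "model-y", catalog)
print([u["nodeSelector"]["topology.kubernetes.io/region"] for u in units_for_y])
```

The resulting list could be serialized into the spec.units field of a ResourcePolicy. It does not address inventory volatility, which is only known at scheduling time.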
By leveraging ACK One registered clusters, serverless compute, and the ack‑co‑scheduler, enterprises can achieve elastic, cross‑region AI inference with minimal operational overhead.
Alibaba Cloud Native