Deploying a CPU‑Accelerated Stable Diffusion Service on Alibaba Cloud ACK
This guide shows how to deploy a cost‑effective, secure Stable Diffusion XL Turbo text‑to‑image service on an Alibaba Cloud ACK cluster using CPU‑only instances, Helm charts, and optional confidential TDX VM pools for protected inference.
Overview
The Stable Diffusion XL Turbo model can be served on Alibaba Cloud's 8th‑gen CPU instances (g8i) inside an ACK Kubernetes cluster using Intel IPEX acceleration. This provides a low‑cost, secure alternative to GPU inference for text‑to‑image generation.
Environment Preparation
Create an ACK node pool with g8i instances (e.g., ecs.g8i.4xlarge, ecs.g8i.8xlarge, ecs.g8i.12xlarge) ensuring each node has ≥16 vCPU.
If you do not have an ACK cluster, follow Alibaba Cloud documentation to create a managed Kubernetes cluster.
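Once the node pool is ready, you can confirm that each node exposes the required vCPU count. This is a minimal check against a live cluster; the column expression below uses standard kubectl custom-columns syntax:

```shell
# List each node with its CPU capacity; every g8i node should report >= 16
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu
```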
Deploying the Service
Install the Helm chart that bundles the Stable Diffusion IPEX image:
helm install stable-diffusion-ipex https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/pre/charts-incubator/stable-diffusion-ipex-0.1.9.tgz

Wait about 10 minutes, then verify that the pods are running:

kubectl get pod | grep stable-diffusion-ipex

When the pods are ready, the service exposes a text‑to‑image REST API and a Web UI.
Testing the Service
Port‑forward the Web UI to your local machine:
kubectl port-forward svc/stable-diffusion-ipex-webui 5001:5001

Then open http://127.0.0.1:5001/ in a browser.
Enter a prompt such as “A panda listening to music with headphones. highly detailed, 8k.” and click Generate.
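The REST API can also be exercised directly. The exact service name, route, and payload schema depend on the chart version; the /generate path and JSON fields below are illustrative assumptions, so check the chart's documentation for the real API before relying on them:

```shell
# Forward the API service locally (service and port names assumed, not taken from the chart docs)
kubectl port-forward svc/stable-diffusion-ipex 5000:5000 &

# Hypothetical request; adjust the path and fields to the chart's actual schema
curl -s -X POST http://127.0.0.1:5000/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "A panda listening to music with headphones. highly detailed, 8k.", "steps": 4, "width": 512, "height": 512}'
```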
Performance Benchmark
Average inference latency for a single batch (step 4) on different g8i instance types:
ecs.g8i.4xlarge (16 vCPU, 64 GiB) – 512×512: 2.2 s, 1024×1024: 8.8 s (Pod request/limit 14/16).
ecs.g8i.8xlarge (32 vCPU, 128 GiB) – 512×512: 1.3 s, 1024×1024: 4.7 s (Pod request/limit 24/32).
ecs.g8i.12xlarge (48 vCPU, 192 GiB) – 512×512: 1.1 s, 1024×1024: 3.9 s (Pod request/limit 32/32).
In multi‑batch, multi‑step scenarios the CPU is slower than an A10 GPU (e.g., 0.14 images/s vs. 0.4 images/s at step 30, batch 16). With step 4 and batch 16, however, the ecs.g8i.8xlarge reaches ~1.2 images/s, i.e. close to one image per second.
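The throughput figures follow directly from batch latency: images/s = batch_size / batch_latency. As a quick sanity check, the ~13.3 s batch latency below is back‑computed from the quoted ~1.2 images/s at batch 16 (16 / 1.2 ≈ 13.3), not a separate measurement:

```shell
# Throughput for a 16-image batch completing in ~13.3 s
awk 'BEGIN { printf "%.2f images/s\n", 16/13.3 }'
```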
Confidential Inference with Intel TDX
To run the service inside a TDX confidential VM node pool, create a node pool with the TDX label and apply a values file (tdx_values.yaml) containing the node selector:

nodeSelector:
  nodepool-label: tdx-vm-pool

Upgrade the Helm release with the TDX values:

helm upgrade stable-diffusion-ipex https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/pre/charts-incubator/stable-diffusion-ipex-0.1.9.tgz -f tdx_values.yaml

This enables Intel TDX memory encryption, protecting model weights and inference data.
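After the upgrade, you can verify that the pods were rescheduled onto the TDX node pool. The node label matches tdx_values.yaml; adjust it if your pool uses a different label:

```shell
# Show which node each service pod landed on
kubectl get pod -o wide | grep stable-diffusion-ipex

# List the nodes in the TDX confidential VM pool
kubectl get nodes -l nodepool-label=tdx-vm-pool
```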
Cost‑Performance Recommendations
ecs.g8i.8xlarge matches a comparable GPU instance at ~9 % lower cost while sustaining ~1.2 images/s.
ecs.g8i.4xlarge cuts cost by more than 53 %, at a reduced throughput of ~0.5 images/s.
For large‑scale, cost‑sensitive text‑to‑image workloads that also require model confidentiality, the g8i family (especially the 4xlarge, 8xlarge, and 12xlarge) provides a practical CPU‑only alternative to GPU inference.