Deploying a CPU‑Accelerated Stable Diffusion Service on Alibaba Cloud ACK
This guide shows how to deploy a cost‑effective, secure Stable Diffusion XL Turbo text‑to‑image service on an Alibaba Cloud ACK cluster using CPU‑only instances, Helm charts, and optional confidential TDX VM pools for protected inference.
Overview
The Stable Diffusion XL Turbo model can be served on Alibaba Cloud's 8th‑gen CPU instances (g8i) inside an ACK Kubernetes cluster using Intel IPEX acceleration. This provides a low‑cost, secure alternative to GPU inference for text‑to‑image generation.
Environment Preparation
Create an ACK node pool with g8i instances (e.g., ecs.g8i.4xlarge, ecs.g8i.8xlarge, ecs.g8i.12xlarge) ensuring each node has ≥16 vCPU.
If you do not have an ACK cluster, follow Alibaba Cloud documentation to create a managed Kubernetes cluster.
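Once the node pool is ready, you can confirm that each node exposes the required vCPU count. This is a minimal check against a live cluster; the column expression below uses standard kubectl custom-columns syntax:

```shell
# List each node with its CPU capacity; every g8i node should report >= 16
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu
```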
Deploying the Service
Install the Helm chart that bundles the Stable Diffusion IPEX image:
helm install stable-diffusion-ipex https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/pre/charts-incubator/stable-diffusion-ipex-0.1.9.tgz

Wait about 10 minutes, then verify that the pods are running:

kubectl get pod | grep stable-diffusion-ipex

When the pods are ready, the service exposes a text‑to‑image REST API and a Web UI.
Testing the Service
Port‑forward the Web UI to your local machine:
kubectl port-forward svc/stable-diffusion-ipex-webui 5001:5001

Then open http://127.0.0.1:5001/ in a browser.
Enter a prompt such as “A panda listening to music with headphones. highly detailed, 8k.” and click Generate.
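The REST API can also be exercised directly. The exact service name, route, and payload schema depend on the chart version; the /generate path and JSON fields below are illustrative assumptions, so check the chart's documentation for the real API before relying on them:

```shell
# Forward the API service locally (service and port names assumed, not taken from the chart docs)
kubectl port-forward svc/stable-diffusion-ipex 5000:5000 &

# Hypothetical request; adjust the path and fields to the chart's actual schema
curl -s -X POST http://127.0.0.1:5000/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "A panda listening to music with headphones. highly detailed, 8k.", "steps": 4, "width": 512, "height": 512}'
```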
Performance Benchmark
Average inference latency for a single batch (step 4) on different g8i instance types:
ecs.g8i.4xlarge (16 vCPU, 64 GiB) – 512×512: 2.2 s, 1024×1024: 8.8 s (Pod request/limit 14/16).
ecs.g8i.8xlarge (32 vCPU, 128 GiB) – 512×512: 1.3 s, 1024×1024: 4.7 s (Pod request/limit 24/32).
ecs.g8i.12xlarge (48 vCPU, 192 GiB) – 512×512: 1.1 s, 1024×1024: 3.9 s (Pod request/limit 32/32).
In multi‑batch, multi‑step scenarios the CPU is slower than an A10 GPU (e.g., 0.14 images/s vs. 0.4 images/s at step 30, batch 16). With step 4 and batch 16, however, the ecs.g8i.8xlarge reaches ~1.2 images/s, i.e. close to one image per second.
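The throughput figures follow directly from batch latency: images/s = batch_size / batch_latency. As a quick sanity check, the ~13.3 s batch latency below is back‑computed from the quoted ~1.2 images/s at batch 16 (16 / 1.2 ≈ 13.3), not a separate measurement:

```shell
# Throughput for a 16-image batch completing in ~13.3 s
awk 'BEGIN { printf "%.2f images/s\n", 16/13.3 }'
```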
Confidential Inference with Intel TDX
To run the service inside a TDX confidential VM node pool, create a node pool with the TDX label and apply a values file (tdx_values.yaml) containing the node selector:

nodeSelector:
  nodepool-label: tdx-vm-pool

Upgrade the Helm release with the TDX values:

helm upgrade stable-diffusion-ipex https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/pre/charts-incubator/stable-diffusion-ipex-0.1.9.tgz -f tdx_values.yaml

This enables Intel TDX memory encryption, protecting model weights and inference data.
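After the upgrade, you can verify that the pods were rescheduled onto the TDX node pool. The node label matches tdx_values.yaml; adjust it if your pool uses a different label:

```shell
# Show which node each service pod landed on
kubectl get pod -o wide | grep stable-diffusion-ipex

# List the nodes in the TDX confidential VM pool
kubectl get nodes -l nodepool-label=tdx-vm-pool
```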
Cost‑Performance Recommendations
ecs.g8i.8xlarge matches a comparable GPU instance at ~9 % lower cost while sustaining ~1.2 images/s.
ecs.g8i.4xlarge cuts cost by more than 53 %, at a reduced throughput of ~0.5 images/s.
For large‑scale, cost‑sensitive text‑to‑image workloads that also require model confidentiality, the g8i family (especially the 4xlarge, 8xlarge, and 12xlarge) provides a practical CPU‑only alternative to GPU inference.