Deploy Stable Diffusion on Volcengine Cloud: A Step‑by‑Step Guide

Learn how to deploy your own Stable Diffusion text‑to‑image model on Volcengine Cloud by setting up a VKE Kubernetes cluster, configuring storage, GPU resources, container images, and exposing the service via ALB or API Gateway, while leveraging mGPU sharing and serverless GPU options.


This article demonstrates how to deploy a Stable Diffusion text‑to‑image model on Volcengine Cloud using typical enterprise AI engineering practices.

Stable Diffusion Environment Dependencies

Stable Diffusion is a latent diffusion model that generates high‑quality images from arbitrary text prompts. Deploying it on the cloud requires several Volcengine services:

Container Service (VKE), Kubernetes v1.24

Image Registry (CR)

Elastic Container Instance (VCI)

Object Storage (TOS)

GPU server: ecs.gni2.3xlarge (NVIDIA A10)

Application Load Balancer (ALB)

API Gateway (APIG)

GPU sharing technology (mGPU)

Stable Diffusion model: huggingface.co/CompVis/stable-diffusion-v1-4

Stable Diffusion WebUI: github.com/AUTOMATIC1111/stable-diffusion-webui

Step 1: Prepare VKE Cluster Environment

Log in to the Volcengine console and create a VKE cluster: select Kubernetes v1.24, use the VPC‑CNI network model, and provision GPU‑enabled nodes (ecs.gni2.3xlarge, NVIDIA A10) with the nvidia‑device‑plugin installed.

Enable TOS, create a bucket, and upload the Stable Diffusion model files prepared in the steps below.

Install the required Python packages:

pip install --upgrade diffusers
pip install transformers
# Install PyTorch according to the official guide: https://pytorch.org/get-started/locally/

Log in to Hugging Face with huggingface-cli login, then download the model using snapshot_download:

from huggingface_hub import snapshot_download

# Download the full model snapshot into a local "diffusers" directory,
# which Step 2 then copies to TOS.
snapshot_download(repo_id="CompVis/stable-diffusion-v1-4", local_dir="/root/diffusers")
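
Before uploading, the weights can be sanity-checked locally. A minimal sketch, assuming a CUDA GPU is available and the snapshot landed in /root/diffusers as above:

import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline from the local snapshot and run one test prompt.
pipe = StableDiffusionPipeline.from_pretrained("/root/diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
image = pipe("an astronaut riding a horse on mars").images[0]
image.save("smoke-test.png")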

Step 2: Upload Model to TOS

Use rclone to copy the downloaded model files to the TOS bucket:

rclone copy diffusers/ ${rclone_config_name}:${bucketname}/diffusers --copy-links
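
The command above assumes an rclone remote named ${rclone_config_name} has already been configured. A minimal one-time setup sketch using TOS's S3‑compatible API (the endpoint is an assumed example for the cn‑beijing region; the credential variables are placeholders to replace with your own):

# Create an rclone remote backed by the TOS S3-compatible endpoint
rclone config create ${rclone_config_name} s3 \
  provider Other \
  access_key_id ${VOLC_ACCESSKEY} \
  secret_access_key ${VOLC_SECRETKEY} \
  endpoint https://tos-s3-cn-beijing.volces.com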

Step 3: Deploy Stable Diffusion Service

Push a prepared container image (e.g., cr-demo-cn-beijing.cr.volces.com/diffusers/stable-diffusion:taiyi-0.1) to the CR repository.

Create a PersistentVolumeClaim (PVC) backed by the TOS bucket and mount it at /stable-diffusion-webui/models/Taiyi-Stable-Diffusion-1B-Chinese-v0.1 inside the container, then expose port 7860. A sketch of a matching PV/PVC pair follows.
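
This is a minimal static-provisioning sketch; the CSI driver name, volumeHandle format, and capacity are assumptions to verify against the manifest the VKE console generates for TOS volumes:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: sd-tos-pv
spec:
  accessModes: ["ReadOnlyMany"]
  capacity:
    storage: 100Gi
  csi:
    driver: tos.csi.volcengine.com   # assumed driver name for the VKE TOS CSI plugin
    volumeHandle: ${bucketname}      # bucket created in Step 2
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sd-tos-pvc
  namespace: default
spec:
  accessModes: ["ReadOnlyMany"]
  resources:
    requests:
      storage: 100Gi
  storageClassName: ""
  volumeName: sd-tos-pv

The Deployment below mounts this claim and requests fractional GPU resources through mGPU: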

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sd-a10
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: sd-a10
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: sd-a10
    spec:
      containers:
      - image: cr-demo-cn-beijing.cr.volces.com/${namespace}/stable-diffusion:taiyi-0.1
        imagePullPolicy: IfNotPresent
        name: sd
        ports:
        - containerPort: 7860
        resources:
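          # Assumed mGPU units: mgpu-core is a percentage of one physical GPU
          # (30 = 30% of the A10); mgpu-memory is in MiB (10240 MiB = 10 GiB).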
          limits:
            vke.volcengine.com/mgpu-core: "30"
            vke.volcengine.com/mgpu-memory: "10240"
          requests:
            vke.volcengine.com/mgpu-core: "30"
            vke.volcengine.com/mgpu-memory: "10240"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /stable-diffusion-webui/models/Taiyi-Stable-Diffusion-1B-Chinese-v0.1
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: sd-tos-pvc

Step 4: Expose the Service
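
Both options route traffic to a Kubernetes Service in front of the Deployment. A minimal sketch (name and port are assumptions consistent with the manifest above):

apiVersion: v1
kind: Service
metadata:
  name: sd-a10
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: sd-a10
  ports:
  - port: 80
    targetPort: 7860
    protocol: TCP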

Option 1: Use ALB – Create an ALB‑type Ingress to route traffic to the Stable Diffusion WebUI (see the sketch after these options).

Option 2: Use API Gateway – Create an APIG instance, configure an upstream pointing to the VKE cluster, and expose the service via a generated domain name.
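
For Option 1, an Ingress sketch; the ingressClassName depends on the ALB Ingress controller installed in the cluster, and the host is a placeholder:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sd-ingress
  namespace: default
spec:
  ingressClassName: alb   # assumed class name for the ALB Ingress controller
  rules:
  - host: sd.example.com  # placeholder domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: sd-a10
            port:
              number: 80

Once DNS for the host resolves to the ALB, the WebUI is reachable in a browser at that domain.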

Large‑Model Engineering Practices

Beyond basic deployment, enterprise‑grade large‑model serving requires training/inference acceleration, resource‑utilization optimization, and cost control. Volcengine provides mGPU for GPU sharing and Serverless GPU (VCI) for elastic scaling.

GPU Sharing with mGPU

mGPU allows containers to claim fractional GPU cores (e.g., 1% of a GPU) and memory, improving overall utilization by over 50%.

Install the mGPU component via the VKE console.

Enable Prometheus monitoring for GPU metrics.

Add the label vke.volcengine.com/mgpu-enabled=true to node pools or individual nodes to activate mGPU.
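
For example, labeling a single node with kubectl (the node name is a placeholder):

kubectl label node ${node_name} vke.volcengine.com/mgpu-enabled=true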

Serverless GPU Deployment (VCI)

VCI provides a serverless, container‑based compute service that integrates with VKE. Deploy the same Stable Diffusion image using VCI, specifying the GPU instance type via annotations.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sd-vci
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: sd-vci
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
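        # Assumed annotation semantics: burst-to-vci "enforce" schedules these pods
        # onto serverless VCI; preferred-instance-types selects the GPU instance family.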
        vci.vke.volcengine.com/preferred-instance-types: vci.ini2.26c-243gi
        vci.volcengine.com/tls-enable: "false"
        vke.volcengine.com/burst-to-vci: enforce
      labels:
        app: sd-vci
    spec:
      containers:
      - image: cr-demo-cn-beijing.cr.volces.com/${namespace}/stable-diffusion:taiyi-0.1
        imagePullPolicy: IfNotPresent
        name: sd-vci
        ports:
        - containerPort: 7860
        resources:
          limits:
            nvidia.com/gpu: "1"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /stable-diffusion-webui/models/Taiyi-Stable-Diffusion-1B-Chinese-v0.1
          name: sd
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sd
        persistentVolumeClaim:
          claimName: sd-tos-pvc

Conclusion

AIGC applications focus on delivering multimodal content that solves real problems. Volcengine’s cloud‑native AI infrastructure, including VKE, CR, ALB, APIG, mGPU, and VCI, helps lower the barrier for building and scaling such services.

Related Links

Volcengine: https://www.volcengine.com

Container Service (VKE): https://www.volcengine.com/product/vke

Image Registry (CR): https://www.volcengine.com/product/cr

API Gateway (APIG): https://www.volcengine.com/product/apig

Managed Prometheus: https://www.volcengine.com/product/vmp

Tags: AI, Kubernetes, Stable Diffusion, GPU, cloud deployment
Written by Volcano Engine Developer Services

The Volcano Engine Developer Community connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.