
Deploy Stable Diffusion in 5 Minutes with Volcengine’s Continuous Delivery CP

Learn how to quickly launch a Stable Diffusion WebUI service in just five minutes using Volcengine’s cloud‑native continuous delivery platform, which abstracts Kubernetes complexities, provides pre‑configured AI templates, serverless VCI deployment, automatic scaling, API gateway access, and includes a Python client for image generation.

ByteDance Cloud Native

This article explains how to use Volcengine’s cloud‑native continuous delivery platform (CP) to deploy a Stable Diffusion AI model as a WebUI service with a single click, completely shielding users from underlying Kubernetes details.

AI Model Deployment Challenges

Deploying AI models involves several pain points: setting up the hardware and software environment (CPU/GPU, drivers, CUDA, Python, PyTorch/TensorFlow), managing model upgrades and migrations, configuring service orchestration and resource allocation, and exposing the service via public endpoints. These steps require deep cloud‑native knowledge and can hinder rapid AI adoption.

AI Application Features in CP

Pre‑built Templates: Includes popular AI frameworks and tools such as Stable Diffusion (ComfyUI, WebUI), LLaMA-Factory, Triton, and PyTorch, with packaged OS, dependencies, and runtime environments.

Serverless Support: Integrates with Volcengine Elastic Container Instance (VCI) for a fully managed, no‑ops runtime.

Service Access: A built‑in API Gateway and Load Balancer enable one‑click external access.

One‑Click Rollback: Versioned change history allows instant rollback to previous releases.

Elastic Scaling: Manual or automatic scaling based on resource usage improves GPU utilization.
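To illustrate how utilization-based auto-scaling decisions typically work, the sketch below mirrors the standard Kubernetes Horizontal Pod Autoscaler formula (desired = ceil(current × observed / target)); it is an illustration of the general mechanism, not the platform's actual implementation:

```python
def desired_replicas(current_replicas, utilization_pct, target_pct):
    """Utilization-based scaling rule (as used by the Kubernetes HPA):
    scale the replica count by the ratio of observed to target utilization.
    Percentages are integers to keep the ceiling division exact."""
    # -(-a // b) is ceiling division for positive integers
    return -(-(current_replicas * utilization_pct) // target_pct)

# e.g. 2 GPU pods running at 90% utilization against a 60% target
print(desired_replicas(2, 90, 60))   # scale out to 3
print(desired_replicas(4, 30, 60))   # scale in to 2
```

In practice the platform drives this from observed GPU/CPU usage; the point is that a sensible target utilization directly determines how aggressively replicas are added or removed.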

Step‑by‑Step Deployment (5 Minutes)

1. Log in to the Continuous Delivery CP console and create a workspace.

2. Import an existing VKE cluster (with the csi‑tos and nvidia‑device‑plugin components installed).

3. Create an AI application from the "AI Image Generation – Stable Diffusion WebUI" template.

4. Configure model storage by mounting the Artifacts repository path /stable-diffusion-webui/models/Stable-diffusion/.

5. Select a GPU‑type compute resource in the Service Specification.

6. Choose API Gateway as the external access method.

7. Optionally enable manual or auto‑scaling policies.

After confirming the configuration, the platform creates the application and starts deployment, typically completing within 1–2 minutes.
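Once deployment finishes, it can be useful to verify that the service answers before sending generation requests. A minimal readiness check might look like the following sketch (the URL is a placeholder for the gateway address shown in the console; the `probe` parameter exists only to make the polling logic testable):

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url, probe=None, timeout_s=120, interval_s=5):
    """Poll the service URL until it answers HTTP 200 or the timeout expires."""
    if probe is None:
        def probe():
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except (urllib.error.URLError, OSError):
            # Service not up yet; keep polling until the deadline
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# e.g. wait_until_ready('http://<your-gateway-address>:7860')
```

A WebUI instance usually needs a short warm-up after the pod starts (model loading), so polling like this avoids spurious errors from the first few requests.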

Python Client Example

<code>from datetime import datetime
import base64
import json
import os
import time

import requests

webui_server_url = os.getenv('SD_WEB_UI_URL', 'http://127.0.0.1:7860')
out_dir = os.getenv('API_OUT_DIR', 'api_out')
out_dir_t2i = os.path.join(out_dir, 'txt2img')
out_dir_i2i = os.path.join(out_dir, 'img2img')
os.makedirs(out_dir_t2i, exist_ok=True)
os.makedirs(out_dir_i2i, exist_ok=True)

def timestamp():
    return datetime.fromtimestamp(time.time()).strftime("%Y%m%d-%H%M%S")

def encode_file_to_base64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')

def decode_and_save_base64(b64_str, save_path):
    with open(save_path, "wb") as file:
        file.write(base64.b64decode(b64_str))

def call_api(endpoint, **payload):
    # POST the JSON payload to the WebUI API and fail fast on HTTP errors
    data = json.dumps(payload).encode('utf-8')
    response = requests.post(
        f"{webui_server_url}/{endpoint}",
        data=data,
        headers={'Content-Type': 'application/json'},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

def call_txt2img_api(**payload):
    resp = call_api('sdapi/v1/txt2img', **payload)
    for idx, img in enumerate(resp.get('images', [])):
        path = os.path.join(out_dir_t2i, f'txt2img-{timestamp()}-{idx}.png')
        print('save to:', path)
        decode_and_save_base64(img, path)

if __name__ == '__main__':
    payload = {
        "prompt": "masterpiece, (best quality:1.1), 1girl <lora:lora_model:1>",
        "negative_prompt": "",
        "seed": 1,
        "steps": 20,
        "width": 512,
        "height": 512,
        "cfg_scale": 7,
        "sampler_name": "DPM++ 2M",
        "n_iter": 1,
        "batch_size": 1,
    }
    call_txt2img_api(**payload)
</code>
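The same pattern extends to image-to-image generation. The sketch below is a hypothetical companion to the client above, using the WebUI's sdapi/v1/img2img endpoint, which expects the source image base64-encoded in an init_images list (helpers are repeated here so the snippet stands alone):

```python
import base64
import json
import os
import urllib.request

webui_server_url = os.getenv('SD_WEB_UI_URL', 'http://127.0.0.1:7860')

def encode_file_to_base64(path):
    # The WebUI API expects images as base64-encoded strings
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')

def call_img2img_api(init_image_path, **payload):
    # img2img takes the source image(s) in the init_images field
    payload['init_images'] = [encode_file_to_base64(init_image_path)]
    req = urllib.request.Request(
        f'{webui_server_url}/sdapi/v1/img2img',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

The payload accepts the same prompt and sampler parameters as txt2img, plus a denoising_strength between 0 and 1 that controls how far the output departs from the source image.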

Optimization Tips

Add the --xformers flag to the launch command to accelerate image generation.

Mount LoRA models or embeddings to customize style without retraining the base model.

Prefer using Artifacts and TOS in the same region as the VKE cluster to reduce latency when loading models.

After deployment, the service provides real‑time logs, event tracking, and both public and private access URLs via the API Gateway, enabling full lifecycle management without dealing with underlying resources.

Conclusion

The tutorial demonstrates a complete end‑to‑end workflow for deploying Stable Diffusion on Volcengine, highlighting the platform’s ability to simplify AI model serving, improve resource utilization through serverless VCI, and accelerate development cycles for enterprise users.

Tags: cloud native, serverless, Python, Continuous Delivery, Stable Diffusion, AI deployment
Written by

ByteDance Cloud Native

Sharing ByteDance's cloud-native technologies, technical practices, and developer events.
