One‑Click Deployment of LLMs to Alibaba Cloud Function Compute with SwingDeploy

This guide explains how to quickly select a ModelScope open‑source LLM, deploy it to Alibaba Cloud Function Compute using the SwingDeploy one‑click feature, enable reserved idle billing, and evaluate the cost savings compared with traditional GPU provisioning.


Introduction

ModelScope hosts a large collection of open‑source models. Users often want to take a model straight to production, but face challenges such as GPU procurement, resource management, and deployment operations. Alibaba Cloud Function Compute (FC) combined with ModelScope’s SwingDeploy offers a solution that handles deployment, scaling, protection, and monitoring while abstracting away infrastructure operations.

One‑Click Model Deployment (SwingDeploy)

SwingDeploy lets users deploy a ModelScope model directly to their Alibaba Cloud account. The platform automatically recommends the optimal GPU configuration based on the model’s requirements, enabling a production‑grade inference API within about five minutes.

Step‑by‑Step Deployment Guide (Example: Baichuan LLM)

Preparation

Open the ModelScope website and log in or register.

Bind an Alibaba Cloud account to gain access to online debugging, training, and deployment features.

Model Deployment

Navigate to the model card for baichuan2-7b-chat-4bits (or any other model) and click the Deploy button.

Select Quick Deploy (SwingDeploy) and choose Function Compute (FC) as the target platform.

In the popup, configure version, region, GPU type, and memory size, then click One‑Click Deploy.

Verification

After 1–5 minutes, the model is deployed. Check the “Deployment Service (SwingDeploy)” section in ModelScope for a “Success” status.

Enable Reserved Idle Billing

By default, deployments use pay‑as‑you‑go mode. To reduce costs, switch the service to Reserved Mode, then enable the idle‑billing option in the Function Compute console under “Function Elastic Management”. With idle billing, reserved GPU instances are billed at the full rate only while actively processing requests, and at roughly one‑tenth of that rate while idle.

Using the Deployed Model

After deployment, the “Use Now” button on the ModelScope service page provides the endpoint for invoking the LLM.
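The exact endpoint URL and request schema are shown in the “Use Now” dialog, so the sketch below is only illustrative: the URL and the JSON payload shape are placeholders to be replaced with the values displayed for your own deployment.

```python
import requests

# Hypothetical endpoint -- copy the real URL from the "Use Now" dialog
# on the ModelScope deployment-service page.
ENDPOINT = "https://your-service.cn-hangzhou.fc.aliyuncs.com/invoke"

# Assumed payload shape; the deployed model defines its own schema.
payload = {"text": "Briefly introduce yourself.", "history": []}

resp = requests.post(ENDPOINT, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```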

LLM Model List

Commonly supported open‑source LLMs include:

Qwen series (14B, 7B, 1.8B variants)

Baichuan series (13B, 7B, chat and base versions)

ChatGLM series (ChatGLM3‑6B, ChatGLM2‑6B)

More models are available on the ModelScope website.

Cost Efficiency of GPU Idle Billing

Traditional GPU billing charges the full rate even when the instance sits idle. Function Compute’s idle billing charges 0.00011 ¥/GB·s during active use and 0.000009 ¥/GB·s when idle, under a tenth of the active rate.

Example Cost Calculation

A startup reserves ten 16 GB GPU instances. Without idle billing, each instance costs 6.34 ¥/hour. With idle billing and a typical hour split into 40 minutes idle and 20 minutes active, the cost drops to about 2.46 ¥/hour per instance, a saving of roughly 60 %. The sketch below reproduces the arithmetic.
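A minimal sketch that derives the numbers above from the per‑GB·s rates; the 40/20‑minute split is the scenario from the example:

```python
# Hourly cost of one 16 GB GPU instance under Function Compute idle
# billing, using the rates quoted above.
GB = 16                      # reserved GPU memory per instance
ACTIVE_RATE = 0.00011        # ¥ per GB·s while serving requests
IDLE_RATE = 0.000009         # ¥ per GB·s while idle

full_hour = ACTIVE_RATE * GB * 3600                  # no idle billing
mixed_hour = (ACTIVE_RATE * GB * 20 * 60             # 20 min active
              + IDLE_RATE * GB * 40 * 60)            # 40 min idle

print(f"without idle billing: {full_hour:.2f} ¥/h")    # ≈ 6.34 ¥/h
print(f"with idle billing:    {mixed_hour:.2f} ¥/h")   # ≈ 2.46 ¥/h
print(f"saving: {1 - mixed_hour / full_hour:.0%}")     # ≈ 61%
```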

Free Trial and Quota

New Function Compute users receive a three‑month free trial with the following quotas: 1,000,000 GB·s of GPU compute, 500,000 vCPU·s, 2,000,000 GB·s of memory, and 8,000,000 function invocations. An additional CDT traffic quota of 100 GB/month has also been available since December 19, 2023.
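As a rough illustration, assuming the GPU quota is metered as memory size × execution seconds (check the console for the exact metering rules), the trial quota alone covers about 17 hours of continuous use on a single 16 GB instance:

```python
# Back-of-the-envelope estimate of how long the GPU trial quota lasts
# on one 16 GB instance under the metering assumption stated above.
GPU_QUOTA_GBS = 1_000_000
INSTANCE_GB = 16

hours = GPU_QUOTA_GBS / (INSTANCE_GB * 3600)
print(f"~{hours:.1f} hours of continuous GPU use")  # ≈ 17.4 hours
```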

How to Activate the Service

Log in to the Function Compute console, open the elastic management page of the deployed function, and enable the idle‑billing toggle.

References

ModelScope community: https://modelscope.cn/home

Function Compute product page: https://www.aliyun.com/product/fc

One‑click deployment guide: https://developer.aliyun.com/article/1307460

Qwen model series: https://modelscope.cn/organization/qwen

Zhipu AI models: https://modelscope.cn/organization/ZhipuAI

Baichuan models: https://modelscope.cn/organization/baichuan-inc

Idle GPU public test application: https://survey.aliyun.com/apps/zhiliao/dXfRVPEm-
