One‑Click Deployment of LLMs to Alibaba Cloud Function Compute with SwingDeploy
This guide explains how to quickly select a ModelScope open‑source LLM, deploy it to Alibaba Cloud Function Compute using the SwingDeploy one‑click feature, enable reserved idle billing, and evaluate the cost savings compared with traditional GPU provisioning.
Introduction
ModelScope hosts a large collection of open‑source models. Taking one of these models to production, however, typically means procuring and managing GPU resources and operating the model‑serving stack. Alibaba Cloud Function Compute (FC) combined with ModelScope’s SwingDeploy addresses this by handling deployment, scaling, traffic protection, and monitoring while abstracting away infrastructure operations.
One‑Click Model Deployment (SwingDeploy)
SwingDeploy lets users deploy a ModelScope model directly to their own Alibaba Cloud account. The platform recommends a suitable GPU configuration based on the model’s requirements and brings up a production‑grade inference API in about five minutes.
Step‑by‑Step Deployment Guide (Example: Baichuan LLM)
Preparation
Open the ModelScope website and log in or register.
Bind an Alibaba Cloud account to gain access to online debugging, training, and deployment features.
Model Deployment
Navigate to the model card for baichuan2-7b-chat-4bits (or any other model) and click the Deploy button.
Select Quick Deploy (SwingDeploy) and choose Function Compute (FC) as the target platform.
In the popup, configure the version, region, GPU type, and memory size, then click One‑Click Deploy.
Verification
Deployment completes in roughly 1–5 minutes. Check the “Deployment Service (SwingDeploy)” section in ModelScope for a “Success” status.
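If you prefer a scripted check, you can also probe the endpoint once the status reads “Success”. A minimal sketch in Python, assuming the HTTP endpoint shown on the service page (the URL below is a placeholder; early attempts may be slow or fail while the model cold‑starts):

    import time
    import requests

    # Placeholder: copy the real endpoint from the SwingDeploy service page.
    ENDPOINT = "https://<your-fc-endpoint>"

    for attempt in range(10):
        try:
            resp = requests.get(ENDPOINT, timeout=10)
            print(f"attempt {attempt}: HTTP {resp.status_code}")
            if resp.ok:
                break
        except requests.RequestException as exc:
            print(f"attempt {attempt}: {exc}")
        time.sleep(30)  # allow time for cold start and model loading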
Enable Reserved Idle Billing
By default, deployments use pay‑as‑you‑go mode. To reduce costs, switch the service to reserved mode, then activate the idle‑billing option in the Function Compute console under “Function Elastic Management”. Reserved GPU instances then stay provisioned for low‑latency responses but are billed at roughly one‑tenth of the active rate while idle.
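If you manage infrastructure in code, the reservation itself can also be set through the Function Compute API. The sketch below is illustrative only: it assumes the alibabacloud_fc_open20210406 Python SDK, the service and function names are placeholders, the parameter order should be checked against your SDK version, and the idle‑billing option itself may be exposed only as the console toggle described above.

    # Illustrative sketch: reserve one provisioned instance for the deployed
    # function via the FC Open API SDK. Names in angle brackets are
    # placeholders; the idle-billing switch may be console-only (see above).
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_fc_open20210406.client import Client
    from alibabacloud_fc_open20210406 import models as fc_models

    client = Client(open_api_models.Config(
        access_key_id="<access-key-id>",
        access_key_secret="<access-key-secret>",
        endpoint="<account-id>.cn-hangzhou.fc.aliyuncs.com",
    ))

    client.put_provisioned_config(
        "<swingdeploy-service>",                          # service name
        "LATEST",                                         # qualifier
        "<model-function>",                               # function name
        fc_models.PutProvisionedConfigRequest(target=1),  # one reserved instance
    )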
Using the Deployed Model
After deployment, the “Use Now” button on the ModelScope service page provides the endpoint for invoking the LLM.
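A minimal invocation sketch in Python. The endpoint URL is a placeholder, and the request and response fields shown here (prompt, history) are assumptions for illustration; match them to the sample request that “Use Now” displays for your specific model:

    import requests

    # Placeholder: copy the real endpoint from the "Use Now" page.
    ENDPOINT = "https://<your-fc-endpoint>"

    payload = {
        "prompt": "Briefly introduce Alibaba Cloud Function Compute.",
        "history": [],  # prior (question, answer) turns, if the model uses them
    }

    resp = requests.post(ENDPOINT, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json())  # response schema varies by model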
LLM Model List
Commonly supported open‑source LLMs include:
Qwen series (14B, 7B, 1.8B variants)
Baichuan series (13B, 7B, chat and base versions)
ChatGLM series (3‑6B, 2‑6B)
More models are available on the ModelScope website.
Cost Efficiency of GPU Idle Billing
Traditional GPU billing charges the full price even while the card sits idle. With Function Compute’s GPU idle billing, instances are billed at 0.00011 ¥/GB·s while actively serving requests and 0.000009 ¥/GB·s while idle, roughly one‑tenth of the active rate.
Example Cost Calculation
A startup reserves ten 16 GB GPU instances. Without idle billing, each instance costs 6.34 ¥/hour around the clock. With idle billing and a typical hour split into 40 minutes idle and 20 minutes active, the cost drops to 2.46 ¥/hour per instance, a saving of about 60 %.
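These figures follow directly from the per‑GB·s rates above; a short sanity check in Python:

    # Reproduce the example using the idle-billing rates quoted above.
    ACTIVE_RATE = 0.00011   # ¥ per GB·s while serving requests
    IDLE_RATE = 0.000009    # ¥ per GB·s while idle

    MEM_GB = 16             # one reserved GPU instance
    HOUR = 3600             # seconds

    always_active = MEM_GB * ACTIVE_RATE * HOUR
    print(f"without idle billing: {always_active:.2f} ¥/hour")  # 6.34

    active_s, idle_s = 20 * 60, 40 * 60  # 20 min active, 40 min idle
    mixed = MEM_GB * (ACTIVE_RATE * active_s + IDLE_RATE * idle_s)
    print(f"with idle billing:    {mixed:.2f} ¥/hour")          # 2.46
    print(f"saving: {1 - mixed / always_active:.0%}")           # ~61%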
Free Trial and Quota
New Function Compute users receive a three‑month free trial with the following quotas: 1 M GB·s of GPU time, 500 k vCPU·s, 2 M GB·s of memory, and 8 M function invocations. An additional CDT traffic quota of 100 GB/month is also provided for activations after December 19, 2023.
How to Activate the Service
Log in to the Function Compute console, open the elastic management page of the deployed function, and enable the idle‑billing toggle.
References
ModelScope community: https://modelscope.cn/home
Function Compute product page: https://www.aliyun.com/product/fc
One‑click deployment guide: https://developer.aliyun.com/article/1307460
Qwen model series: https://modelscope.cn/organization/qwen
Zhipu AI models: https://modelscope.cn/organization/ZhipuAI
Baichuan models: https://modelscope.cn/organization/baichuan-inc
Idle GPU public test application: https://survey.aliyun.com/apps/zhiliao/dXfRVPEm-