Deploy ModelScope Models to Alibaba Cloud Function Compute in 5 Minutes

This guide walks readers through using ModelScope’s SwingDeploy service to locate, configure, and instantly deploy open‑source AI models to Alibaba Cloud Function Compute, explaining the resources created, how to invoke the model via HTTP triggers, and how to optimize performance with provisioned instances, logging, and concurrency settings.


Overview

ModelScope SwingDeploy provides one‑click deployment of selected open‑source models to Alibaba Cloud Function Compute (FC). The service automatically provisions a suitable machine configuration and creates an HTTP‑triggered inference API in a few minutes.

Finding Deployable Models

In the ModelScope model library, filter by the SwingDeploy tag. Open the model detail page and click the Deploy button in the top‑right corner.

One‑Click Deployment Steps

1. Click Deploy on the model detail page.

2. Specify the model version, region, CPU/GPU type, and memory.

3. Confirm and start the deployment. The process usually finishes within 1–5 minutes.

Resources Created in Function Compute

FC creates a service and one or more functions:

Service: a container for multiple functions; service‑level settings (log collection, network, storage) are inherited by all functions.

Function: the execution unit that contains the model code or container image and defines CPU/memory/GPU resources.

When a request arrives, FC launches a container instance according to the function’s configuration, processes the request, and releases the instance after a period of inactivity. Idle services/functions incur no cost.

Invoking the Deployed Model

Each function is exposed via an HTTP trigger. The generated URL (shown as API_URL in ModelScope example code) can be called directly. The trigger details are viewable in the “Trigger Management” tab of the function.
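A call to the generated endpoint can be sketched as follows. The URL and the JSON payload shape here are placeholders — copy the real API_URL and example payload from the model's ModelScope page, since each model defines its own input schema.

```python
"""Minimal sketch of calling a SwingDeploy-generated HTTP trigger."""
import json
import urllib.request

# Hypothetical endpoint -- replace with the API_URL from the ModelScope example code.
API_URL = "https://your-function.fcapp.run/invoke"

def build_request(api_url: str, prompt: str) -> urllib.request.Request:
    """Package a prompt as a JSON POST request for the inference endpoint."""
    body = json.dumps({"input": {"text": prompt}}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(API_URL, "Hello, ModelScope!")

# Uncomment to actually call the deployed function:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```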

Cold Start and Provisioned Instances

If a function remains idle, its container is recycled, and the first request afterward incurs a cold‑start delay, which can be significant for large models (e.g., the 15 GB ChatGLM‑6B). FC offers two scaling modes:

On‑Demand (Pay‑Per‑Use): instances are created on request, with possible cold‑start latency.

Provisioned: a minimum number of instances are kept warm, eliminating cold‑start latency for those instances.

Configure the minimum (provisioned) instance count and the maximum additional on‑demand instances in the “Elastic Management” tab of the function.
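To see why provisioned instances matter for large models, a back‑of‑envelope cold‑start estimate helps: startup time is roughly container launch plus the time to load the model weights. The bandwidth and startup figures below are assumptions for illustration; real numbers depend on the storage backend (image layers, NAS, OSS) and instance type.

```python
def estimate_cold_start_s(model_gb: float,
                          load_gbps: float = 1.0,
                          container_start_s: float = 5.0) -> float:
    """Rough cold-start estimate: container launch plus model-weight loading.

    load_gbps and container_start_s are assumed values, not FC guarantees.
    """
    return container_start_s + model_gb / load_gbps

# e.g. a 15 GB ChatGLM-6B checkpoint at an assumed ~1 GB/s effective read:
print(f"~{estimate_cold_start_s(15):.0f} s")  # → ~20 s
```

Even under these optimistic assumptions, tens of seconds of cold start is why keeping at least one provisioned instance warm is worthwhile for interactive workloads.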

Observability: Logs and Metrics

FC integrates with Alibaba Cloud Log Service (SLS). Enable logging in the service configuration to collect logs at the service, function, and request levels. Metrics such as CPU, memory, GPU usage, and request latency are available in the “Monitoring” tab or the “Monitoring Dashboard”.
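Since FC forwards a function's stdout/stderr to SLS once logging is enabled, emitting one structured JSON line per request makes SLS queries much easier than grepping free‑form text. A minimal sketch (the field names here are arbitrary, not an FC convention):

```python
import json
import sys
import time

def log_event(request_id: str, stage: str, **fields) -> str:
    """Emit one structured JSON log line; FC collects stdout into SLS."""
    record = {"ts": time.time(), "request_id": request_id, "stage": stage}
    record.update(fields)
    line = json.dumps(record)
    print(line, file=sys.stdout)
    return line

line = log_event("req-123", "inference", latency_ms=842, model="chatglm-6b")
```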

Resource Configuration and Concurrency

Adjust CPU, memory, disk, and GPU specifications on the “Function Configuration” tab. For GPU functions you can select the GPU card type (e.g., T4 with up to 16 GB memory, A10 with up to 24 GB memory).

The function’s concurrency setting determines how many requests a single instance can handle simultaneously. Default is 1.

Compute‑intensive inference (e.g., single‑request GPU inference): keep concurrency at 1.

Batch‑able inference: increase concurrency to match the desired batch size.
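The batching case can be sketched as a micro‑batcher inside the function: if instance concurrency is set to N, one warm instance can hold N in‑flight requests and run them through the model as a single batch. This is an illustrative pattern, not FC‑provided machinery; `infer_fn` stands in for a batched model call.

```python
class MicroBatcher:
    """Collect up to batch_size requests and run them in one model call.

    Pairs with an FC instance-concurrency setting equal to batch_size,
    so a single warm instance serves that many requests per forward pass.
    """

    def __init__(self, batch_size: int, infer_fn):
        self.batch_size = batch_size
        self.infer_fn = infer_fn  # stand-in for a batched model forward pass
        self.pending = []

    def submit(self, item):
        """Queue one request; run the batch once it is full."""
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None  # caller waits until a full batch triggers inference

    def flush(self):
        """Run whatever is queued as one batch and clear the queue."""
        batch, self.pending = self.pending, []
        return self.infer_fn(batch)

# Toy "model" that uppercases its inputs, batch size matching concurrency=4:
batcher = MicroBatcher(batch_size=4, infer_fn=lambda xs: [x.upper() for x in xs])
```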

Invocation Modes

FC supports three invocation patterns:

Synchronous invocation: the caller waits for the function to finish and receives the result directly.

Asynchronous invocation: the request is persisted and the service returns immediately; the function is guaranteed to run at least once.

Asynchronous task: the request is placed in an internal queue, allowing richer task control and observability.

Debugging and Instance Access

The “Instance List” tab provides a “Login Instance” button that opens a shell in a running container for interactive debugging. If the list is empty, trigger a test invocation to create an instance first. Sessions are terminated after 10 minutes of inactivity.

References

ModelScope SwingDeploy documentation: https://www.modelscope.cn/docs/%E9%83%A8%E7%BD%B2FC

Alibaba Cloud Function Compute console: https://account.aliyun.com/login/login.htm?oauth_callback=https%3A%2F%2Ffcnext.console.aliyun.com%2Foverview

FC trigger overview: https://help.aliyun.com/zh/fc/trigger-overview

Synchronous invocation guide: https://help.aliyun.com/zh/fc/user-guide/synchronous-invocations

Asynchronous invocation guide: https://help.aliyun.com/zh/fc/user-guide/overview-34

Provisioned instance configuration: https://help.aliyun.com/zh/fc/configure-provisioned-instances-and-auto-scaling-rules#section-cra-c7p-wbo

Logging configuration: https://help.aliyun.com/zh/fc/configure-the-logging-feature

GPU function limits and FAQ: https://help.aliyun.com/zh/fc/support/faq-about-gpu-accelerated-instances

Tags: serverless, model deployment, Alibaba Cloud, function compute, ModelScope, AI model serving
Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
