Deploy Alibaba's Qwen3.5-397B Model in Minutes with Serverless Function Compute

This guide explains how to quickly deploy the new Qwen3.5-397B-A17B open‑source large model using Alibaba Cloud Function Compute's serverless GPU service, covering model features, deployment steps, required commands, and performance benefits.


Model Overview

Alibaba open‑sourced Qwen3.5‑397B‑A17B, a 397 billion‑parameter large language model that uses only 17 billion active parameters thanks to a hybrid linear‑attention (Gated Delta Networks) and Mixture‑of‑Experts (MoE) architecture. The model supports 201 languages and excels in vision‑language, code generation, and autonomous‑agent tasks.
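The "active parameters" idea comes from MoE routing: a gate picks only a few experts per token, so only a small slice of the total weights does work. The sketch below is purely illustrative (the expert count and top-k value are made-up numbers, not Qwen3.5's actual configuration):

```python
# Minimal sketch of MoE top-k routing; illustrative only, not Qwen's real code.
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(gate_logits, k=2):
    """Pick the top-k experts per token; only their weights are computed."""
    probs = softmax(gate_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    # Return (expert index, renormalized routing weight) pairs.
    return [(i, probs[i] / norm) for i in topk]


# With, say, 64 experts and top-2 routing, only ~2/64 of the expert weights run
# per token -- the same principle that lets a 397B-parameter model activate ~17B.
chosen = route([random.gauss(0, 1) for _ in range(64)], k=2)
```

Each token can be routed to different experts, so total capacity stays large while per-token compute stays small.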

Why traditional deployment is difficult

Complex GPU environment configuration.

Labor‑intensive monitoring and maintenance.

Hard to achieve elastic scaling.

Serverless Function Compute (FC) solution

Function Compute provides a serverless GPU service for Qwen3.5, eliminating infrastructure management. Benefits:

One‑click deployment reduces integration time from days to ~5 minutes.

Memory usage drops ~60% and inference throughput can increase up to 19×.

Automatic scaling with low operational overhead.

Step‑by‑step deployment

Create an OSS bucket and download the model files into it under a path such as Qwen/Qwen3.5-397B-A17B.
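The bucket and model download in this step could look roughly like the following sketch. The bucket name and local paths are placeholders, and it assumes the ossutil and modelscope CLIs are installed and already configured with your credentials:

```shell
# Create the OSS bucket (name is a placeholder; uses your configured region).
ossutil mb oss://my-model-bucket

# Pull the model weights from ModelScope into a local directory.
modelscope download --model Qwen/Qwen3.5-397B-A17B --local_dir ./Qwen/Qwen3.5-397B-A17B

# Upload the weights to the bucket under the path the service will mount.
ossutil cp -r ./Qwen/Qwen3.5-397B-A17B oss://my-model-bucket/models/Qwen/Qwen3.5-397B-A17B
```

The bucket path should line up with the mount point used in the startup command later (e.g. /mnt/my-model-scope/models/...).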

Deploy the web-console ("white-screen") deployment tool (template ID 283) and wait until it is ready.

In the FunModel custom deployment console, select the Serverless GPU image, configure resources, and set the startup command:

vllm serve /mnt/my-model-scope/models/Qwen/Qwen3.5-397B-A17B \
  --served-model-name Qwen/Qwen3.5-397B-A17B \
  --port 9000 \
  --trust-remote-code \
  --gpu-memory-utilization 0.9 \
  --max-model-len 262144 \
  --tensor-parallel-size 16 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3

Start the deployment, wait for the service to become available, then run inference tests.
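An inference test can go through vLLM's OpenAI-compatible API. The sketch below builds a chat-completion request against the model name set via --served-model-name; the base URL is a placeholder you would replace with the endpoint FunModel exposes after deployment:

```python
import json
from urllib import request

# Placeholder: replace with the endpoint shown in the FunModel console.
BASE_URL = "http://localhost:9000"  # vLLM listens on --port 9000


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": "Qwen/Qwen3.5-397B-A17B",  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def chat(prompt: str) -> str:
    """Send one chat request and return the assistant's reply text."""
    req = request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (once the service is up):
#   print(chat("Write a haiku about serverless GPUs."))
```

Any OpenAI-compatible client works the same way; only the base URL and model name need to match the deployment.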

Performance comparison

Deployment time: traditional ≈ days; FC ≈ 5 minutes.

Technical barrier: traditional high; FC low.

Ops & iteration cost: traditional high; FC low.

Relevant URLs

FunModel quick start: https://fun-model-docs.devsapp.net/getting-started/

Custom model deployment guide: https://fun-model-docs.devsapp.net/user-guide/custom-model-deployment/

White‑screen tool template: https://functionai.console.aliyun.com/old/template-detail?template=283

FunModel custom deployment console: https://functionai.console.aliyun.com/fun-model/cn-hangzhou/custom-model-create

Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
