Deploy Alibaba’s Qwen3.5‑397B‑A17B Model in One Click with PAI‑Model Gallery

Alibaba's open‑source Qwen3.5‑397B‑A17B model, featuring 397 billion parameters and a hybrid Gated Delta Network/MoE architecture, outperforms the far larger Qwen3‑Max on a range of benchmarks while using substantially less GPU memory, and it can be deployed in one click through PAI‑Model Gallery with step‑by‑step guidance and enterprise‑grade security.


Model Overview

Qwen3.5‑397B‑A17B is an open‑source large language model released by Alibaba. It has 397 billion total parameters but activates only 17 billion per token. Compared with the 1‑trillion‑parameter Qwen3‑Max, it achieves higher accuracy on a range of benchmarks while cutting GPU memory consumption by about 60% and increasing inference throughput by up to 19×.

Architecture

The model uses a hybrid design that combines linear attention, implemented as Gated Delta Networks, with sparse mixture‑of‑experts (MoE) layers. This yields faster inference at lower cost while preserving capability. Language support has been expanded to 201 languages and dialects, and benchmarks show strong performance on reasoning, code generation, agent tasks, and multimodal understanding.
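As a rough illustration of why linear attention keeps memory flat, here is a sketch of one step of a gated delta rule, the recurrence used by Gated DeltaNet‑style layers in the published literature. The exact formulation inside Qwen3.5 is not specified here, so the symbols and update below are assumptions, not the model's actual code:

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta rule step (Gated DeltaNet-style, assumed formulation):
    S_t = alpha * (S_{t-1} - beta * (S_{t-1} k) k^T) + beta * v k^T
    S is a fixed-size (d_v x d_k) state matrix, so memory stays constant
    with sequence length instead of growing like a KV cache."""
    d_v, d_k = len(S), len(S[0])
    # Sk[i] = sum_j S[i][j] * k[j] -- the state's current "prediction" for key k
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]

def read_out(S, q):
    """Output o = S q, the linear-attention analogue of attending with query q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]
```

Starting from a zero state, one update with write gain beta simply stores the outer product beta·v·kᵀ; later steps decay (alpha) and partially erase the component along k before rewriting it, which is what keeps the fixed‑size state usable over long sequences.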

PAI‑Model Gallery

PAI‑Model Gallery is a component of Alibaba Cloud’s PAI platform that aggregates open‑source models (LLM, CV, NLP, etc.) and provides zero‑code pipelines for training, deployment and inference. The Qwen3.5‑397B‑A17B model is listed there with enterprise‑grade security and automatic cloud‑resource adaptation.

Model architecture diagram

One‑Click Deployment Procedure

1. Open the model page in PAI‑Model Gallery: https://pai.console.aliyun.com/#/quick-start/models/Qwen3.5-397B-A17B/intro

2. Click Deploy. The platform offers high‑performance back‑ends based on SGLang and vLLM. Select the desired compute resources (GPU type, number of instances, etc.).

3. After deployment finishes, retrieve the service endpoint and authentication token from the "View Call Info" button on the service page.

4. Invoke the model via HTTP API, SDKs, or the PAI online debugging console. Detailed API usage is documented at https://help.aliyun.com/zh/pai/getting-started/model-gallery-quick-st#art-091a6b66e9v4x
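As a minimal sketch of the invocation step: SGLang and vLLM back‑ends typically expose an OpenAI‑compatible chat endpoint, but the exact path, payload fields, and header format for your PAI service may differ, so treat everything below as an assumption and check "View Call Info" for the authoritative values.

```python
# Hypothetical client for the deployed service; endpoint path and auth header
# format are assumptions based on common SGLang/vLLM deployments, not taken
# from PAI documentation.
import json
import urllib.request

def build_chat_request(endpoint, token, prompt, model="Qwen3.5-397B-A17B"):
    """Assemble URL, headers, and JSON body for one chat completion call."""
    url = endpoint.rstrip("/") + "/v1/chat/completions"  # assumed OpenAI-style path
    headers = {
        "Content-Type": "application/json",
        "Authorization": token,  # assumed: raw service token from "View Call Info"
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return url, headers, body

def chat(endpoint, token, prompt):
    """Send the request and return the first generated message."""
    url, headers, body = build_chat_request(endpoint, token, prompt)
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Substitute the endpoint and token shown under "View Call Info".
    print(chat("http://<your-service-endpoint>", "<your-token>", "Hello!"))
```

The request body follows the OpenAI chat‑completions shape that both SGLang and vLLM serve by default; if your service uses a different schema, only `build_chat_request` needs to change.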

Deployment UI screenshot

Additional Model Support in PAI‑Model Gallery

The gallery continuously adds popular open‑source models such as Qwen, Wan, DeepSeek, Kimi, MiniMax, and PAI‑optimized variants (e.g., Qwen3‑235B‑A22B‑PAI‑optimized) with built‑in EP+PD deployment templates for improved performance.

Tags: large language model, AI inference, Alibaba Cloud, One‑Click Deployment, PAI Model Gallery, Qwen3.5
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
