Artificial Intelligence 7 min read

How to Deploy the Open‑Source QwQ‑32B Inference Model on Alibaba Cloud CAP

This guide walks you through deploying the open‑source QwQ‑32B inference model using Alibaba Cloud's Serverless AI platform CAP, covering benchmark highlights, preparation steps, two deployment methods (application template and model service), verification, and project cleanup.

Alibaba Cloud Developer

Mar 11, 2025

How to Deploy the Open‑Source QwQ‑32B Inference Model on Alibaba Cloud CAP

QwQ‑32B, a newly released open‑source inference model, quickly gained global attention for its strong performance across various scenarios.

Powered by Alibaba Cloud Function Compute (FC), the Serverless + AI cloud‑native development platform CAP now offers both model service and application‑template deployment options, allowing you to launch QwQ‑32B via a one‑click template or API calls.

Benchmark Highlights

In tests such as the AIME24 math set, LiveCodeBench coding evaluation, LiveBench (led by Meta’s chief scientist), IFEval instruction‑following, and the BFCL function‑calling benchmark, QwQ‑32B matches or surpasses leading models like DeepSeek‑R1‑671B, DeepSeek‑R1‑Distilled‑Qwen‑32B, OpenAI‑o1‑mini, and others.

Preparation

1. Open the CAP console, authorize access, and wait for the authorization to complete.

2. The GPU function created in Function Compute is billed by resource‑time; unused functions only incur snapshot fees. Claim the Function Compute trial quota to offset costs.

Method One: Application‑Template Deployment

Step 1 – Create Project

Enter the CAP console and click “Create based on template”.

Step 2 – Deploy Template

Search for “QWQ” and select the “Qwen‑QwQ inference model chat assistant” template, then click “Deploy Now”.

Choose a region (Beijing, Shanghai, or Hangzhou), review the billing preview, and confirm deployment (approximately 10 minutes).

Notes

Select a region close to your resources; if using a NAS file system, match the region.

If GPU resources are insufficient, switch regions and retry.

Method Two: Model‑Service Deployment (API)

Step 1 – Create Blank Project

In the CAP console, click “Create blank project” and name it.

Step 2 – Choose Model Service

Select the QwQ‑32B‑GGUF model (currently only available in Hangzhou).

Step 3 – Deploy Model Service

Configure resources (Ada series is recommended) or customize GPU type and specs.

Preview billing, then confirm deployment (model download takes 10‑30 minutes).

Step 4 – Verify Model Service

Click “Debug” to test the model, use the Open‑WebUI for interactive chat, or invoke the model from the command line.

You can also test via third‑party platforms such as Chatbox.

Project Deletion

Open project details and click “Delete”.

Confirm which services to retain or remove.

Check the acknowledgment box and confirm deletion.

Reference Links

1. https://www.aliyun.com/product/cap

2. https://common-buy.aliyun.com/package

3. https://help.aliyun.com/zh/functioncompute/fc-3-0/product-overview/billing-overview-1

4. https://cap.console.aliyun.com/projects

5. https://help.aliyun.com/zh/cap/product-overview/billing-overview

6. https://cap.console.aliyun.com/projects

7. https://help.aliyun.com/zh/cap/product-overview/billing-overview

8. https://web.chatboxai.app/

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

tutorial Alibaba Cloud inference AI model deployment CAP QwQ-32B

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.