How to Deploy QwQ-32B Model on Alibaba Cloud Function Compute Using CAP

This guide walks you through deploying the open‑source QwQ‑32B model on Alibaba Cloud Function Compute with CAP, covering required services, step‑by‑step deployment, cost notes, accessing the demo UI, interacting with the model, scaling settings, and resource cleanup.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
How to Deploy QwQ-32B Model on Alibaba Cloud Function Compute Using CAP

Solution Architecture

The architecture consists of a Cloud Application Platform (CAP) project providing fully managed serverless compute for the model service and a NAS file storage for the model files.

Architecture diagram
Architecture diagram

Deploy QwQ-32B Model

Prepare an Alibaba Cloud account and enable Function Compute. Open the CAP console, select the provided project template (region set to China Beijing 2 by default), and click Deploy Project . Confirm the deployment and wait 10–12 minutes for the functions to be created.

After deployment, the console will show a success screen similar to the following:

Deployment result
Deployment result

Application Experience

1. Access the demo application – locate the generated URL in the CAP console and open it.

Demo URL
Demo URL

2. Interact with the model – type a question such as “Who are you?” into the text box and receive a response from the Ollama service.

Chat interface
Chat interface

3. Adjust Ollama service configuration – modify the reserved instance count to scale the model service as needed.

Scaling settings
Scaling settings

4. Use Chatbox client to call Ollama API – obtain the API endpoint from the CAP console, install the Chatbox client (example shown for macOS M3), configure the endpoint and model name (cap‑qwq:latest), and start chatting.

Chatbox configuration
Chatbox configuration

Cost and Usage Notes

Free trial quotas for Function Compute and NAS can cover the resources required by this tutorial. If the trial is exhausted, the expected cost is less than ¥9 per hour, though actual charges depend on instance scaling and usage.

GPU functions are billed by specification and runtime; idle snapshots incur minimal fees. Delete resources after testing to avoid unnecessary charges.

Cleaning Up Resources

To delete the CAP project, open the CAP console, navigate to the project list, click the delete action for the target project, and follow the prompts.

Delete project
Delete project
model deploymentAlibaba Cloudfunction computeCAPQwQ-32B
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.