Deploy the QwQ‑32B LLM on Alibaba Cloud Function Compute with CAP in Minutes
This guide walks you through deploying the open‑source QwQ‑32B model on Alibaba Cloud Function Compute using the Cloud Application Platform (CAP), covering architecture, required services, account setup, step‑by‑step deployment, cost considerations, model interaction via Open WebUI and Chatbox, scaling configuration, and resource cleanup.
Solution Overview
This guide shows how to deploy the open‑source QwQ‑32B large language model on Alibaba Cloud Function Compute (FC) using the Cloud Application Platform (CAP). Two serverless functions are created: Ollama to host the QwQ‑32B‑GGUF model and Open WebUI to provide a web UI for interactive chat.
Architecture
One CAP project that runs the Ollama and Open WebUI functions.
A NAS file system attached to the project for persisting the model files.
Prerequisites
Alibaba Cloud account (register at https://account.aliyun.com/register/qr_register.htm if you do not have one).
Enable Function Compute service in the console (https://fcnext.console.aliyun.com/) and grant the required permissions.
Deployment Procedure
1. Create CAP Project
Open the CAP console: https://cap.console.aliyun.com/create-project?template=194&from=solution.
Select the "QwQ‑32B" template, choose a region (e.g., China Beijing 2), keep default settings, and click Deploy Project .
Confirm the deployment; the provisioning takes about 10–12 minutes.
2. Locate Access URL
After deployment, open the CAP project details page and copy the generated HTTP endpoint.
Open the endpoint in a browser; the Open WebUI interface appears.
3. Test the Model
Enter a prompt such as "Who are you?" in the text box and submit. The model returns a response.
4. Adjust Ollama Scaling
In the CAP console, edit the Ollama function configuration and increase the reserved instance count to enable horizontal scaling.
5. Access via Chatbox Client
Copy the Ollama API endpoint shown in the CAP console.
Download the Chatbox client (https://chatboxai.app/zh#download) and install it (the screenshots show macOS M3).
In Chatbox, open Settings and configure:
Save the configuration.
Now you can type queries (e.g., "Who are you?" ) in Chatbox and receive responses from the deployed QwQ‑32B model.
Resource Cleanup
When the demo is finished, delete the CAP project to stop incurring charges:
Log in to the CAP console (https://cap.console.aliyun.com/).
Navigate to Projects , locate the deployed project, click Delete , and confirm the deletion.
Cost Considerations
The free trial quotas for Function Compute and NAS are sufficient for the steps in this guide. If the trial limits are exceeded, the estimated cost is below ¥9 per hour, but actual charges depend on the number of GPU instances and runtime duration. Always verify the final bill in the Alibaba Cloud console.
Reference URLs
Account registration: https://account.aliyun.com/register/qr_register.htm
Function Compute console: https://fcnext.console.aliyun.com/
Free trial details: https://help.aliyun.com/document_detail/2665971.html
NAS file storage: https://free.aliyun.com/?searchKey=%E6%96%87%E4%BB%B6%E5%AD%98%E5%82%A8+NAS
CAP project template: https://cap.console.aliyun.com/create-project?template=194&from=solution
Chatbox download: https://chatboxai.app/zh#download
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
