Deploy Alibaba’s Qwen3 LLM in 10 Minutes with Bailei Platform
Learn how to quickly set up Alibaba Cloud’s Bailei platform to call the open-source Qwen3 large language model, explore its cost‑effective performance, dual‑mode reasoning, multilingual support, and enhanced agent capabilities, and follow step‑by‑step instructions for API key configuration, Cherry Studio integration, and tool‑calling setup.
Model Highlights
Inference capability significantly enhanced: flagship Qwen3-235B-A22B matches top models in code, mathematics, and general tasks.
Medium MoE model Qwen3-30B-A3B outperforms QwQ-32B.
Small model Qwen3-4B rivals Qwen2.5-72B-Instruct.
Seamless Switching Between Two Modes
Thinking mode analyzes step‑by‑step, suitable for complex problems.
Non‑thinking mode responds instantly, suitable for simple queries.
A single model supports both modes, eliminating the need to deploy multiple models.
Multilingual Support Expansion
Supports 119 languages and dialects, covering major global languages.
Agent Capability Enhancement
Optimized agent and code abilities, native MCP support for more precise tool invocation.
Solution Architecture
Alibaba Cloud Bailei platform provides standardized APIs, removing the need to build model service infrastructure, and supports load balancing and auto‑scaling for stable API calls. Combined with Cherry Studio visual client, users can switch Qwen3’s thinking mode and use tool calls without command‑line operations.
After configuration, a local runtime environment as shown below is created.
Practical Deployment
Obtain Bailei API‑KEY: go to the Bailei console, click “View” in the API Key column to retrieve the key.
Download Cherry Studio client from the provided link and install it.
Configure the API in Cherry Studio: click the settings button, select “Alibaba Cloud Bailei” under Model Service, and enter the API Key and endpoint https://dashscope.aliyuncs.com/compatible-mode/v1/.
Enter the desired Qwen3 model ID (e.g., qwen3-235b-a22b) or any other Qwen3 model.
Model Experience
Quickly try Qwen3: in the chat interface select the model, then use the prompt suffix /no_think to disable thinking mode, or /think to enable it.
Tool Calling Capability
Qwen3’s tool‑calling is greatly improved, especially for MCP. Example: integrate ModelScope’s Fetch web‑page tool via an SSE URL.
Configure an MCP server in Cherry Studio with name “Fetch网页内容抓取”, type “Server‑Sent Events (sse)”, and the URL https://mcp.api-inference.modelscope.cn/sse/xxx.
After saving, activate the MCP server and ask questions such as “Please fetch this page and answer: which Qwen3 models exist?” The model will retrieve the page content and respond accurately.
Resource Cleanup
To delete an API Key, go to the API Key management page, locate the target key, and remove it; the key will no longer work for Bailei model calls.
Enjoy building and testing Qwen3!
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
