Artificial Intelligence 6 min read

Deploy Alibaba’s Qwen3 LLM in 10 Minutes with Bailei Platform

Learn how to quickly set up Alibaba Cloud’s Bailei platform to call the open-source Qwen3 large language model, explore its cost‑effective performance, dual‑mode reasoning, multilingual support, and enhanced agent capabilities, and follow step‑by‑step instructions for API key configuration, Cherry Studio integration, and tool‑calling setup.

Alibaba Cloud Developer

May 14, 2025

Deploy Alibaba’s Qwen3 LLM in 10 Minutes with Bailei Platform

Model Highlights

Inference capability significantly enhanced: flagship Qwen3-235B-A22B matches top models in code, mathematics, and general tasks.

Medium MoE model Qwen3-30B-A3B outperforms QwQ-32B.

Small model Qwen3-4B rivals Qwen2.5-72B-Instruct.

Seamless Switching Between Two Modes

Thinking mode analyzes step‑by‑step, suitable for complex problems.

Non‑thinking mode responds instantly, suitable for simple queries.

A single model supports both modes, eliminating the need to deploy multiple models.

Multilingual Support Expansion

Supports 119 languages and dialects, covering major global languages.

Agent Capability Enhancement

Optimized agent and code abilities, native MCP support for more precise tool invocation.

Solution Architecture

Alibaba Cloud Bailei platform provides standardized APIs, removing the need to build model service infrastructure, and supports load balancing and auto‑scaling for stable API calls. Combined with Cherry Studio visual client, users can switch Qwen3’s thinking mode and use tool calls without command‑line operations.

After configuration, a local runtime environment as shown below is created.

Practical Deployment

Obtain Bailei API‑KEY: go to the Bailei console, click “View” in the API Key column to retrieve the key.

Download Cherry Studio client from the provided link and install it.

Configure the API in Cherry Studio: click the settings button, select “Alibaba Cloud Bailei” under Model Service, and enter the API Key and endpoint https://dashscope.aliyuncs.com/compatible-mode/v1/.

Enter the desired Qwen3 model ID (e.g., qwen3-235b-a22b) or any other Qwen3 model.

Model Experience

Quickly try Qwen3: in the chat interface select the model, then use the prompt suffix /no_think to disable thinking mode, or /think to enable it.

Tool Calling Capability

Qwen3’s tool‑calling is greatly improved, especially for MCP. Example: integrate ModelScope’s Fetch web‑page tool via an SSE URL.

Configure an MCP server in Cherry Studio with name “Fetch网页内容抓取”, type “Server‑Sent Events (sse)”, and the URL https://mcp.api-inference.modelscope.cn/sse/xxx.

After saving, activate the MCP server and ask questions such as “Please fetch this page and answer: which Qwen3 models exist?” The model will retrieve the page content and respond accurately.

Resource Cleanup

To delete an API Key, go to the API Key management page, locate the target key, and remove it; the key will no longer work for Bailei model calls.

Enjoy building and testing Qwen3!

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

MLOps AI Deployment Alibaba Cloud Tool Calling Qwen3

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.