Exploring Qwen3: Open‑Source LLM Features, Benchmarks, and Deployment Guides
This article introduces the Qwen3 family of open‑source large language models, details their architecture, parameter counts, multilingual support, and benchmark performance, and provides step‑by‑step instructions for deploying them with frameworks like SGLang, vLLM, and local runtimes such as Ollama and LMStudio.
Introduction
Qwen3 is the latest open‑source series of large language models released by Alibaba. The flagship model Qwen3‑235B‑A22B achieves competitive results on coding, mathematics, and general‑purpose benchmarks compared with top models such as DeepSeek‑R1, o1, Grok‑3 and Gemini‑2.5‑Pro. A smaller Mixture‑of‑Experts (MoE) model Qwen3‑30B‑A3B uses only 10% of the activation parameters of a 32B model while outperforming it, and even the 4B variant rivals the performance of Qwen2.5‑72B‑Instruct.
Model Overview
The released models include:
Qwen3‑235B‑A22B: 235 billion total parameters, 22 billion activated parameters (MoE).
Qwen3‑30B‑A3B: ~30 billion total parameters, 3 billion activated parameters (MoE).
Six dense models: Qwen3‑32B, Qwen3‑14B, Qwen3‑8B, Qwen3‑4B, Qwen3‑1.7B, Qwen3‑0.6B, all released under the Apache 2.0 license.
Dense models:

Model        Layers   Heads (Q/KV)   Tie Embedding   Context Length
Qwen3‑0.6B   28       16/8           Yes             32K
Qwen3‑1.7B   28       16/8           Yes             32K
Qwen3‑4B     36       32/8           Yes             32K
Qwen3‑8B     36       32/8           No              128K
Qwen3‑14B    40       40/8           No              128K
Qwen3‑32B    64       64/8           No              128K

MoE models:

Model             Layers   Heads (Q/KV)   Experts (Total/Activated)   Context Length
Qwen3‑30B‑A3B     48       32/4           128/8                       128K
Qwen3‑235B‑A22B   94       64/4           128/8                       128K

All models are available on Hugging Face, ModelScope, and Kaggle for immediate use.
Deployment Recommendations
For serving the models, SGLang and vLLM are the recommended frameworks; both provide OpenAI‑compatible endpoints and support the model's reasoning mode.
Local Usage
Local runtimes such as Ollama, LMStudio, MLX, llama.cpp, and KTransformers can run Qwen3 models for research, development, or production workloads.
Code Example – Transformers
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switch between thinking and non-thinking modes
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# generate text
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split thinking and final content at the last </think> token (id 151668)
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0  # no </think> token found; treat everything as final content

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

To disable the reasoning mode, set enable_thinking=False in apply_chat_template.
Serving with SGLang
python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --reasoning-parser qwen3

Serving with vLLM
vllm serve Qwen/Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1

Removing the --reasoning-parser flag (and --enable-reasoning) disables the thinking mode.
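Once either server is up, any OpenAI‑compatible client can talk to it. Below is a minimal sketch using the official openai Python package; the port is an assumption (vLLM defaults to 8000, SGLang to 30000), and the api_key is a placeholder, since local servers do not check it unless configured to:

from openai import OpenAI

# Point the client at the local OpenAI-compatible server started above
# (port 8000 assumes the vLLM default; adjust to your launch configuration).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
)
print(response.choices[0].message.content)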
Local Development Commands
ollama run qwen3:30b-a3b

Similar commands work with LMStudio, llama.cpp, or KTransformers.
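Ollama also exposes an OpenAI‑compatible endpoint (http://localhost:11434/v1 by default), so the same client pattern shown in the serving section works against a local model; a minimal sketch, assuming the model was pulled with the command above:

from openai import OpenAI

# Ollama does not check the API key, but the client requires some value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen3:30b-a3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)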
Advanced Usage – Dynamic Thinking Switch
When enable_thinking=True, the model can be toggled per turn by placing the directives /think and /no_think in user or system messages; the most recent directive controls the model's behavior, as sketched below.
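A minimal sketch of this soft switch, reusing the tokenizer and model loaded in the Transformers example above (the question is illustrative):

messages = [
    {"role": "user", "content": "How many r's are in 'strawberries'? /no_think"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # the soft switch only takes effect when this is True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
print(tokenizer.decode(generated_ids[0][len(model_inputs.input_ids[0]):], skip_special_tokens=True))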
Agent Example with Qwen‑Agent
Qwen3’s tool‑calling capabilities are exposed through the Qwen‑Agent library. The following snippet shows how to configure the LLM, define tools (time, fetch, code interpreter), and run a multi‑turn conversation that fetches a blog URL.
from qwen_agent.agents import Assistant

# Configure the LLM behind an OpenAI-compatible endpoint
llm_cfg = {
    'model': 'Qwen3-30B-A3B',
    'model_server': 'http://localhost:8000/v1',  # OpenAI-compatible endpoint
    'api_key': 'EMPTY'
}

# Define tools (MCP servers for time and fetch, plus the built-in code interpreter)
tools = [
    {'mcpServers': {
        'time': {'command': 'uvx', 'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']},
        'fetch': {'command': 'uvx', 'args': ['mcp-server-fetch']}
    }},
    'code_interpreter'
]

bot = Assistant(llm=llm_cfg, function_list=tools)

# Run a multi-turn conversation; bot.run streams intermediate responses,
# so the loop leaves `responses` holding the final, complete turn.
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)

The assistant first calls the fetch tool to retrieve the blog content, then processes the result and generates a structured summary of the latest Qwen releases.
Future Directions
Qwen3 is positioned as a milestone toward artificial general intelligence (AGI) and artificial superintelligence (ASI). Future work will expand data scale, model size, context length, and modality coverage, and shift focus from pure model training to training agents that can reason over long horizons using reinforcement learning.
Conclusion
Qwen3’s open‑source release provides a diverse set of dense and MoE models, multilingual capabilities (119 languages), flexible reasoning modes, and extensive tooling for deployment and agent development, empowering researchers and developers to build innovative AI solutions.
JavaEdge
Hands‑on development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative investing.