Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Reasoning Model with Agent Capabilities

Alibaba has open‑sourced its new QwQ‑32B reasoning model: a 32.5‑billion‑parameter transformer that rivals top models such as DeepSeek‑R1 and o1‑mini, integrates agent capabilities for tool use and critical thinking, and keeps the inference barrier low. The release includes full technical specifications and details of the RL‑based training.


Overview

Alibaba released the open‑source reasoning model QwQ‑32B. The model has 32.5 billion parameters (31.0 billion non‑embedding) and aims to be a lightweight alternative to 671 B‑scale models while matching the performance of state‑of‑the‑art reasoning models such as the full‑size DeepSeek‑R1 and o1‑mini.

Key Highlights

Performance comparable to the most advanced reasoning models: DeepSeek‑R1 (the full‑size model, not a distilled variant) and o1‑mini.

Integrated agent capabilities that let the model use tools, think critically, and adjust its reasoning based on environmental feedback (a minimal sketch of such a loop follows this list).

Compact size: at roughly 32 B parameters, inference costs are far lower than for 671 B‑scale models.
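
To make the agent loop concrete, here is a minimal, purely illustrative Python sketch. The `call_model` placeholder, the JSON tool‑call convention, and the toy calculator tool are assumptions made for this example, not the actual QwQ‑32B tool‑calling protocol.

```python
import json

def calculator(expression: str) -> str:
    """Toy tool: evaluate an arithmetic expression (demo only, not a safe eval)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def call_model(messages: list[dict]) -> str:
    """Placeholder for a chat call to QwQ-32B (e.g. via an OpenAI-compatible API)."""
    raise NotImplementedError

def run_agent(user_query: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            # Convention assumed here: the model emits a JSON object such as
            # {"tool": "calculator", "arguments": "2 + 2"} when it wants a tool.
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model gave its final answer
        if not isinstance(call, dict):
            return reply
        result = TOOLS[call["tool"]](call["arguments"])
        # Feed the tool result back so the model can revise its reasoning.
        messages.append({"role": "tool", "content": result})
    return messages[-1]["content"]
```

The key design point is the final append: the tool's output goes back into the conversation so the model can adjust its next step, which is the "environmental feedback" the highlight above refers to.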

Performance Comparison

A chart compares QwQ‑32B with DeepSeek‑R1‑Distill‑Qwen‑32B, DeepSeek‑R1‑Distill‑Llama‑70B, o1‑mini, and the original DeepSeek‑R1.


Demo Access

An interactive demo is available at https://chat.qwen.ai/?models=Qwen2.5-Plus; it supports tool use, internet access, and agent capabilities.

Training Procedure

QwQ‑32B was built from a cold‑start checkpoint and refined with large‑scale reinforcement learning (RL) in two stages:

Stage 1 focused on mathematics and programming: for math problems, final answers were checked against verified solutions, and for code, an execution server ran generated programs against test cases.
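
The description above suggests simple outcome‑based rewards. The sketch below shows what such verifiers could look like; the exact‑match check and the subprocess‑based test runner are illustrative stand‑ins for the accuracy verifier and code‑execution server, whose internals are not published.

```python
import subprocess
import sys

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Outcome reward for math: 1.0 only if the final answer matches the reference."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(program: str, test_cases: list[tuple[str, str]]) -> float:
    """Outcome reward for code: fraction of (stdin, expected stdout) cases passed."""
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_data, expected in test_cases:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", program],
                input=stdin_data, capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            continue  # non-terminating programs earn no credit
        if proc.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases)

# Example: a program that prints the sum of two integers read from stdin.
prog = "a, b = map(int, input().split()); print(a + b)"
print(code_reward(prog, [("2 3", "5"), ("10 -4", "6")]))  # -> 1.0
```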

Stage 2 added a general‑ability RL phase using a universal reward model together with rule‑based validators. This short RL phase improved broad capabilities without noticeably harming performance on the earlier tasks.
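
One plausible reading of this stage is a blended reward: a scalar from the universal reward model combined with pass/fail signals from rule‑based validators. The blend weight and the example validators below are assumptions made for illustration, not published details.

```python
from typing import Callable

def combined_reward(
    response: str,
    rm_score: float,  # score in [0, 1] from the universal reward model (assumed scale)
    validators: list[Callable[[str], bool]],
    rule_weight: float = 0.5,  # assumed blend weight, not a published value
) -> float:
    """Blend a learned reward with the pass rate of rule-based validators."""
    rule_score = sum(v(response) for v in validators) / len(validators)
    return (1.0 - rule_weight) * rm_score + rule_weight * rule_score

# Example validators (illustrative only).
validators = [
    lambda r: len(r.strip()) > 0,          # response must be non-empty
    lambda r: r.endswith((".", "!", "?")),  # response ends with a complete sentence
]
print(combined_reward("Some answer.", rm_score=0.8, validators=validators))  # -> 0.9
```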

Technical Specifications

Model type: Causal Language Model
Training stages: Pre‑training and post‑training (including supervised fine‑tuning and RL)
Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and QKV bias
Parameter count: 32.5 B (31.0 B non‑embedding)
Layers: 64
Attention heads (GQA): Q = 40, KV = 8
Context length: 131,072 tokens
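
These numbers can be cross‑checked against the published checkpoint. Assuming the Hugging Face `transformers` library and the repo id `Qwen/QwQ-32B`, the standard Qwen2‑style config fields expose them directly:

```python
from transformers import AutoConfig

# Field names follow the standard Qwen2 config schema shipped with transformers.
config = AutoConfig.from_pretrained("Qwen/QwQ-32B")
print(config.num_hidden_layers)        # expected: 64 layers
print(config.num_attention_heads)      # expected: 40 query heads
print(config.num_key_value_heads)      # expected: 8 KV heads (GQA)
print(config.max_position_embeddings)  # expected: 131072-token context
```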

Repository

Model weights and code are hosted at https://hf-mirror.com/Qwen/QwQ-32B.
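
For completeness, a minimal inference sketch using the standard `transformers` chat pattern is shown below. The generation settings are illustrative defaults rather than official recommendations, and loading the full model this way needs substantial GPU memory (`device_map="auto"` requires the `accelerate` package).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are there below 20?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
output = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```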

Tags: Alibaba, Transformer, Large Language Model, reinforcement learning, agent capabilities
Written by Baobao Algorithm Notes, author of the BaiMian large model, offering technology and industry insights.