Alibaba Unveils QwQ-32B: A 32-Billion-Parameter Reasoning Model with Agent Capabilities
Alibaba has open-sourced its new QwQ-32B reasoning model, a 32.5-billion-parameter transformer that rivals leading reasoning models such as DeepSeek-R1 and o1-mini, ships with integrated agent abilities for tool use and critical thinking, and keeps the inference barrier low. The release includes full technical specifications and details of the RL-based training.
Overview
Alibaba released the open-source reasoning model QwQ-32B. The model has 32.5 billion parameters (31.0 billion non-embedding) and aims to provide a lightweight alternative to 671 B-scale models while matching the performance of state-of-the-art reasoning models such as DeepSeek-R1 (full-size) and o1-mini.
Key Highlights
Performance comparable to the most advanced reasoning models, DeepSeek-R1 (the full model, not a distilled variant) and o1-mini.
Integrated agent-related capabilities that allow the model to use tools, perform critical thinking, and adjust its reasoning based on environmental feedback (a rough illustration of tool use follows this list).
Compact size: only 32 B parameters, reducing inference cost compared with 671 B-scale models.
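As a rough illustration of what agent-style tool use can look like in practice, the sketch below assumes QwQ-32B is served behind an OpenAI-compatible endpoint (for example via vLLM); the base_url and the get_weather tool are hypothetical placeholders, not part of Alibaba's release.

```python
# Sketch: requesting a tool call from QwQ-32B through an OpenAI-compatible API.
# The endpoint URL and the get_weather tool are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not part of the release
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/QwQ-32B",
    messages=[{"role": "user", "content": "Should I bring an umbrella in Hangzhou today?"}],
    tools=tools,
)

# When the model decides a tool is needed, it emits a structured tool call;
# the caller executes the tool and feeds the result back for the next turn.
print(response.choices[0].message.tool_calls)
```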
Performance Comparison
A chart compares QwQ‑32B with DeepSeek‑R1‑Distilled‑Qwen‑32B, DeepSeek‑R1‑Distilled‑Llama‑70B, o1‑mini, and the original DeepSeek‑R1.
Demo Access
An interactive demo is available at https://chat.qwen.ai/?models=Qwen2.5-Plus; it supports tool use, internet access, and agent capabilities.
Training Procedure
QwQ‑32B was built from a cold‑start checkpoint and refined with large‑scale reinforcement learning (RL):
Stage 1 focused on mathematics and programming: the correctness of final answers to math problems was verified, and a code-execution server ran generated code against test cases (a minimal reward sketch follows this list).
Stage 2 added a general-capability RL phase using a general reward model together with rule-based verifiers. This short RL phase improved broad capabilities without noticeably harming performance on the earlier math and coding tasks.
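The article does not include training code, but a minimal sketch of the kind of outcome-based rewards described for Stage 1 might look like the following; the exact-match check and the subprocess runner are illustrative stand-ins for the answer verifier and the code-execution server, not Alibaba's actual implementation.

```python
# Minimal sketch of outcome-based rewards for the two Stage-1 task types.
# The verifier functions below are illustrative stand-ins, not Qwen's code.
import subprocess
import tempfile

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Reward 1.0 only if the final answer matches the reference exactly
    (a real verifier would parse and normalize expressions instead)."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(generated_code: str, test_code: str, timeout: float = 10.0) -> float:
    """Reward 1.0 if the generated code passes the supplied test cases when
    executed in a subprocess (a stand-in for a code-execution server)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            ["python", path], capture_output=True, timeout=timeout
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```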
Technical Specifications
Model type: Causal Language Model
Training stages: Pre‑training and post‑training (including supervised fine‑tuning and RL)
Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and QKV bias
Parameter count: 32.5 B (31.0 B non‑embedding)
Layers: 64
Attention heads (GQA): Q = 40, KV = 8
Context length: 131,072 tokens
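As a back-of-the-envelope illustration of why GQA matters at this context length, the sketch below estimates KV-cache memory from the figures above; the head dimension of 128 and the bfloat16 cache precision are assumptions, not values stated in the specifications.

```python
# Rough KV-cache size estimate for the listed GQA configuration.
num_layers = 64
num_kv_heads = 8
num_q_heads = 40
head_dim = 128          # assumption: not stated in the specs above
bytes_per_elem = 2      # assumption: bfloat16 KV cache
context_len = 131_072

# K and V tensors are cached for every layer and every KV head.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
gqa_full_gib = kv_bytes_per_token * context_len / 2**30

# For comparison: the same cache if all 40 query heads kept their own KV.
mha_bytes_per_token = 2 * num_layers * num_q_heads * head_dim * bytes_per_elem
mha_full_gib = mha_bytes_per_token * context_len / 2**30

print(f"GQA KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")     # ~256 KiB
print(f"GQA KV cache at full 131K context: {gqa_full_gib:.0f} GiB")       # ~32 GiB
print(f"Hypothetical MHA cache at full context: {mha_full_gib:.0f} GiB")  # ~160 GiB
```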
Repository
Model weights and code are hosted at https://hf-mirror.com/Qwen/QwQ-32B.
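For completeness, a typical Hugging Face transformers loading-and-generation snippet for this checkpoint would look roughly like the one below; the generation settings are illustrative defaults rather than values taken from the model card.

```python
# Sketch: loading QwQ-32B with Hugging Face transformers and generating a reply.
# Generation settings here are illustrative, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```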