Qwen 3.5 Launch: 17B Active Parameters Take on GPT‑5.2

Qwen 3.5 is an open‑source 397B‑parameter model that activates only 17B parameters per token. Built on a hybrid MoE‑Gated Delta architecture, it offers native multimodal support and a default chain‑of‑thought mode, and posts benchmark scores comparable to GPT‑5.2, Claude 4.5 Opus, and Gemini 3 Pro across code, math, agent, and vision tasks.


Release Overview

On Chinese New Year's Eve, the Qwen team announced the official release of Qwen 3.5, a 397 billion‑parameter model (Qwen3.5‑397B‑A17B) that requires only 17 billion active parameters during inference.

Core Highlights: Efficiency and Design

1. Extreme Efficiency: 397B backbone, 17B active core

Qwen 3.5 adopts a mixture‑of‑experts (MoE) architecture combined with Gated Delta Networks. The total parameter count is 397B, but only 17B are activated for any given query, like a large team of specialists in which only the few relevant members respond to each question, delivering fast inference at low cost.
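The sparse-activation idea can be sketched in a few lines: a router scores every expert, but only the top-k actually run. The shapes and top-k softmax rule below are illustrative assumptions, not Qwen 3.5's published router configuration.

```python
import numpy as np

def moe_route(x, gate_w, num_active=2):
    """Sparse MoE routing sketch: score every expert, run only the top-k.

    x: (d,) token hidden state; gate_w: (num_experts, d) router weights.
    Hypothetical shapes -- the real model's router config is not stated here.
    """
    logits = gate_w @ x                      # one affinity score per expert
    top = np.argsort(logits)[-num_active:]   # keep only the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                             # softmax over the chosen experts only
    return top, w                            # only these experts run a forward pass

# Route one token among 32 experts, activating 4 of them.
rng = np.random.default_rng(0)
top, w = moe_route(rng.normal(size=64), rng.normal(size=(32, 64)), num_active=4)
```

Because the non-selected experts never execute, compute per token scales with the 17B active subset rather than the full 397B backbone.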

2. Native Multimodal Capability

Unlike previous “language model + vision encoder” hybrids, Qwen 3.5 integrates a Unified Vision‑Language Foundation that fuses multimodal data during early training, enabling genuine visual understanding in tasks such as image reasoning, chart interpretation and agent‑based scenarios.

3. Default Thinking Mode

Inspired by Qwen3‑Max‑Thinking, Qwen 3.5 automatically activates a chain‑of‑thought (CoT) mode. Before answering complex questions, it emits a <think>...</think> block that records its reasoning steps, improving answer accuracy and depth.
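Applications that consume such output typically separate the reasoning trace from the final answer. A minimal sketch, assuming the single <think>...</think> block format described above:

```python
import re

def split_reasoning(output: str):
    """Separate a chain-of-thought block from the final answer.

    Assumes the model wraps its reasoning in one <think>...</think> block,
    as described above; the exact output format is an assumption here.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return None, output.strip()        # no reasoning block emitted
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()  # everything after the block
    return reasoning, answer

sample = "<think>17 * 3 = 51, so 51 + 9 = 60.</think>The answer is 60."
reasoning, answer = split_reasoning(sample)
print(answer)  # -> The answer is 60.
```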

Benchmark Performance

Official evaluations show Qwen 3.5 competing closely with leading closed‑source models:

Code (SWE‑bench Verified): 76.4%, surpassing Qwen3‑Max‑Thinking (75.3%) and Gemini 3 Pro (76.2%) while trailing GPT‑5.2 (80.0%).

Math (AIME 2026): 91.3% accuracy, comparable to Claude 4.5 Opus (93.3%).

Agent (BFCL V4): 72.9% on tool use and API calls, outperforming both Qwen3‑Max‑Thinking and GPT‑5.2.

Vision (MMMU‑Pro): 79.0% on visual reasoning, nearly matching GPT‑5.2 (79.5%).

Front‑End Demo: One‑Line 3D Car Game

In a front‑end showcase, a single natural‑language prompt generated a fully functional 3D racing game, complete with a real‑time scoring system, lap counter, millisecond timer and dynamic speedometer.

Full‑Stack Agent: Think, Search, Create

The model can act as an “all‑round intern”, simultaneously reasoning, searching over 18+ web sources, and generating slide decks. Screenshots illustrate real‑time citation lists and generated PPT content.

Technical Deep Dive

Efficient Hybrid Architecture: Gated DeltaNet plus sparse MoE yields high throughput and low latency.

Scalable RL Generalization: Trained on millions of agent environments to improve real‑world task handling.

Global Language Coverage: Supports 201 languages and dialects, enhancing cultural comprehension.
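The gated delta recurrence named above can be sketched as a single memory-update step. This is a minimal illustration of the published Gated DeltaNet-style rule (decay gate plus delta-rule write), not Qwen 3.5's exact parameterization, which the article does not specify.

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One step of a gated delta recurrence (linear-attention family).

    S: (d_v, d_k) memory matrix; k, v: current key/value vectors;
    alpha: scalar decay gate in (0, 1); beta: scalar write strength.
    Illustrative sketch only -- not the model's actual update equations.
    """
    k = k / np.linalg.norm(k)                    # unit-norm key
    S = alpha * (S - beta * np.outer(S @ k, k))  # decay, and erase the old value at k
    S = S + beta * np.outer(v, k)                # write the new value at k
    return S

# Writing a value under a key, then querying with that key, recalls the value.
S = np.zeros((4, 4))
k = np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([1.0, 2.0, 3.0, 4.0])
S = gated_delta_step(S, k, v, alpha=0.9, beta=1.0)
out = S @ k  # approximately recovers v
```

Unlike softmax attention, this state is a fixed-size matrix, so cost per token stays constant regardless of sequence length, which is what makes the hybrid attractive for long contexts.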

Getting Started for Developers

Model weights are available on Hugging Face and can be loaded directly with Transformers. For production‑grade inference, vLLM or SGLang are recommended.

# Deploy with vLLM
vllm serve Qwen/Qwen3.5-397B-A17B --tensor-parallel-size 8 --max-model-len 262144

The default context window is 262K tokens (expandable to 1M), which requires substantial GPU memory.
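At that window size, the KV cache becomes a major share of GPU memory. A back-of-the-envelope estimate, using purely hypothetical layer and head counts (the article does not state the model's configuration):

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, dtype_bytes=2):
    """Rough per-sequence KV-cache size: 2 tensors (K and V) per layer.

    The configuration values passed below are illustrative placeholders,
    not Qwen 3.5's published architecture.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical config: 60 layers, 8 KV heads (GQA), head_dim 128, fp16.
gib = kv_cache_bytes(262_144, layers=60, kv_heads=8, head_dim=128) / 2**30
print(f"{gib:.1f} GiB")  # -> 60.0 GiB for a single full-length sequence
```

Under these assumed numbers, one maxed-out sequence alone consumes tens of gigabytes of cache, which is why multi-GPU tensor parallelism (as in the vLLM command above) is the practical deployment path.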

Conclusion

The release demonstrates that open‑source models are rapidly closing the gap with top proprietary systems, offering private deployment, controllable inference cost, and capabilities on par with GPT‑5.2.

Qwen 3.5 Benchmark Comparison
