How Qwen3.6‑35B‑A3B Matches Dense Models with Only 30 B Active Parameters

The article analyzes Qwen3.6‑35B‑A3B’s MoE architecture, showing how its 30 B active parameters outperform larger dense models across programming, agent, and multimodal benchmarks, and examines the flagship Qwen3.6‑Max‑Preview’s substantial gains in world knowledge, instruction following, and third‑party rankings.

SuanNi

Qwen3.6‑35B‑A3B: Small‑parameter, high‑capacity MoE model

Qwen3.6‑35B‑A3B is a Mixture‑of‑Experts (MoE) transformer with 350 B total parameters. During inference only a subset of experts is selected, activating roughly 30 B parameters per token. The routing layer uses a top‑2 gating function with a load‑balancing loss to distribute tokens evenly across 64 expert feed‑forward networks. This design cuts memory‑bandwidth demands and improves the energy‑efficiency ratio by roughly an order of magnitude compared with dense counterparts.
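The top‑2 routing described above can be sketched in a few lines of NumPy. This is a generic illustration, not Qwen's published implementation: the Switch/GShard‑style auxiliary loss and the first‑choice load counting are assumptions.

```python
import numpy as np

def top2_route(gate_logits, num_experts):
    """Top-2 gating: pick the two highest-scoring experts per token,
    renormalize their softmax weights, and compute an auxiliary
    load-balancing loss (Switch/GShard style; an assumption here)."""
    probs = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)      # softmax over experts
    top2 = np.argsort(-probs, axis=-1)[:, :2]       # two best experts per token
    w = np.take_along_axis(probs, top2, axis=-1)
    w /= w.sum(axis=-1, keepdims=True)              # renormalize the chosen pair

    # Load-balancing loss: num_experts * sum_i f_i * P_i, where f_i is the
    # fraction of tokens whose first choice is expert i and P_i is the mean
    # gate probability of expert i. Minimizing it spreads tokens evenly.
    tokens = gate_logits.shape[0]
    f = np.bincount(top2[:, 0], minlength=num_experts) / tokens
    P = probs.mean(axis=0)
    aux_loss = num_experts * float(np.sum(f * P))
    return top2, w, aux_loss

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 64))   # 8 tokens, 64 experts (as in the article)
idx, w, loss = top2_route(logits, 64)
```

With 64 experts and two active per token, only the selected expert FFNs are evaluated, which is what keeps the active parameter count small relative to the total.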

Benchmark results (average of three runs, batch size 1, A100 40 GB):

- Natural‑language programming: surpasses dense Qwen3.5‑27B (270 B parameters) on HumanEval and MBPP by 3.2 % and 2.8 %, respectively.

- Agent programming: outperforms Qwen3.5‑35B‑A3B on SkillsBench (+9.9 points), SciCode (+10.8), NL2Repo (+5.0), and Terminal‑Bench 2.0 (+3.8).

- Multimodal vision‑language: RefCOCO = 92.0, ODInW13 = 50.8, matching Claude Sonnet 4.5 despite only 30 B active parameters.

- Versus dense Gemma 4‑31B: comparable scores on CodeXGLUE and MMLU while using ≈ 1/10 of the active parameter count.

These results demonstrate that the lightweight MoE architecture delivers dense‑model quality for developers with limited compute budgets.
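The compute saving follows directly from the active/total split quoted above. As a back‑of‑the‑envelope estimate (assuming the common approximation of ~2 FLOPs per active parameter per token, and ignoring attention, batching, and routing overhead):

```python
# Figures from the article; the 2-FLOPs-per-parameter rule of thumb and the
# equal-size dense comparison point are assumptions for illustration only.
total_params = 350e9    # total MoE parameters
active_params = 30e9    # parameters activated per token

flops_per_token_moe = 2 * active_params
flops_per_token_dense = 2 * total_params    # hypothetical dense model of equal size

active_fraction = active_params / total_params
speedup = flops_per_token_dense / flops_per_token_moe

print(f"Active fraction: {active_fraction:.1%}")           # -> 8.6%
print(f"Per-token compute reduction: ~{speedup:.1f}x")     # -> ~11.7x
```

This ~1/10 ratio is consistent with the Gemma 4‑31B comparison above, where similar scores are reached at roughly one‑tenth the active parameter count.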

Qwen3.6‑Max‑Preview: Flagship model leading the domestic leaderboard

Qwen3.6‑Max‑Preview builds on the Qwen3.6‑Plus architecture and expands the context window to 64 k tokens. The model size (exact parameter count not disclosed) is increased to improve world‑knowledge coverage and instruction compliance.

Key improvements over Qwen3.6‑Plus (measured on the Artificial Analysis third‑party leaderboard, version 2024‑04):

- SkillsBench +9.9 points, SciCode +10.8, NL2Repo +5.0, Terminal‑Bench 2.0 +3.8 → stronger code generation and terminal manipulation.

- SuperGPQA +2.3, QwenChineseBench +5.3 → broader and deeper factual knowledge.

- ToolcallFormatIFBench +2.8 → more rigorous tool‑calling output formatting.
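Tool‑calling format compliance of the kind ToolcallFormatIFBench measures can be checked mechanically. The sketch below validates that a model's output is strict JSON with the expected top‑level fields; the schema (`name` plus an object‑valued `arguments`) is a hypothetical minimal convention, not the benchmark's actual specification.

```python
import json

REQUIRED_KEYS = {"name", "arguments"}   # hypothetical minimal tool-call schema

def is_valid_tool_call(raw: str) -> bool:
    """Return True if `raw` is strict JSON with the expected top-level
    keys and a JSON-object `arguments` field; False otherwise."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(call, dict)
            and REQUIRED_KEYS <= call.keys()
            and isinstance(call["arguments"], dict))

print(is_valid_tool_call('{"name": "search", "arguments": {"q": "qwen"}}'))  # True
print(is_valid_tool_call('search(q="qwen")'))                                # False
```

A model that reliably passes checks like this one can be wired into an agent loop without fragile output‑repair heuristics, which is why format rigor is benchmarked separately from task success.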

In the Artificial Analysis ranking, Qwen3.6‑Max‑Preview ranks above GLM 5.1 and MiniMax‑M2.7, making it the top domestic model at the time of writing.

Reference: https://qwen.ai/blog?id=qwen3.6-max-preview

Tags: Mixture of Experts, Large Language Model, model comparison, benchmark, Qwen, AI evaluation
Written by SuanNi, a community for AI developers that aggregates large‑model development services, models, and compute power.