Alibaba’s Qwen 3.5‑Plus: 397 B Open‑Source Model Beats Gemini‑3 and GPT‑5.2 at Low Cost

Alibaba has released Qwen 3.5‑Plus, an open‑source large model (397 B total parameters, 170 B active) that outperforms top closed‑source models such as Gemini‑3‑Pro and GPT‑5.2 on multiple benchmarks. It offers native multimodal understanding, supports 201 languages, cuts deployment memory by 60 %, boosts peak inference throughput by up to 19×, and is priced at only 0.8 CNY per million input tokens.

Machine Learning Algorithms & Natural Language Processing

On Chinese New Year’s Eve, Alibaba unveiled Qwen 3.5‑Plus, the newest generation of its open‑source large language model series. The model has 397 B total parameters but activates only 170 B during inference, allowing it to outperform trillion‑parameter closed‑source models such as Gemini‑3‑Pro and GPT‑5.2 while keeping deployment costs low.

Benchmark results show Qwen 3.5‑Plus leading across several core dimensions. On MMLU‑Pro it scores 87.8, surpassing GPT‑5.2; on the GPQA scientific‑reasoning benchmark it reaches 88.4, ahead of Claude 4.5; and it tops the IFBench instruction‑following leaderboard with a record 76.5 points. In multimodal evaluations, including MathVision (multimodal reasoning), RealWorldQA (visual QA), CC‑OCR (text recognition), RefCOCO‑avg (spatial intelligence), and MLVU (video understanding), the model consistently exceeds Gemini‑3‑Pro.

Efficiency gains come from four architectural innovations. First, a mixed‑attention mechanism allocates high‑precision computation to important tokens and processes less‑relevant tokens sparsely, reducing the quadratic cost of self‑attention. Second, an extremely sparse Mixture‑of‑Experts (MoE) design activates only the most relevant expert sub‑networks per token, keeping the expert activation ratio under 5 %. Third, a native multi‑token prediction head generates several future tokens per step, roughly doubling inference speed. Fourth, a training‑stability gate (originally presented in a NeurIPS 2025 best paper) acts as an “intelligent switch” on attention outputs, damping gradient spikes and noise. Together these changes cut training cost by up to 90 % and halve activation memory through mixed‑precision (FP8/FP32) scheduling.
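The sparse‑MoE idea in the second point can be illustrated with a minimal NumPy sketch of top‑k expert routing. The router design, shapes, and names below are purely illustrative assumptions; Qwen 3.5‑Plus’s actual routing internals are not described in the article.

```python
# Illustrative top-k MoE routing sketch (not Qwen's actual implementation).
import numpy as np

def topk_moe_forward(x, gate_w, expert_ws, k=2):
    """Route each token to its k highest-scoring experts.

    x:         (tokens, dim) input activations
    gate_w:    (dim, n_experts) router weights
    expert_ws: list of (dim, dim) weight matrices, one per expert
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the chosen experts
    # Softmax over only the selected experts' scores
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                  # combine the k expert outputs
        for j in range(k):
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ expert_ws[e])
    return out, topk

rng = np.random.default_rng(0)
dim, n_experts, tokens = 16, 64, 4
x = rng.standard_normal((tokens, dim))
gate_w = rng.standard_normal((dim, n_experts))
experts = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
y, chosen = topk_moe_forward(x, gate_w, experts, k=2)
# With k=2 of 64 experts, only ~3 % of expert parameters run per token.
```

The same mechanism scales to the activation ratios the article describes: total parameter count grows with the number of experts, while per‑token compute stays fixed at the k selected experts.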

From an efficiency standpoint, Qwen 3.5‑Plus reduces deployment GPU memory by 60 % compared with Qwen 3‑Max and boosts maximum inference throughput by up to 19×. The API pricing reflects this advantage: input tokens cost 0.8 CNY per million, roughly 1/18 of Gemini‑3‑Pro’s price.
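Using only the figures stated above (0.8 CNY per million input tokens, roughly 1/18 of Gemini‑3‑Pro’s price), a back‑of‑envelope cost comparison looks like this; the 18× multiplier for Gemini‑3‑Pro is taken from the article, not from a published price list.

```python
# Rough input-token cost comparison based on the article's figures.
QWEN_CNY_PER_MTOK = 0.8
GEMINI_CNY_PER_MTOK = QWEN_CNY_PER_MTOK * 18   # ~14.4 CNY per million tokens

def input_cost_cny(tokens, price_per_mtok):
    """Cost in CNY of processing `tokens` input tokens at a per-million rate."""
    return tokens / 1_000_000 * price_per_mtok

# Processing 1 billion input tokens:
qwen = input_cost_cny(1_000_000_000, QWEN_CNY_PER_MTOK)      # ~800 CNY
gemini = input_cost_cny(1_000_000_000, GEMINI_CNY_PER_MTOK)  # ~14,400 CNY
```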

Multilingual and multimodal capabilities expand dramatically. The vocabulary grows from 150 k to 250 k tokens, the model now covers 201 languages, and encoding efficiency for low‑resource languages improves by 60 %. Native multimodal pathways are trained from day one on more than 36 T tokens of mixed text‑image data, enabling seamless handling of images, video (up to 2 minutes of raw video within a 1 M‑token context), and code‑visual integration. Demonstrations include a car‑washing decision query, image‑pattern reasoning, IMO‑level geometry solving, video summarisation, automatic generation of front‑end code from hand‑drawn sketches, and end‑to‑end website creation from local video files.
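The video figures above imply a rough token budget, which is worth making explicit. This is back‑of‑envelope arithmetic from the article’s stated numbers only (1 M‑token context, 2 minutes of video); the 24 fps frame rate is an assumption for illustration.

```python
# Token budget implied by the article: 1M-token context for 2 minutes of video.
CONTEXT_TOKENS = 1_000_000
VIDEO_SECONDS = 2 * 60

tokens_per_second = CONTEXT_TOKENS / VIDEO_SECONDS   # ~8,333 tokens per second
tokens_per_frame_at_24fps = tokens_per_second / 24   # ~347 tokens per frame
```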

The release is positioned as the first of Alibaba’s “Spring Festival” model launches. Subsequent Qwen 3.5 variants will be open‑sourced for on‑premise, edge, and cloud deployment, and a larger Qwen 3.5‑Max is slated for a post‑New‑Year release. By providing a high‑performance, low‑cost, openly available model, Alibaba aims to democratise state‑of‑the‑art AI in the same way Linux and Android democratised operating systems.

Tags: AI, Large Language Model, benchmark, multimodal, open-source, Qwen 3.5
Written by

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.
