Old Zhang's AI Learning
May 11, 2026 · Artificial Intelligence
Open-Source Qwen3.6-35B-A3B Runs at 162 tok/s on a Single RTX 5090
The article introduces the open-source Qwen3.6-35B-A3B model and explains its Mixture-of-Experts (MoE) architecture and three-stage LoRA fine-tuning. It presents benchmark results in which the model reaches 161.9 tok/s on a single RTX 5090, 2.6× faster than a dense 27B counterpart. It also covers deployment tips, the quantized GGUF release, and known compatibility pitfalls.
GGUF quantization · LoRA fine-tuning · Mixture of Experts
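The headline numbers can be unpacked with a little arithmetic. The Python sketch below derives the throughput the dense 27B model would have needed for the reported 2.6× speedup to hold, then converts both rates to per-token decode latency. The dense figure is derived rather than measured, and it inherits any rounding in "2.6×". The `active_params_b = 3` value is an assumption read off the "A3B" suffix (Qwen's naming convention for roughly 3B activated parameters per token). It is not a number stated in the summary.

```python
# Back-of-the-envelope check of the figures quoted in the article summary.

# Values quoted in the summary:
moe_tok_per_s = 161.9      # Qwen3.6-35B-A3B decode throughput on one RTX 5090
reported_speedup = 2.6     # reported speedup over the dense 27B model

# Derived, not measured: the throughput the dense model would have reached.
dense_tok_per_s = moe_tok_per_s / reported_speedup

# Per-token decode latency in milliseconds for each model.
moe_ms_per_token = 1000 / moe_tok_per_s
dense_ms_per_token = 1000 / dense_tok_per_s

# ASSUMPTION: "35B-A3B" is read as ~35B total and ~3B activated parameters per token.
total_params_b = 35
active_params_b = 3
dense_params_b = 27

# Fraction of the MoE weights used for each token.
active_fraction = active_params_b / total_params_b
# Naive ratio of parameters touched per token, dense vs. MoE.
naive_param_ratio = dense_params_b / active_params_b

print(f"implied dense throughput: {dense_tok_per_s:.1f} tok/s")  # 62.3 tok/s
print(f"MoE latency: {moe_ms_per_token:.1f} ms/token")           # 6.2 ms/token
print(f"dense latency: {dense_ms_per_token:.1f} ms/token")       # 16.1 ms/token
print(f"active fraction: {active_fraction:.1%}")                 # 8.6%
print(f"naive parameter ratio: {naive_param_ratio:.0f}x")        # 9x
```

Under these assumptions the dense model touches about 9× more parameters per token, yet the measured speedup is 2.6×. Per-token parameter count alone therefore does not predict the observed decode speed.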
