Baidu Geek Talk
Jan 7, 2026 · Artificial Intelligence

How Baidu’s vLLM‑Kunlun Plugin Powered MiMo Flash V2 on Kunlun XPU in 2 Days

Within two days, Baidu’s Baige and Kunlun Chip teams adapted the 309‑billion‑parameter MiMo Flash V2 model—featuring a hybrid SWA+Sink and Full Attention mechanism—to run efficiently on the Kunlun P800 XPU using the vLLM‑Kunlun Plugin, achieving accuracy‑lossless inference with performance comparable to GPUs.

AI inference · Kunlun XPU · MiMo Flash V2