Tagged articles
1 articles
Page 1 of 1
SuanNi
SuanNi
Jun 9, 2026 · Artificial Intelligence

How Xiaomi’s MiMo‑V2.5‑Pro UltraSpeed Achieves 1 T‑Parameter, 1000 Tokens/s Generation

Xiaomi’s MiMo‑V2.5‑Pro UltraSpeed delivers a 1‑trillion‑parameter model that generates over 1000 tokens per second on a standard 8‑GPU server by combining FP4 quantization, MoE architecture, DFlash decoding and TileRT’s custom execution engine, challenging the need for dedicated ASICs.

DFlashFP4 quantizationLarge Language Model
0 likes · 10 min read
How Xiaomi’s MiMo‑V2.5‑Pro UltraSpeed Achieves 1 T‑Parameter, 1000 Tokens/s Generation