SuanNi
Jun 9, 2026 · Artificial Intelligence
How Xiaomi’s MiMo-V2.5-Pro UltraSpeed Runs a 1-Trillion-Parameter Model at 1,000 Tokens/s
Xiaomi’s MiMo-V2.5-Pro UltraSpeed runs a 1-trillion-parameter model that generates over 1,000 tokens per second on a standard 8-GPU server. It gets there by combining FP4 quantization, a Mixture-of-Experts (MoE) architecture, DFlash decoding, and TileRT’s custom execution engine, a result that challenges the assumption that this kind of speed requires dedicated ASICs.
DFlash · FP4 quantization · Large Language Model
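A rough, weights-only estimate helps show why FP4 quantization matters for fitting a trillion-parameter model on a single 8-GPU server. The sketch below is a back-of-envelope calculation under stated assumptions, not Xiaomi’s sizing method: it counts only raw weight storage, spreads it evenly across 8 GPUs, and ignores the per-block scale factors that FP4 formats typically store, as well as the KV cache, activations, and runtime buffers. The names `weight_gb`, `TOTAL_PARAMS`, and `NUM_GPUS` are illustrative.

```python
# Back-of-envelope weight-memory estimate.
# Assumptions (hypothetical, not from the article's internals): weights only,
# even split across GPUs, no FP4 scale-factor overhead, no KV cache or activations.
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion parameters, as stated in the article
NUM_GPUS = 8                      # one 8-GPU server, as stated in the article


def weight_gb(params: int, bits_per_param: float) -> float:
    """Return raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Compare common precisions: each halving of bits halves the footprint.
    for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
        total = weight_gb(TOTAL_PARAMS, bits)
        print(f"{name}: {total:,.0f} GB total, {total / NUM_GPUS:,.1f} GB per GPU")
```

Under these assumptions, FP4 brings the weights down to 500 GB in total, or 62.5 GB per GPU, a quarter of the 250 GB per GPU that BF16 would need. Real deployments need additional headroom beyond this figure.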
