How Xiaomi’s MiMo V2.5 Achieves 99% API Price Cut with Full‑Stack Inference Optimizations
The MiMo‑V2.5 series combines Hybrid Sliding‑Window Attention, Mixture‑of‑Experts and multimodal support with a complete redesign of KVCache management, tiered caching, prefix‑tree logic and scheduling, compressing KVCache to about one‑seventh of full‑attention models and delivering up to 40% faster Prefill, 30% lower TTFT and dramatically reduced inference costs that enable a 99% API price reduction.
