Tagged articles
3 articles
Page 1 of 1
SuanNi
SuanNi
Jun 9, 2026 · Artificial Intelligence

How Xiaomi’s MiMo‑V2.5‑Pro UltraSpeed Achieves 1 T‑Parameter, 1000 Tokens/s Generation

Xiaomi’s MiMo‑V2.5‑Pro UltraSpeed delivers a 1‑trillion‑parameter model that generates over 1000 tokens per second on a standard 8‑GPU server by combining FP4 quantization, MoE architecture, DFlash decoding and TileRT’s custom execution engine, challenging the need for dedicated ASICs.

DFlashFP4 quantizationLarge Language Model
0 likes · 10 min read
How Xiaomi’s MiMo‑V2.5‑Pro UltraSpeed Achieves 1 T‑Parameter, 1000 Tokens/s Generation
Architect's Guide
Architect's Guide
May 29, 2026 · Artificial Intelligence

What Makes DeepSeek V4 Different? A Deep Technical Dive into Its Innovations

DeepSeek V4 introduces a suite of architectural breakthroughs—including mixed‑expert MoE, manifold‑constrained hyper‑connections, CSA/HCA hybrid attention, and FP4 quantization—that slash inference cost by up to tenfold while delivering million‑token context, competitive benchmarks, dual model variants, and a disruptive pricing strategy.

AI Model BenchmarkDeepSeek V4Efficient Attention
0 likes · 41 min read
What Makes DeepSeek V4 Different? A Deep Technical Dive into Its Innovations
DeepHub IMBA
DeepHub IMBA
Apr 27, 2026 · Artificial Intelligence

DeepSeek‑V4 Deep Dive: Engineering Million‑Token Context Efficiency

The article provides a thorough technical analysis of DeepSeek‑V4, detailing how mixed sparse attention (CSA + HCA), manifold‑constrained hyper‑connections, the Muon optimizer, FP4 quantization, and a suite of infrastructure tricks enable stable training and inference with up to one‑million token contexts while achieving state‑of‑the‑art benchmark results.

CSADeepSeek V4FP4 quantization
0 likes · 22 min read
DeepSeek‑V4 Deep Dive: Engineering Million‑Token Context Efficiency