Keye-VL-2.0 Brings DeepSeek Sparse Attention to Multimodal AI – Report Released
Keye‑VL‑2.0 is an open‑source MoE multimodal foundation model aimed at hour‑level video understanding and agentic intelligence. It embeds DeepSeek Sparse Attention into a GQA‑based architecture, which enables a near‑lossless 256K‑token context. The report also covers four‑stage pre‑training and diverse RL distillation techniques, and the model achieves state‑of‑the‑art results on long‑video benchmarks. The weights are publicly released.
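To make the sparse-attention idea concrete, here is a minimal sketch of causal top-k sparse attention in plain Python. It is an illustration of the general technique, not Keye‑VL‑2.0's implementation: the function name `topk_sparse_attention`, the `index_scores` input (standing in for whatever cheap relevance scorer picks the tokens), and the single-head toy setup are all assumptions made for this example, and GQA head sharing is left out.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, index_scores, k):
    """Causal attention where each query attends to at most k keys.

    index_scores[t][s] is a hypothetical relevance score of key s for
    query t. Only the k best-scoring visible keys enter the softmax,
    so per-query cost scales with k rather than with sequence length.
    """
    d = len(queries[0])
    outputs, selections = [], []
    for t, q in enumerate(queries):
        visible = range(t + 1)  # causal mask: positions 0..t only
        selected = sorted(visible, key=lambda s: index_scores[t][s],
                          reverse=True)[:k]
        selected.sort()  # restore positional order
        logits = [dot(q, keys[s]) / math.sqrt(d) for s in selected]
        weights = softmax(logits)
        out = [sum(w * values[s][j] for w, s in zip(weights, selected))
               for j in range(len(values[0]))]
        outputs.append(out)
        selections.append(selected)
    return outputs, selections
```

When `k` is at least the sequence length, every visible key is selected and the function reduces to ordinary dense causal attention, which is one way to read the "near‑lossless" claim: quality depends on how well the selection step keeps the keys that matter.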
