How Keye‑VL‑1.5 Redefines Video Understanding with Slow‑Fast Encoding

Keye‑VL‑1.5, an 8‑billion‑parameter multimodal large language model, introduces a Slow‑Fast video encoding strategy, a four‑stage progressive pre‑training pipeline with 128K context, and a sophisticated post‑training regime that together achieve state‑of‑the‑art performance on video and vision‑language benchmarks while maintaining strong general capabilities.

Large Language Model · Pretraining · benchmark

21 min read

Kuaishou Large Model

Sep 8, 2025 · Artificial Intelligence

Keye-VL-1.5-8B: The New Multimodal LLM That Beats GPT-4o on Vision Benchmarks

Kwai's newly released Keye-VL-1.5-8B multimodal large language model dramatically improves visual, reasoning, and temporal understanding, achieving top scores on public video benchmarks and surpassing closed‑source models like GPT‑4o, while offering an open‑source release and detailed technical documentation.

Vision-Language · benchmark performance · multimodal LLM

11 min read

Kuaishou Tech

Sep 5, 2025 · Artificial Intelligence

How Keye‑VL‑1.5‑8B Sets New Benchmarks in Multimodal AI

Short‑video platform Kwai has open‑sourced Keye‑VL‑1.5, an 8‑billion‑parameter multimodal LLM that introduces a slow‑fast frame encoding scheme, a progressive four‑stage pre‑training pipeline, and an automated data construction workflow. The model achieves state‑of‑the‑art results on video and vision‑language benchmarks and surpasses many closed‑source models.

Large Language Model · benchmark performance · multimodal AI

12 min read