Machine Learning Algorithms & Natural Language Processing
Mar 10, 2026 · Artificial Intelligence

How InfLLM‑V2 Achieves Seamless Short‑to‑Long Context Upgrade with Minimal Structural Changes

InfLLM‑V2 introduces a dense‑sparse switchable attention framework that preserves the original dense‑attention parameters while enabling efficient long‑context training. It matches full‑attention performance on benchmarks such as RULER, LongBench, and chain‑reasoning tasks, and delivers up to a 2.3× end‑to‑end inference speedup without degrading short‑sequence ability.
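To make the switchable design concrete, here is a minimal sketch of how a single attention module might dispatch between an exact dense path and a top‑k block‑sparse path while sharing one set of projection weights. The class name, the length threshold, the mean‑pooled block scoring, and the single‑head, non‑causal setup are all illustrative assumptions, not InfLLM‑V2's actual implementation:

```python
# A minimal sketch of dense-sparse switchable attention: one shared QKV
# projection serves both paths, and the block-sparse path only activates
# past a length threshold. Threshold, block size, and top-k values here
# are illustrative assumptions, not InfLLM-V2's actual configuration.
import torch
import torch.nn.functional as F


class SwitchableAttention(torch.nn.Module):
    def __init__(self, dim, block_size=64, top_k=8, dense_max_len=2048):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)  # shared by both paths
        self.block_size, self.top_k = block_size, top_k
        self.dense_max_len = dense_max_len

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if x.shape[1] <= self.dense_max_len:
            # Short input: exact dense attention, identical to the original model.
            return F.scaled_dot_product_attention(q, k, v)
        return self._block_sparse(q, k, v)  # long input: attend to top-k key blocks

    def _block_sparse(self, q, k, v):
        b, s, d = q.shape
        assert s % self.block_size == 0, "for brevity, assume divisible seq length"
        nb = s // self.block_size
        # Score each key block by its mean key vector; keep the top-k blocks per query.
        pooled = k.view(b, nb, self.block_size, d).mean(2)          # (b, nb, d)
        keep = (q @ pooled.transpose(1, 2)).topk(min(self.top_k, nb), dim=-1).indices
        mask = torch.zeros(b, s, nb, device=q.device)
        mask.scatter_(-1, keep, 1.0)                                # mark selected blocks
        mask = mask.bool().repeat_interleave(self.block_size, -1)   # token-level (b, s, s)
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because both paths read the same `qkv` weights, a short‑context checkpoint could, under these assumptions, be continued into long‑context training without re‑initializing anything, which is the core of the "seamless upgrade" claim.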

InfLLM-V2 · Transformer · dense-sparse attention
16 min read
Data Party THU
Oct 25, 2025 · Artificial Intelligence

How InfLLM‑V2 Delivers Fast, Low‑Cost Sparse Attention for Long‑Context LLMs

InfLLM‑V2 introduces a zero‑parameter, training‑efficient sparse‑attention framework that dramatically speeds up long‑sequence processing while requiring only 5B tokens of training data. The open‑source MiniCPM4.1 model demonstrates performance comparable to dense attention on both long‑text understanding and deep‑thinking benchmarks.
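The "zero‑parameter" claim can be illustrated with a short sketch: the sparse variant below reuses the dense module's weights verbatim and derives block relevance purely from mean‑pooled keys, so a dense checkpoint loads into it with no missing or unexpected keys. Class and method names are hypothetical, not MiniCPM4.1's actual code:

```python
# A minimal sketch of the "zero-parameter" upgrade: the sparse variant reuses
# the dense module's weights verbatim and derives block relevance from pooled
# keys rather than any new learned gate. Names are hypothetical placeholders.
import torch


class DenseAttn(torch.nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)  # the only learned weights


class SparseAttn(DenseAttn):
    """Same parameter set as DenseAttn; sparsity comes from score-based selection."""

    def block_scores(self, x, block_size=64):
        q, k, _ = self.qkv(x).chunk(3, dim=-1)
        b, s, d = k.shape
        nb = s // block_size  # assume s divisible by block_size for brevity
        pooled = k.view(b, nb, block_size, d).mean(2)  # (b, nb, d), no new weights
        return q @ pooled.transpose(1, 2)              # (b, s, nb) block relevance


# A dense checkpoint loads into the sparse model with no missing or unexpected
# keys, which is what keeps the upgrade cheap enough to train on few tokens.
sparse = SparseAttn()
sparse.load_state_dict(DenseAttn().state_dict())            # strict load succeeds
print(sparse.block_scores(torch.randn(2, 256, 512)).shape)  # torch.Size([2, 256, 4])
```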

InfLLM-V2 · MiniCPM4.1 · Sparse Attention
10 min read