Machine Learning Algorithms & Natural Language Processing
Mar 10, 2026 · Artificial Intelligence

How InfLLM‑V2 Achieves Seamless Short‑to‑Long Context Upgrade with Minimal Structural Changes

InfLLM‑V2 introduces a dense‑sparse switchable attention framework that preserves the original dense‑attention parameters while enabling efficient long‑context training. It matches full‑attention performance on benchmarks such as RULER, LongBench, and chain‑reasoning tasks, and delivers up to a 2.3× end‑to‑end inference speedup without degrading short‑sequence ability.
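To make the switchable design concrete, here is a minimal sketch of how a single attention module might dispatch between an exact dense path and a top‑k block‑sparse path while sharing one set of projection weights. The class name, the length threshold, the mean‑pooled block scoring, and the single‑head, non‑causal setup are all illustrative assumptions, not InfLLM‑V2's actual implementation:

```python
# A minimal sketch of dense-sparse switchable attention: one shared QKV
# projection serves both paths, and the block-sparse path only activates
# past a length threshold. Threshold, block size, and top-k values here
# are illustrative assumptions, not InfLLM-V2's actual configuration.
import torch
import torch.nn.functional as F


class SwitchableAttention(torch.nn.Module):
    def __init__(self, dim, block_size=64, top_k=8, dense_max_len=2048):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)  # shared by both paths
        self.block_size, self.top_k = block_size, top_k
        self.dense_max_len = dense_max_len

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if x.shape[1] <= self.dense_max_len:
            # Short input: exact dense attention, identical to the original model.
            return F.scaled_dot_product_attention(q, k, v)
        return self._block_sparse(q, k, v)  # long input: attend to top-k key blocks

    def _block_sparse(self, q, k, v):
        b, s, d = q.shape
        assert s % self.block_size == 0, "for brevity, assume divisible seq length"
        nb = s // self.block_size
        # Score each key block by its mean key vector; keep the top-k blocks per query.
        pooled = k.view(b, nb, self.block_size, d).mean(2)          # (b, nb, d)
        keep = (q @ pooled.transpose(1, 2)).topk(min(self.top_k, nb), dim=-1).indices
        mask = torch.zeros(b, s, nb, device=q.device)
        mask.scatter_(-1, keep, 1.0)                                # mark selected blocks
        mask = mask.bool().repeat_interleave(self.block_size, -1)   # token-level (b, s, s)
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because both paths read the same `qkv` weights, a short‑context checkpoint could, under these assumptions, be continued into long‑context training without re‑initializing anything, which is the core of the "seamless upgrade" claim.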

InfLLM-V2 · Transformer · dense-sparse attention
16 min read
Data Party THU
Oct 25, 2025 · Artificial Intelligence

How InfLLM‑V2 Delivers Fast, Low‑Cost Sparse Attention for Long‑Context LLMs

InfLLM‑V2 introduces a zero‑parameter, training‑efficient sparse‑attention framework that dramatically speeds up long‑sequence processing while requiring only 5B tokens of training data. The open‑source MiniCPM4.1 model demonstrates performance comparable to dense attention on both long‑text understanding and deep‑thinking benchmarks.
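The "zero‑parameter" claim can be illustrated with a short sketch: the sparse variant below reuses the dense module's weights verbatim and derives block relevance purely from mean‑pooled keys, so a dense checkpoint loads into it with no missing or unexpected keys. Class and method names are hypothetical, not MiniCPM4.1's actual code:

```python
# A minimal sketch of the "zero-parameter" upgrade: the sparse variant reuses
# the dense module's weights verbatim and derives block relevance from pooled
# keys rather than any new learned gate. Names are hypothetical placeholders.
import torch


class DenseAttn(torch.nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)  # the only learned weights


class SparseAttn(DenseAttn):
    """Same parameter set as DenseAttn; sparsity comes from score-based selection."""

    def block_scores(self, x, block_size=64):
        q, k, _ = self.qkv(x).chunk(3, dim=-1)
        b, s, d = k.shape
        nb = s // block_size  # assume s divisible by block_size for brevity
        pooled = k.view(b, nb, block_size, d).mean(2)  # (b, nb, d), no new weights
        return q @ pooled.transpose(1, 2)              # (b, s, nb) block relevance


# A dense checkpoint loads into the sparse model with no missing or unexpected
# keys, which is what keeps the upgrade cheap enough to train on few tokens.
sparse = SparseAttn()
sparse.load_state_dict(DenseAttn().state_dict())            # strict load succeeds
print(sparse.block_scores(torch.randn(2, 256, 512)).shape)  # torch.Size([2, 256, 4])
```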

InfLLM-V2 · MiniCPM4.1 · Sparse Attention
10 min read