Machine Learning Algorithms & Natural Language Processing
Mar 10, 2026 · Artificial Intelligence
How InfLLM‑V2 Achieves Seamless Short‑to‑Long Context Upgrade with Minimal Structural Changes
InfLLM‑V2 introduces a dense‑sparse switchable attention framework that preserves the original dense‑attention parameters while enabling efficient long‑context training. It matches full‑attention performance on benchmarks such as RULER, LongBench, and chain‑of‑thought reasoning tasks, and delivers up to a 2.3× end‑to‑end inference speedup without degrading short‑sequence capability.
InfLLM-V2 · Transformer · dense-sparse attention
