AIWalker
Feb 26, 2025 · Artificial Intelligence
Why Linear Attention Lags Behind Softmax and How Two Simple Tweaks Close the Gap
The paper analytically identifies injectivity and local modeling as the two key properties whose absence causes the performance gap between linear and Softmax attention. It proposes InLine attention, two simple modifications that restore these properties, and demonstrates through extensive Vision Transformer experiments that the enhanced linear attention matches or surpasses Softmax attention while retaining linear computational cost.
Attention Mechanism · Efficient Transformers · Linear Attention
