Machine Heart
Apr 23, 2026 · Artificial Intelligence

First Survey of Attention Sink: From Utilization and Understanding to Elimination in Transformers

This survey reviews over 180 papers on the attention sink phenomenon in Transformers and traces its three-stage evolution, from early exploitation through mechanistic interpretation to strategic mitigation. It covers utilization tactics, theoretical explanations, removal techniques, and promising directions for future research.

Attention Sink · Mitigation · Model Interpretability
9 min read
Machine Learning Algorithms & Natural Language Processing
Mar 10, 2026 · Artificial Intelligence

Why the First Token Becomes a Value Garbage Bin – LeCun Team Dissects Spike and Attention Sink Mechanics

A paper from Yann LeCun's team shows that massive activation spikes and attention sinks in Transformers are not inherently coupled. Spikes arise from position-0 token interactions and specific feed-forward dynamics, while attention sinks emerge from pre-norm normalization and head dimension. The findings offer practical insights for model quantization and long-context inference.

Attention Sink · LLM · Massive Activations
9 min read