Architects' Tech Alliance
Feb 24, 2025 · Artificial Intelligence
NSA: Hardware‑Optimized Sparse Attention Mechanism from DeepSeek, Peking University and University of Washington
NSA (Native Sparse Attention) introduces a hardware-aligned sparse attention architecture with three parallel branches: coarse-grained token compression, fine-grained token selection, and a sliding window for local context. A learnable gate blends the three branch outputs, balancing global and local information and dramatically improving inference speed and efficiency for long-context large language models.
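The three-branch-plus-gating idea can be illustrated with a minimal sketch for a single query. Everything here is an assumption for illustration: mean-pooling stands in for the paper's learned compression, block importance is scored with compressed keys, and the gate weights are fixed constants rather than the learnable gate NSA actually uses; the function names (`nsa_sketch`, `attend`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, K, V):
    # Scaled dot-product attention for one query vector over a key/value set.
    scores = (K @ q) / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

def nsa_sketch(q, K, V, block=4, top_blocks=1, window=4, gates=(0.4, 0.4, 0.2)):
    T, d = K.shape
    nb = T // block

    # Branch 1: token compression -- mean-pool each block of keys/values into
    # one coarse token (the real mechanism uses a learned compression).
    Kc = K[: nb * block].reshape(nb, block, d).mean(axis=1)
    Vc = V[: nb * block].reshape(nb, block, d).mean(axis=1)
    out_cmp = attend(q, Kc, Vc)

    # Branch 2: token selection -- rank blocks by compressed-key scores and
    # attend at full resolution only inside the top-scoring blocks.
    block_scores = Kc @ q
    top = np.argsort(block_scores)[-top_blocks:]
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in top])
    out_sel = attend(q, K[idx], V[idx])

    # Branch 3: sliding window -- the most recent tokens for local context.
    out_win = attend(q, K[-window:], V[-window:])

    # Gated combination (fixed weights here; learnable in NSA).
    g = np.asarray(gates)
    return g[0] * out_cmp + g[1] * out_sel + g[2] * out_win
```

Because each branch attends over far fewer than `T` tokens (here `nb`, `top_blocks * block`, and `window` respectively), the per-query cost stays small even as the context grows, which is the source of the long-context speedup.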
AI architecture · DeepSeek · Sparse Attention