Tagged articles

Native Sparse Attention

2 articles

Feb 21, 2025 · Artificial Intelligence

How NSA and MoE Are Shaping the Future of Large‑Model Development

The article examines Native Sparse Attention (NSA) and Mixture‑of‑Experts (MoE) as complementary innovations that improve data quality, model architecture, and inference efficiency for large models, while also discussing their challenges and potential research directions.

Large Models · Mixture of Experts · Model Optimization

11 min read

Software Engineering 3.0 Era

Feb 18, 2025 · Artificial Intelligence

DeepSeek R1’s Disruptive Breakthrough: Native Sparse Attention Redefines Long‑Context Modeling

The DeepSeek paper on Native Sparse Attention (NSA) presents a hardware‑aligned, trainable sparse‑attention architecture that cuts the O(n²) cost of full attention. It reports speedups of up to 11.6×, accuracy gains of 2.3 points on long‑context benchmarks, and a 47% reduction in training expense while scaling to 64k tokens.

DeepSeek · GPU Optimization · Long-context modeling

11 min read
