Tagged articles
2 articles
Software Engineering 3.0 Era
Feb 21, 2025 · Artificial Intelligence

How NSA and MoE Are Shaping the Future of Large‑Model Development

The article examines Native Sparse Attention (NSA) and Mixture‑of‑Experts (MoE) as complementary innovations that improve data quality, model architecture, and inference efficiency for large models, while also discussing their challenges and potential research directions.

Large Models · Mixture of Experts · Native Sparse Attention
11 min read
Software Engineering 3.0 Era
Feb 18, 2025 · Artificial Intelligence

DeepSeek R1’s Disruptive Breakthrough: Native Sparse Attention Redefines Long‑Context Modeling

The DeepSeek paper on Native Sparse Attention (NSA) presents a hardware‑aligned, trainable sparse‑attention architecture that cuts the O(n²) cost of full attention, delivers up to 11.6× speedups and 2.3‑point accuracy gains on long‑context benchmarks, and reduces training expense by 47% while scaling to 64k tokens.

DeepSeek · GPU optimization · Long-context modeling
11 min read