Tagged articles
2 articles
AI Frontier Lectures
Jun 20, 2025 · Artificial Intelligence

How GCA Achieves 1000× Length Generalization in Large Language Models

Ant Research introduces GCA, a causal retrieval‑based grouped cross‑attention mechanism that end‑to‑end learns to fetch relevant past chunks, dramatically reducing memory usage and achieving over 1000× length generalization on long‑context language modeling tasks, with near‑constant inference memory and linear training cost.

AI research · Grouped Cross Attention · LLM efficiency
11 min read
Software Engineering 3.0 Era
Feb 18, 2025 · Artificial Intelligence

DeepSeek R1’s Disruptive Breakthrough: Native Sparse Attention Redefines Long‑Context Modeling

The DeepSeek paper on Native Sparse Attention (NSA) presents a hardware‑aligned, trainable sparse‑attention architecture that slashes O(n²) costs, delivers up to 11.6× speedups and 2.3‑point accuracy gains on long‑context benchmarks, and reduces training expense by 47% while scaling to 64k tokens.

DeepSeek · GPU optimization · Long-context modeling
11 min read