Tagged articles

SubQ

2 articles · Page 1 of 1

May 16, 2026 · Artificial Intelligence

SubQ Beats Transformers: 12‑Million‑Token Context Model at Only 5% of Opus Cost

The article analyzes SubQ, a new LLM architecture using Subquadratic Sparse Attention (SSA) to achieve a 12‑million‑token context window with linear compute scaling, delivering up to 52× speedup and costing just 5% of Opus while matching dense‑attention performance on long‑context benchmarks.

SSA · Sparse attention · SubQ

0 likes · 14 min read

Machine Heart

May 6, 2026 · Artificial Intelligence

Beyond Transformers: SubQ Achieves 12‑Million‑Token Context at Just 5% of Opus Cost

The SubQ model introduces Subquadratic Sparse Attention (SSA), a content‑dependent routing mechanism that reduces attention complexity to linear, enabling a 12‑million‑token context window with a 52.2× speedup and only 5% of Opus's cost, as demonstrated on MRCR v2, RULER, and SWE‑Bench benchmarks.

LLM · Long Context · Sparse attention

0 likes · 14 min read