Tagged articles
Data Party THU
Oct 21, 2025 · Artificial Intelligence

Can Linear‑Time LSTMs Beat Transformers? Scaling Laws Reveal the Answer

The paper presents a systematic scaling‑law study comparing the linear‑time xLSTM architecture with quadratic‑time Transformers. It evaluates parameter‑data loss surfaces, the optimal model size under equal FLOP budgets, and the components of inference latency, and it reports that xLSTM is consistently more cost‑effective across diverse contexts and budgets.

Inference Optimization · Linear Time Complexity · Transformer