Data Party THU
Oct 21, 2025 · Artificial Intelligence
Can Linear‑Time LSTMs Beat Transformers? Scaling Laws Reveal the Answer
The paper presents a systematic scaling-law study of the linear-time xLSTM architecture versus quadratic-time Transformers. It examines how loss varies with parameter count and training data, the compute-optimal model size under equal FLOP budgets, and the components of inference latency, and finds that xLSTM consistently offers better cost-effectiveness across a range of context lengths and compute budgets.
Inference Optimization · Linear Time Complexity · Transformer
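To make the linear-versus-quadratic distinction concrete, here is a rough, hypothetical cost model (not taken from the paper) of the FLOPs one layer spends mixing information across a sequence of T tokens. The attention term counts the T×T score matrix and the weighted sum of values; the recurrent term assumes a matrix-memory cell, loosely modeled on xLSTM's mLSTM, that updates and reads a d_head×d_head state per head for every token. The function names, the width d=2048, and the head size d_head=256 are illustrative assumptions.

```python
# Hypothetical, illustrative cost model: FLOPs spent on sequence mixing
# by one layer over T tokens with model width d. Constants are rough;
# the point is how each term grows with T.


def attention_mixing_flops(T: int, d: int) -> int:
    # QK^T scores: T*T dot products of length d (~2*T*T*d FLOPs),
    # plus the attention-weighted sum of values (~2*T*T*d FLOPs).
    return 4 * T * T * d


def recurrent_mixing_flops(T: int, d: int, d_head: int) -> int:
    # Assumed matrix-memory recurrence: per token, each of the d // d_head
    # heads updates a d_head x d_head state (outer product + decay) and
    # reads it with a query, roughly 4 * d_head**2 FLOPs per head.
    # Summed over heads that is ~4 * d * d_head per token: linear in T.
    return 4 * T * d * d_head


if __name__ == "__main__":
    d, d_head = 2048, 256  # illustrative sizes, not from the paper
    for T in (1024, 4096, 16384):
        ratio = attention_mixing_flops(T, d) / recurrent_mixing_flops(T, d, d_head)
        print(f"T={T:>6}: attention/recurrent mixing FLOPs = {ratio:.0f}x")
    # prints ratios of 4x, 16x and 64x
```

Under these assumptions the ratio reduces to T / d_head, so the gap widens linearly with context length. The sketch ignores projection and MLP FLOPs, which grow linearly in T for both architectures, so end-to-end savings are smaller than this mixing-only ratio suggests.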
