Machine Heart
Jun 29, 2026 · Artificial Intelligence
Re‑shaping Transformers: Moving Capacity Forward Makes LLMs Smarter
A new study shows that shifting feed-forward network (FFN) capacity toward the early layers of a Transformer, without adding parameters or FLOPs, lowers perplexity by up to 1.84 points. The same technique also improves performance across several modern LLM architectures.
FFN width · Language Model · Tapered Language Model
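To make the idea in the summary concrete, here is a minimal sketch of one way to move FFN width toward early layers while holding the total budget fixed. The linear schedule, the `taper` parameter, and the function names are illustrative assumptions, not the schedule used in the study. The sketch relies only on the fact that a standard two-matrix FFN has roughly `2 * d_model * d_ff` weights, so keeping the sum of per-layer widths constant keeps FFN parameters (and matmul FLOPs) constant.

```python
def tapered_ffn_widths(n_layers, base_width, taper=0.5):
    """Hypothetical linear taper (an assumption, not the study's schedule).

    The first layer gets about base_width * (1 + taper), the last about
    base_width * (1 - taper), and the widths sum to n_layers * base_width.
    """
    if not 0 <= taper < 1:
        raise ValueError("taper must be in [0, 1)")
    if n_layers == 1:
        return [base_width]
    total = n_layers * base_width
    # Interpolate linearly from (1 + taper) at layer 0 to (1 - taper) at the last layer.
    raw = [
        base_width * (1 + taper * (1 - 2 * i / (n_layers - 1)))
        for i in range(n_layers)
    ]
    widths = [int(round(w)) for w in raw]
    # Absorb any rounding drift in the first layer so the budget matches exactly.
    widths[0] += total - sum(widths)
    return widths


def ffn_weight_count(d_model, widths):
    """Weights of an up-projection plus a down-projection per layer, biases ignored."""
    return sum(2 * d_model * w for w in widths)


if __name__ == "__main__":
    uniform = [3072] * 12
    tapered = tapered_ffn_widths(n_layers=12, base_width=3072, taper=0.5)
    print(tapered)
    # Both configurations have the same FFN weight count.
    print(ffn_weight_count(768, uniform) == ffn_weight_count(768, tapered))
```

In practice the widths would likely also be rounded to a hardware-friendly multiple (for example 64), which this sketch omits for brevity.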
