Tagged articles

Tapered Language Model

1 article
Machine Heart
Jun 29, 2026 · Artificial Intelligence

Re‑shaping Transformers: Moving Capacity Forward Makes LLMs Smarter

A new study shows that reallocating feed‑forward network (FFN) capacity toward the early layers of a Transformer, without adding parameters or FLOPs, lowers perplexity by up to 1.84 points, and that the same technique improves performance across several modern LLM architectures.
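The "no extra parameters or FLOPs" constraint works because, for a fixed model width d_model, each layer's FFN parameter count and per-token FFN FLOPs scale linearly with that layer's hidden width d_ff. Any redistribution that keeps the sum of per-layer widths unchanged therefore keeps the FFN budget unchanged. The sketch below assumes a simple linear taper (widest layer first) controlled by a first-to-last width ratio; the actual schedule used in the study is not given here, and the helper name `tapered_ffn_widths` and its `ratio` parameter are hypothetical.

```python
from fractions import Fraction
import math


def tapered_ffn_widths(n_layers: int, d_ff: int, ratio: float) -> list[int]:
    """Hypothetical helper: linearly tapered FFN widths, widest layer first.

    `ratio` is (first-layer width) / (last-layer width); ratio > 1 moves
    capacity toward early layers, ratio == 1 gives a uniform model.
    The widths always sum to n_layers * d_ff, so total FFN parameters and
    per-token FFN FLOPs match the uniform baseline (assumed linear taper).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if n_layers < 2:
        return [d_ff] * n_layers
    # Exact rational arithmetic avoids float drift in the budget check.
    r = Fraction(ratio)
    s = (r - 1) / (r + 1)  # slope chosen so that first / last == ratio
    # The offsets (1 - 2i/(n-1)) sum to zero, so sum(raw) == n_layers * d_ff.
    raw = [d_ff * (1 + s * (1 - Fraction(2 * i, n_layers - 1)))
           for i in range(n_layers)]
    widths = [math.floor(w) for w in raw]
    # Hand the units lost to flooring back to layers with the largest remainders.
    leftover = n_layers * d_ff - sum(widths)
    order = sorted(range(n_layers), key=lambda i: raw[i] - widths[i], reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths


widths = tapered_ffn_widths(4, 3072, 2.0)
print(widths)
print(sum(widths) == 4 * 3072)
```

With four layers, a baseline d_ff of 3072 and a 2:1 taper, the first layer gets 4096 units and the last 2048, while the total stays at 4 × 3072. A real model would typically also round widths to a hardware-friendly multiple, which this sketch omits.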

FFN width · Language Model · Tapered Language Model
9 min read