AIWalker
Mar 14, 2025 · Artificial Intelligence
Dynamic Tanh Lets Kaiming He and Yann LeCun Drop Transformer Normalization in 9 Lines
Researchers Kaiming He, Yann LeCun, and colleagues propose a 9‑line Dynamic Tanh (DyT) layer that replaces LayerNorm/RMSNorm in Transformers, achieving comparable or better accuracy across vision, language, speech, and DNA tasks while also reducing inference latency on modern GPUs.
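The DyT layer computes an elementwise y = γ · tanh(α·x) + β, where α is a learnable scalar and γ, β are learnable per-channel vectors. Below is a minimal dependency-free Python sketch of that computation (the paper's reference implementation is a PyTorch module; the `alpha_init` default of 0.5 follows the paper, but the class name and interface here are illustrative assumptions):

```python
import math

class DyT:
    """Sketch of Dynamic Tanh (DyT): y = gamma * tanh(alpha * x) + beta.

    alpha is a single learnable scalar; gamma and beta are learnable
    per-channel parameters, initialized to 1 and 0 respectively.
    """

    def __init__(self, dim, alpha_init=0.5):
        self.alpha = alpha_init        # scalar, learned during training
        self.gamma = [1.0] * dim       # per-channel scale
        self.beta = [0.0] * dim        # per-channel shift

    def __call__(self, x):
        # x: list of `dim` activations for one token; no mean/variance
        # statistics are computed, unlike LayerNorm/RMSNorm.
        return [g * math.tanh(self.alpha * xi) + b
                for xi, g, b in zip(x, self.gamma, self.beta)]

# Tanh squashes extreme activations while staying near-linear around zero,
# which is the key behavior DyT borrows from normalization layers.
dyt = DyT(3)
out = dyt([0.0, 100.0, -100.0])  # large inputs saturate toward +/-1
```

Because DyT needs no reduction across the channel dimension, it avoids the statistics computation that makes normalization layers a latency bottleneck on GPUs.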
AI research · Dynamic Tanh · Model Efficiency
18 min read
