Machine Learning Algorithms & Natural Language Processing
Apr 22, 2026 · Artificial Intelligence
Turning Transformers into Mamba: A Cross‑Architecture Distillation That Linearizes Inference Cost
The article presents a two‑step cross‑architecture distillation method: it first replaces the quadratic softmax attention of Transformers with a learned linear attention, then maps the result onto a Mamba backbone, achieving near‑teacher performance while reducing inference cost to linear in sequence length.
Cross‑Architecture Distillation · Linear Attention
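To see why linearizing attention changes the cost, here is a minimal NumPy sketch of the general idea (not the article's specific method or feature map): applying a non‑negative feature map φ to queries and keys lets the key–value product be summarized once, reassociating the computation from O(n²·d) to O(n·d²). The function names and the choice of φ below are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an n x n score matrix, O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Linear attention (illustrative): a non-negative feature map phi
    # replaces softmax, so (phi(K)^T V) can be computed once and reused,
    # giving O(n * d^2) total cost with no n x n matrix.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                      # (d, d_v) summary of keys/values
    Z = Kp.sum(axis=0)                 # (d,) normalizer accumulator
    return (Qp @ KV) / (Qp @ Z)[:, None]

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)
```

In a distillation setup like the one the article describes, the student's linear attention would be trained to match the teacher's softmax attention outputs before the weights are transferred to the recurrent backbone.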
