How STEM Replaces MoE Routing with Simple Table Lookup for Faster Transformers
The article presents STEM, a method that transforms dense and MoE transformer architectures by converting the expert routing step into a static table‑lookup operation, achieving higher parameter efficiency, lower communication overhead, and improved interpretability while maintaining or boosting downstream task performance.
