Tagged articles
2 articles
Page 1 of 1
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
May 16, 2026 · Artificial Intelligence

Token Superposition Training Accelerates LLM Pre‑training 2.5× Without Changing Architecture

Token Superposition Training (TST) speeds up large‑language‑model pre‑training by up to 2.5× without altering model architecture or compute budget, using a superposition phase that averages token embeddings into bags and predicts groups of tokens, followed by a standard recovery phase, as demonstrated on 10B‑parameter MoE and smaller models.

LLM PretrainingMCE LossMoE
0 likes · 10 min read
Token Superposition Training Accelerates LLM Pre‑training 2.5× Without Changing Architecture