Boosting LLM Pre‑training by Up to 2.5× Without Architecture Changes or Extra Compute
Nous Research introduces Token Superposition Training, which groups tokens into bags, averages the embeddings within each bag, and trains the model to predict groups of tokens rather than single tokens. The method requires no changes to the model architecture and no additional compute, and it achieves up to 2.5× faster pre‑training while inference stays standard.
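To make the bagging step concrete, here is a minimal sketch in plain Python. It is not Nous Research's implementation. The function names, the toy embedding table, the choice of consecutive non-overlapping bags, the dropping of a trailing partial bag, and the use of the next bag as the prediction target are all assumptions made for illustration.

```python
from typing import List


def make_bags(token_ids: List[int], bag_size: int) -> List[List[int]]:
    """Split a token sequence into consecutive, non-overlapping bags.

    Assumption: a trailing partial bag is dropped to keep the sketch simple.
    """
    if bag_size < 1:
        raise ValueError("bag_size must be >= 1")
    usable = len(token_ids) - len(token_ids) % bag_size
    return [token_ids[i:i + bag_size] for i in range(0, usable, bag_size)]


def average_embeddings(bag: List[int], table: List[List[float]]) -> List[float]:
    """Return the mean of the embedding vectors of the tokens in one bag."""
    dim = len(table[0])
    return [sum(table[t][d] for t in bag) / len(bag) for d in range(dim)]


# Hypothetical embedding table: 4 tokens, 2 dimensions, illustrative values.
EMBEDDINGS = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]]

tokens = [1, 2, 3, 0, 1]
bags = make_bags(tokens, bag_size=2)  # the trailing token 1 is dropped
inputs = [average_embeddings(b, EMBEDDINGS) for b in bags]

# Assumption: each input position is trained to predict the next bag of tokens.
targets = bags[1:]
```

With a bag size of 2, the model would see half as many input positions per sequence, which is one plausible reading of where the speed-up comes from. The sketch covers only the input side and does not model the training loss.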
