Tagged articles
2 articles
Machine Learning Algorithms & Natural Language Processing
May 16, 2026 · Artificial Intelligence

Token Superposition Training Accelerates LLM Pre‑training 2.5× Without Changing Architecture

Token Superposition Training (TST) speeds up large‑language‑model pre‑training by up to 2.5× without altering model architecture or compute budget. It uses a superposition phase that averages token embeddings into bags and predicts groups of tokens, followed by a standard recovery phase. The speed‑up is demonstrated on 10B‑parameter MoE and smaller models.

LLM Pretraining · MCE Loss · MoE
10 min read
AntTech
May 24, 2022 · Artificial Intelligence

WPipe: Group‑Based Interleaved Pipeline Parallelism for Large‑Scale DNN Training

The paper introduces WPipe, a group‑based interleaved pipeline parallelism method that reduces memory overhead and weight‑update latency compared with PipeDream‑2BW, achieving up to 1.4× speed‑up and 36% lower memory usage while preserving model accuracy on large‑scale DNNs.

Deep Learning · Pipeline Parallelism · Training Throughput
13 min read