Kuaishou Large Model
Nov 22, 2024 · Artificial Intelligence
Boost LLM Training on Massive Clusters with DP/TP Overlap and Context Parallelism
This article details a comprehensive set of techniques—including data‑ and tensor‑parallel overlap, context‑parallelism, activation rematerialization, and a performance‑driven cost model—that dramatically improve large‑language‑model training efficiency on ultra‑large GPU clusters while preserving model quality.
Parallelism · activation recomputation · distributed training
28 min read