Alibaba Cloud Big Data AI Platform
Jul 12, 2022 · Artificial Intelligence
How Whale Enables Efficient Giant Model Training on Heterogeneous GPUs
The article introduces Whale, an open‑source distributed training framework that unifies multiple parallelism strategies, uses hardware‑aware load balancing to accelerate giant models like BERT‑Large and the trillion‑parameter M6 on heterogeneous GPU clusters, and details its architecture, planning, and real‑world performance gains.
Deep LearningParallelismhardware-aware scheduling
0 likes · 11 min read
