Alibaba Cloud Big Data AI Platform
Jul 12, 2022 · Artificial Intelligence

How Whale Enables Efficient Giant Model Training on Heterogeneous GPUs

The article introduces Whale, an open-source distributed training framework that unifies multiple parallelism strategies and uses hardware-aware load balancing to accelerate the training of giant models, such as BERT-Large and the trillion-parameter M6, on heterogeneous GPU clusters. It describes Whale's architecture and parallel planning, and reports the performance gains it achieved in real-world training.

Deep Learning · Parallelism · hardware-aware scheduling
11 min read