Tag

DLB Dynamic Load Balancing

0 views collected around this technical thread.

Baidu Geek Talk
Baidu Geek Talk
Jul 10, 2024 · Artificial Intelligence

Baidu HPN Network: Solving Hash Collision for 95% Physical Network Bandwidth Efficiency in Large Model Training

Baidu's HPN network solves hash‑collision bottlenecks in large‑model training by combining TOR‑affinity scheduling with Dynamic Load Balancing on self‑developed switches, boosting physical network bandwidth efficiency to about 95%, improving throughput by roughly 10% and adding a further 1.5% training‑speed gain via the BCCL library.

Baidu CloudCollective CommunicationDLB Dynamic Load Balancing
0 likes · 12 min read
Baidu HPN Network: Solving Hash Collision for 95% Physical Network Bandwidth Efficiency in Large Model Training