Baidu HPN Network: Solving Hash Collisions for 95% Physical Network Bandwidth Efficiency in Large Model Training
Baidu's HPN network tackles the hash-collision bottleneck in large-model training by combining TOR-affinity scheduling with Dynamic Load Balancing on its self-developed switches. The design raises physical network bandwidth efficiency to about 95% and improves training throughput by roughly 10%, with the BCCL communication library contributing a further ~1.5% training-speed gain.
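The underlying problem is that static ECMP hashing can map several large collective-communication flows onto the same uplink while other uplinks sit idle, capping effective bandwidth well below line rate. The sketch below is a minimal toy model, not Baidu's implementation: all names and numbers (LINKS, FLOWS, FLOW_GBPS, the random stand-in for the switch hash) are illustrative assumptions. It contrasts hash-based path selection with a least-loaded "dynamic load balancing" choice to show why the latter recovers bandwidth efficiency.

```python
import random
from collections import defaultdict

LINKS = 8        # uplinks from a TOR switch (illustrative)
FLOWS = 8        # equal-sized elephant flows, e.g. all-reduce peers (illustrative)
FLOW_GBPS = 100  # each flow wants 100 Gb/s; each uplink carries 100 Gb/s

def efficiency(assignment):
    """Effective throughput / ideal throughput for a flow->link mapping.
    Flows that collide on a link split its capacity evenly."""
    load = defaultdict(int)
    for link in assignment:
        load[link] += 1
    achieved = sum(FLOW_GBPS / load[link] for link in assignment)
    return achieved / (FLOWS * FLOW_GBPS)

# Static ECMP: the link is picked by hashing the flow's 5-tuple, so whether
# flows collide is luck. random.randrange stands in for the switch hash here.
random.seed(7)
ecmp = [random.randrange(LINKS) for _ in range(FLOWS)]

# Dynamic load balancing: place each flow on the currently least-loaded link,
# so no uplink is oversubscribed while another sits idle.
load = [0] * LINKS
dlb = []
for _ in range(FLOWS):
    link = min(range(LINKS), key=lambda l: load[l])
    load[link] += 1
    dlb.append(link)

print(f"static ECMP efficiency : {efficiency(ecmp):.0%}")  # typically well below 100%
print(f"dynamic LB efficiency  : {efficiency(dlb):.0%}")   # 100% in this toy model
```

In this toy model the dynamic strategy always reaches 100% because flows are uniform and the load is known exactly; real switches make the decision per flowlet with imperfect congestion information, which is why the reported figure is about 95% rather than full line rate.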