Tagged articles
1 article
Tencent Cloud Developer
Mar 22, 2023 · Artificial Intelligence

Tencent Star Network: High‑Performance GPU Cluster Architecture for Large‑Scale AI Model Training

Tencent’s Star Network combines a 1.6 Tbps Ethernet‑RDMA fabric, a fat‑tree topology that supports up to 4K GPUs, multi‑track traffic aggregation, adaptive heterogeneous links, and a custom TCCL library. Together these cut AllReduce overhead from 35% to 3.7% and speed up AI training iterations by 32%, while deployment is automated and self‑healing completes in under a second.

AI training · GPU clusters · RDMA
19 min read