Bilibili Tech
Dec 31, 2024 · Cloud Computing
Design and Implementation of Bilibili AI Compute Network: Topology, Hardware Selection, Load Balancing, and Monitoring
Bilibili designed and deployed an AI compute network for large language model training. The design uses a Fat-Tree topology, high-speed switches, optical modules, and fiber, along with fixed-path load balancing and a sub-second telemetry monitoring platform. Bilibili plans to scale the network to 10,000 GPUs.
AI compute network · Fat-Tree topology · hardware selection