Baidu Geek Talk
May 19, 2025 · Artificial Intelligence
How Baidu Cloud Achieved 4µs Low-Latency PD Inference with HPN Network Optimizations
To meet the demanding network requirements of large‑scale PD‑separated inference, Baidu Cloud built an HPN cluster with 4 µs end‑to‑end latency and layered on traffic‑management optimizations, adaptive routing, and custom Alltoall operators, achieving up to 20 % higher throughput and lower latency in both the Prefill and Decode stages.
AI inference · Alltoall optimization · Distributed Training
