Didi Tech
Jun 8, 2018 · Artificial Intelligence
DiDi PS: High-Performance RDMA-Based Parameter Server for Distributed Deep Learning
DiDi PS is a custom RDMA-based parameter server that uses a ring topology and optimized ibverbs communication to accelerate distributed deep-learning training. It consistently outperforms OpenMPI, NCCL2, TensorFlow's built-in RDMA, and Horovod, and provides more stable and scalable synchronization for large-scale data workloads.
Allreduce · Deep Learning · RDMA
10 min read