Tencent Tech
May 7, 2025 · Artificial Intelligence
How Tencent’s DeepEP Doubles GPU Communication Speed on RoCE Networks
Tencent engineers detailed a major speedup to DeepSeek’s open‑source DeepEP communication framework. Their TRMT‑based optimizations (dynamic multi‑QP topology awareness, IBGDA‑driven CPU bypass, and atomic signaling) boost throughput on RoCE networks by up to 300% and add a further 30% gain on InfiniBand, effectively doubling GPU communication performance for large AI models.
AI model training · DeepEP · GPU communication