Tagged articles
3 articles
Linux Kernel Journey
May 8, 2025 · Artificial Intelligence

How Tencent’s TRMT Tech Delivered a Huge Speedup to DeepSeek’s Large‑Model Network

DeepSeek engineers highlighted Tencent’s open‑source TRMT and DeepEP contributions that boost GPU‑to‑GPU communication by up to 300%, double RoCE performance and add a further 30% gain on InfiniBand, while addressing lane‑utilization and CPU‑control bottlenecks through three targeted optimizations.

DeepEP · DeepSeek · GPU communication
0 likes · 6 min read
Tencent Tech
May 7, 2025 · Artificial Intelligence

How Tencent’s DeepEP Doubles GPU Communication Speed on RoCE Networks

Tencent engineers highlighted a massive speedup in DeepSeek’s open‑source DeepEP communication framework, revealing how three TRMT‑based optimizations (dynamic multi‑QP topology awareness, IBGDA‑driven CPU bypass, and atomic signaling) boost RoCE network throughput by up to 300% and add another 30% gain on InfiniBand, effectively doubling GPU communication performance for large AI models.

AI model training · DeepEP · GPU communication
0 likes · 8 min read
NewBeeNLP
Feb 27, 2025 · Industry Insights

How DeepSeek’s Open‑Source Tools Exploit China‑Specific H800 GPUs to Boost AI Performance

The article analyzes DeepSeek’s three open‑source projects (FlashMLA, DeepEP, and DeepGEMM), shows how they optimize for the China‑only NVIDIA H800 GPU, contrasts this with the abundant hardware resources of Western AI firms, and highlights the growing demand for talent that masters both AI models and GPU hardware.

AI hardware · DeepEP · DeepGEMM
0 likes · 7 min read