Tagged articles

GPU Communication Overlap

1 article · Page 1 of 1

May 17, 2026 · Artificial Intelligence

How DeepSeek Leverages MoE Parallelism: GPU Compute and Communication Optimizations

The article dissects DeepSeek's MoE model‑parallel strategy, explaining how GPU compute and communication are overlapped through expert, pipeline, and ZeRO‑1 parallelism, and introduces DualPipe and Waved‑EP kernels that enable efficient training on large‑scale hardware.

DeepSeek · GPU Communication Overlap · Mixture of Experts

18 min read
