Tagged articles

Expert Parallelism

6 articles

Apr 23, 2026 · Artificial Intelligence

DeepSeek Unveils Tile Kernels and DeepEP V2 – Is V4 on the Horizon?

DeepSeek recently opened its Tile Kernels repository and released DeepEP V2. This article details the new GPU kernel features and the fully JIT-enabled expert parallelism redesign, which boosts peak performance by up to 1.3× while cutting SM usage fourfold, and considers whether the releases hint at an upcoming V4.

DeepEP V2 · DeepSeek · Expert Parallelism

6 min read

IT Services Circle

Nov 28, 2025 · Artificial Intelligence

Unlocking AI Model Speed: How Data, Pipeline, Tensor & Expert Parallelism Work

AI model training relies on parallel computing, and this guide explains the four main parallelism strategies—Data Parallelism, Pipeline Parallelism, Tensor Parallelism, and Expert Parallelism—detailing their mechanisms, advantages, drawbacks, and how techniques like ZeRO and mixed 3D parallelism optimize memory and performance for massive models.

3D Parallelism · AI parallelism · Expert Parallelism

14 min read

Alibaba Cloud Big Data AI Platform

Sep 25, 2025 · Artificial Intelligence

Unlocking Trillion‑Parameter MoE Models: Expert Parallelism and Alibaba Cloud PAI‑EAS Deployment Guide

This article explains the opportunities and challenges of Mixture of Experts (MoE) models, introduces expert parallelism as a solution to scaling and deployment bottlenecks, and provides a step‑by‑step guide for deploying MoE models with Alibaba Cloud PAI‑EAS, including configuration tips and code examples.

AI model deployment · Expert Parallelism · Large Language Model

11 min read

Architect

May 26, 2025 · Artificial Intelligence

Parallelism Strategies for Large-Scale Model Training: Data, Tensor, Pipeline, Sequence, and Expert Parallelism

This article explains the memory limits of a single GPU and systematically introduces data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and expert parallelism, describing their communication costs, advantages, drawbacks, and practical implementation details for training large AI models.

AI training · Expert Parallelism · data parallelism

14 min read

AI Algorithm Path

May 11, 2025 · Artificial Intelligence

How to Parallelize Ultra‑Large Model Training with PyTorch

The article explains the core concepts and trade‑offs of five parallelism techniques—data, tensor, context, pipeline, and expert parallelism—plus the ZeRO optimizer, showing when each method is appropriate for training ultra‑large PyTorch models and providing concrete code snippets and performance considerations.

Context Parallelism · Expert Parallelism · Large‑Scale Training

21 min read

Baobao Algorithm Notes

Mar 13, 2025 · Artificial Intelligence

Why EP Outperforms TP for DeepSeek V3/R1 Inference: Cost, Performance, and Reliability

This article analyzes DeepSeek's inference architecture for the V3/R1 models, which is based on expert parallelism (EP), and compares it with tensor parallelism (TP). It details how EP reduces GPU memory usage and compute overhead and allows larger batch sizes, and it covers the reliability, scalability, and maintainability challenges that EP introduces for large‑scale deployments.

AI infrastructure · Expert Parallelism · GPU memory optimization

18 min read
