Tagged articles
2 articles
AI Cyberspace
Nov 19, 2025 · Artificial Intelligence

Why MPI and NCCL Are Critical for Scaling AI Models Across Thousands of GPUs

This article explains how AI model training has evolved from single‑GPU workloads to massive distributed training using MPI for CPU‑centric communication and NCCL for GPU‑centric communication, covering their histories, core concepts, programming interfaces, topology discovery, protocol choices, and performance testing on multi‑GPU clusters.

AI distributed training · GPU communication · High‑performance computing
71 min read
Rare Earth Juejin Tech Community
May 10, 2024 · Artificial Intelligence

GPU Memory Analysis and Distributed Training Strategies

This article explains how GPU memory is allocated during model fine‑tuning, describes collective communication primitives, and compares data parallel, model parallel, ZeRO, pipeline parallel, mixed‑precision, and checkpointing techniques for reducing memory consumption in large‑scale AI training.

Distributed Training · GPU Memory · Pipeline Parallel
9 min read