Tagged articles
2 articles
Page 1 of 1
Fun with Large Models
Fun with Large Models
Aug 30, 2025 · Artificial Intelligence

How to Fine‑Tune Large Models on Multiple Nodes and GPUs – A Must‑Know Interview Answer

This article explains how to fine‑tune large models across multiple machines and GPUs by covering data, model, tensor, and pipeline parallelism, hybrid 3D parallel strategies, engineering details such as NCCL, PyTorch Distributed, DeepSpeed, fault‑tolerance, checkpointing, and the ZeRO optimizer stages that dramatically reduce memory usage.

Data ParallelDeepSpeedDistributed Training
0 likes · 8 min read
How to Fine‑Tune Large Models on Multiple Nodes and GPUs – A Must‑Know Interview Answer
DataFunTalk
DataFunTalk
Dec 6, 2023 · Artificial Intelligence

Distributed Training Techniques and Quantitative Analysis for Large Language Models (GPT‑175B)

This article presents a comprehensive overview of state‑of‑the‑art distributed training methods for large language models, using GPT‑175B as a case study to analyze memory, communication, and compute overheads, and to recommend practical optimization strategies such as tensor, pipeline, and sequence parallelism, ZeRO‑1 optimizer, and selective activation checkpointing.

Distributed TrainingGPU memory optimizationLLM
0 likes · 22 min read
Distributed Training Techniques and Quantitative Analysis for Large Language Models (GPT‑175B)