Tagged articles
2 articles
Page 1 of 1
Baobao Algorithm Notes
Baobao Algorithm Notes
Sep 28, 2025 · Artificial Intelligence

How Much GPU Memory Do LLMs Really Need? A Deep Dive into Training & Inference

This article breaks down the GPU memory requirements of large language models during training and inference, detailing the contributions of model weights, optimizer states, activations, KV cache, and activation recomputation, and provides concrete formulas, examples, and scaling insights for models like Qwen3 and DeepSeek V3.

GPU MemoryKV cacheLLM
0 likes · 18 min read
How Much GPU Memory Do LLMs Really Need? A Deep Dive into Training & Inference
Kuaishou Large Model
Kuaishou Large Model
Nov 22, 2024 · Artificial Intelligence

Boost LLM Training on Massive Clusters with DP/TP Overlap and Context Parallelism

This article details a comprehensive set of techniques—including data‑ and tensor‑parallel overlap, context‑parallelism, activation rematerialization, and a performance‑driven cost model—that dramatically improve large‑language‑model training efficiency on ultra‑large GPU clusters while preserving model quality.

Distributed TrainingParallelismPerformance Modeling
0 likes · 28 min read
Boost LLM Training on Massive Clusters with DP/TP Overlap and Context Parallelism