Tagged articles
2 articles
Xiaohongshu Tech REDtech
Dec 11, 2025 · Artificial Intelligence

Fine‑Grained Activation Offloading: Cutting Memory Use While Preserving LLM Throughput

The article introduces a fine‑grained activation offloading technique, implemented in Megatron‑Core, that offloads activations to CPU at module granularity and overlaps the transfers with computation. It remains compatible with pipeline and virtual pipeline parallelism and reduces peak GPU memory for large language models with minimal throughput loss.

LLM · Megatron · Memory Optimization
18 min read
Kuaishou Large Model
Jul 11, 2024 · Artificial Intelligence

Pipeline-Aware Offloading & Balanced Checkpointing Accelerate LLM Training

Researchers from Kwai’s large-model team present a training system that combines pipeline-parallel-aware activation offloading with a checkpointing strategy that balances compute against memory. The system accelerates large language model training losslessly, reaching up to 42.7% MFU (model FLOPs utilization) on 256 NVIDIA H800 GPUs while reducing memory usage.

GPU training · Kwai · Performance Modeling
13 min read