Tagged articles
3 articles
Infra Learning Club
Nov 1, 2024 · Artificial Intelligence

Configuring vLLM swap_space and cpu_offload_gb for Stable Large-Model Inference

The article explains vLLM’s GPU compute capability requirement, describes the swap_space and cpu_offload_gb parameters, outlines their ideal usage scenarios, and provides step‑by‑step code examples that demonstrate how adjusting these settings enables loading and running a 7B‑parameter model on a 16 GB T4 GPU.

GPU Memory Management · cpu_offload_gb · large language model inference
9 min read
Liangxu Linux
Nov 29, 2020 · Fundamentals

Why Does Linux Need Swapping? Understanding Memory Pressure and Idle Page Management

Linux uses swapping to move rarely used memory pages to disk, which relieves memory pressure and reclaims idle memory at the cost of added disk I/O latency. This article explains the mechanisms, triggers, and kernel functions behind swapping, including direct reclaim, kswapd, and the LRU lists.

Linux · Operating System · Swapping
11 min read