Why Do Your VMs Get OOM‑Killed? Uncovering Hidden Memory Overhead in KVM
This article investigates why virtual machines on an OpenStack IaaS platform experience OOM‑killer terminations despite reserved host memory, analyzes memory usage patterns of qemu‑kvm processes, and proposes practical solutions to mitigate unexpected OOM events.
Introduction
The author, responsible for virtualization and container services on the 360 HULK cloud platform, explores the OOM Killer, a Linux kernel mechanism that terminates processes when the host runs out of memory.
Problem Description
Scenario: an OpenStack IaaS management platform whose compute nodes run CentOS 7.2 with QEMU/KVM and have 128 GB of RAM each.
1 Problem Identification
VMs are unexpectedly killed by the OOM killer even though memory is not over-committed and 12 GB (9.375% of the 128 GB total) is reserved for the host OS. In theory, memory used by VMs should never exceed 90.625% of host memory, yet OOM kills still occur.
2 Problem Investigation
Observations show that host OS services use far less than the reserved 12 GB: after the affected VMs are restarted, host OS memory usage stays around 4 GB. Even with every VM consuming its full allocation, total usage would be (128 - 12 + 4) / 128 = 93.75%, which should still leave headroom before the OOM killer is triggered. Host OS memory is therefore not the cause.
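The percentages quoted so far can be reproduced with a few lines of arithmetic. The sketch below uses only the figures stated in this article (128 GB total, 12 GB reserved, about 4 GB actually used by the host OS); the constant and function names are illustrative.

```python
# Figures taken from the article; names are illustrative only.
TOTAL_GB = 128        # RAM per compute node
RESERVED_GB = 12      # memory reserved for the host OS
OBSERVED_OS_GB = 4    # memory the host OS was observed to use


def percent(part_gb, total_gb):
    """Return part_gb as a percentage of total_gb."""
    return 100.0 * part_gb / total_gb


reserved_pct = percent(RESERVED_GB, TOTAL_GB)                              # 9.375
vm_budget_pct = percent(TOTAL_GB - RESERVED_GB, TOTAL_GB)                  # 90.625
expected_pct = percent(TOTAL_GB - RESERVED_GB + OBSERVED_OS_GB, TOTAL_GB)  # 93.75

print(f"reserved: {reserved_pct}%  VM budget: {vm_budget_pct}%  expected peak: {expected_pct}%")
```

This expected peak assumes each VM uses exactly its allocated memory, which is the assumption the rest of the investigation calls into question.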
Further analysis of the qemu-kvm processes shows that the resident memory (RES) of each process exceeds the memory allocated to its VM. For a 4-core 8 GB VM, actual usage is 8.3 to 8.9 GB; for a 2-core 4 GB VM, it is 4.6 to 4.8 GB. The excess, roughly 0.3 to 0.9 GB per VM, is consumed by the qemu-kvm process itself rather than by guest RAM.
Further research (see References) indicates that qemu-kvm also allocates memory for virtual devices, and this memory is counted toward the qemu-kvm process on top of the guest's RAM.
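One way to quantify this overhead per VM is to compare the resident set size of a qemu-kvm process with the memory allocated to its guest. The following is a minimal sketch, not the tooling used in the original investigation: it parses the `VmRSS` line that Linux exposes in `/proc/<pid>/status`, and the helper names and the sample status text are hypothetical.

```python
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def overhead_gib(rss_kib, allocated_gib):
    """Resident memory beyond the guest's allocated RAM, in GiB."""
    return rss_kib / (1024 * 1024) - allocated_gib


# Hypothetical excerpt of /proc/<pid>/status for a qemu-kvm process backing an 8 GB VM.
SAMPLE_STATUS = "Name:\tqemu-kvm\nVmPeak:\t 9800000 kB\nVmRSS:\t 9017754 kB\nThreads:\t8\n"

rss_kib = parse_vmrss_kib(SAMPLE_STATUS)
extra_gib = overhead_gib(rss_kib, allocated_gib=8)
print(f"overhead: {extra_gib:.2f} GiB")
```

On a real host, the status text would be read from `/proc/<pid>/status` for each qemu-kvm PID, and the allocated size would come from the VM's flavor.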
3 Solution
Increase the OS reserved memory space to absorb the extra memory used by the VM.
Increase the swap size (currently 4 GB) on nodes with SSDs to provide a larger buffer before the OOM killer is triggered.
Adjust the OpenStack scheduling logic to reserve additional memory for each VM, although this approach is less generally applicable.
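To judge how much larger the reservation needs to be, the per-VM overhead observed above can be added up for a given VM mix. The sketch below is an estimate under stated assumptions: the worst-case overheads (0.9 GB for an 8 GB VM, 0.8 GB for a 4 GB VM) come from the ranges reported in this article, while the VM mix is hypothetical and chosen to fill the 116 GB VM budget exactly. The result is converted to MB on the assumption that the reservation is configured in megabytes, as with nova's `reserved_host_memory_mb` option.

```python
import math

HOST_TOTAL_GB = 128
OBSERVED_OS_GB = 4

# Worst-case qemu-kvm overhead in GB, keyed by VM size in GB (from the article's observations).
WORST_OVERHEAD_GB = {8: 0.9, 4: 0.8}


def total_overhead_gb(vm_counts):
    """Sum the worst-case overhead over all VMs; vm_counts maps VM size (GB) to count."""
    return sum(count * WORST_OVERHEAD_GB[size] for size, count in vm_counts.items())


def worst_case_usage_gb(vm_counts, os_gb):
    """Host usage if every VM consumes its full allocation plus worst-case overhead."""
    allocated = sum(size * count for size, count in vm_counts.items())
    return os_gb + allocated + total_overhead_gb(vm_counts)


# Hypothetical mix: 14 x 8 GB + 1 x 4 GB = 116 GB, the full VM budget.
vm_counts = {8: 14, 4: 1}

overhead = total_overhead_gb(vm_counts)
usage = worst_case_usage_gb(vm_counts, OBSERVED_OS_GB)
needed_reservation_mb = math.ceil((OBSERVED_OS_GB + overhead) * 1024)

print(f"worst-case usage: {usage:.1f} GB of {HOST_TOTAL_GB} GB")
print(f"reservation covering OS + overhead: {needed_reservation_mb} MB")
```

With this mix, the worst case exceeds the 128 GB of physical memory, which is consistent with the OOM kills described earlier. Raising the reservation also shrinks the VM budget, so the estimate should be recomputed for the VM mix that fits under the new limit.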
References
https://lime-technology.com/forums/topic/48093-kvm-memory-leakingoverhead/
https://unix.stackexchange.com/questions/140322/kvm-killed-by-oomkiller
360 Zhihui Cloud Developer