
Why Kubernetes OOM Kills Use WSS, Not RSS – Diagnose & Fix Container Memory

After we moved our IoT services to Kubernetes, containers were OOM‑killed even though their RSS stayed below the memory limit. The reason: Kubernetes bases OOM decisions on the working set size (WSS), a metric that includes part of the file cache. This article explains how WSS is calculated, reproduces the issue, and walks through practical mitigations.

G7 EasyFlow Tech Circle

Background and Failure Phenomenon

After migrating our IoT services to Kubernetes, containers were frequently OOM‑killed. Although both the process RSS and the monitored container_memory_rss metric stayed within the memory limit, the working‑set metric (container_memory_working_set_bytes) often exceeded 95 % of the limit.

We discovered that Kubernetes uses WSS, not RSS, as the basis for OOM kills.

How WSS Is Calculated

Kubernetes obtains memory metrics from the cAdvisor component, which reads files in the container's cgroup filesystem (under cgroup v1, memory.usage_in_bytes and memory.stat).

The cAdvisor source shows that container_memory_working_set_bytes = rss + cache - total_inactive_file. In other words, WSS includes the active portion of the file‑system cache.

Thus:

container_memory_rss = rss (the ordinary resident set size).

container_memory_usage_bytes = rss + cache.

container_memory_working_set_bytes = rss + cache - total_inactive_file.
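The relationship can be checked by hand against the cgroup files cAdvisor reads. A minimal sketch, assuming cgroup v1 paths; the fallback numbers in the else branch are invented for illustration (used when the files are absent, e.g. on a cgroup v2 host):

```shell
# Recompute the exported metrics from a container's cgroup v1 files.
CG=/sys/fs/cgroup/memory
if [ -r "$CG/memory.usage_in_bytes" ]; then
  usage=$(cat "$CG/memory.usage_in_bytes")   # rss + cache (plus some kernel memory)
  inactive_file=$(awk '$1=="total_inactive_file"{print $2}' "$CG/memory.stat")
else
  rss=$((60 * 1024 * 1024))            # 60 MiB anonymous memory (made up)
  cache=$((40 * 1024 * 1024))          # 40 MiB page cache (made up)
  inactive_file=$((25 * 1024 * 1024))  # 25 MiB of that cache is inactive
  usage=$((rss + cache))               # container_memory_usage_bytes
fi
wss=$((usage - inactive_file))         # container_memory_working_set_bytes
echo "usage=$usage inactive_file=$inactive_file wss=$wss"
```

Note that WSS is simply usage minus the inactive file cache, which is why a container doing heavy file I/O can push WSS toward the limit while RSS barely moves.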

Reproducing the Issue

Prepare a program (m.c) that can precisely control memory usage.

Prepare a script (s.sh) that continuously writes a file of a specified size.

Run both inside a severely memory‑constrained container.
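The write script can be sketched as follows. FILE, SIZE_MB, and ROUNDS are illustrative defaults, not taken from the article; the article's test wrote a 1 GB file, while the defaults here are smaller so the sketch finishes quickly (set ROUNDS=0 to loop forever as in the repro):

```shell
#!/bin/sh
# s.sh (sketch): repeatedly rewrite a file so the container's page cache --
# and therefore its WSS -- keeps growing.
FILE=${FILE:-/tmp/cachefill.dat}
SIZE_MB=${SIZE_MB:-64}
ROUNDS=${ROUNDS:-1}     # 0 = loop forever
i=0
while [ "$ROUNDS" -eq 0 ] || [ "$i" -lt "$ROUNDS" ]; do
  dd if=/dev/zero of="$FILE" bs=1M count="$SIZE_MB" 2>/dev/null
  i=$((i + 1))
done
```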

Before starting the write script, docker stats shows 96 MB (76 % of the limit) in use. Once the script begins writing a 1 GB file, usage quickly climbs to 99 % and the m program is OOM‑killed.

Kernel logs confirm an OOM kill triggered by the container’s memory cgroup.

Solutions

Clear Logs

Emptying large log files reduces the cache component of WSS; this was the first workaround we applied in production.
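A sketch of the operation, using a hypothetical log path. Truncating in place (rather than deleting) matters: the writing process keeps a valid file descriptor, and the file's cached pages are freed:

```shell
# /tmp/app.log is a stand-in for a real application log.
LOG=${LOG:-/tmp/app.log}
echo "stand-in for a large accumulated log" > "$LOG"
: > "$LOG"      # truncate in place to zero bytes; cached pages are released
```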

Drop All Cache

Writing “3” to /proc/sys/vm/drop_caches frees the page cache along with dentries and inodes. However, /proc/sys is mounted read‑only inside containers, so the write must be performed on the host, where it affects every container on the node.
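A host‑side sketch. Since /proc/sys is read‑only inside a container, the snippet checks writability first rather than fail:

```shell
# Drop the page cache (host only; affects all containers on the node).
if [ -w /proc/sys/vm/drop_caches ]; then
  sync                                 # flush dirty pages so they become freeable
  echo 3 > /proc/sys/vm/drop_caches    # 1=pagecache, 2=dentries+inodes, 3=both
  DROPPED=yes
else
  DROPPED=no                           # read-only: run this on the host as root
fi
echo "dropped=$DROPPED"
```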

Drop Cache for Specific Files

The vmtouch tool (compiled from source) can evict cache for selected files inside a container.

After mounting the compiled binary into the container, running vmtouch -e lowers WSS, though it adds complexity and can consume ~10 % CPU.
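The eviction step can be sketched like this; /tmp/biglog.dat is an illustrative stand‑in for a real log file, and the snippet degrades gracefully when the vmtouch binary has not been mounted in:

```shell
# Evict one file's pages from the page cache with vmtouch (if available).
TARGET=${TARGET:-/tmp/biglog.dat}
dd if=/dev/zero of="$TARGET" bs=1M count=8 2>/dev/null   # create a cached file
if command -v vmtouch >/dev/null 2>&1; then
  vmtouch -e "$TARGET"    # -e: evict the file's pages from the page cache
  vmtouch "$TARGET"       # report residency again; should now be near 0%
else
  echo "vmtouch not found: compile it and mount the binary into the container"
fi
```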

Adjust Kernel Parameters

Increasing vm.vfs_cache_pressure and vm.min_free_kbytes accelerates cache reclamation, effectively reducing WSS.
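A host‑side sketch of the tuning; the values are illustrative, not the article's exact settings. Raising vfs_cache_pressure makes the kernel reclaim dentry/inode caches more aggressively, and a larger min_free_kbytes makes reclaim start earlier:

```shell
# Apply on the HOST; inside a container /proc/sys is read-only, so check first.
if [ -w /proc/sys/vm/vfs_cache_pressure ]; then
  echo 200   > /proc/sys/vm/vfs_cache_pressure   # default is 100
  echo 65536 > /proc/sys/vm/min_free_kbytes      # 64 MiB free-memory floor
  APPLIED=yes
else
  APPLIED=no   # apply on the host, e.g. via sysctl -w or /etc/sysctl.conf
fi
echo "applied=$APPLIED"
```

For persistence across reboots the same keys would go into /etc/sysctl.conf and be loaded with sysctl -p.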

After tuning, the same write test no longer triggers OOM; memory usage stays around 88 %.

These approaches demonstrate how file‑system cache contributes to WSS‑based OOM kills and provide practical mitigation techniques for Kubernetes clusters.

Kubernetes, OOM, Cache Management, Kernel Parameters, cAdvisor, Container Memory, WSS
Written by

G7 EasyFlow Tech Circle

Official G7 EasyFlow tech channel! All the hardcore tech, cutting‑edge innovations, and practical sharing you want are right here.
