Understanding Kubernetes WorkingSet: Metrics, Scripts, and Memory QoS Solutions
This article explains the WorkingSet metric in Kubernetes, shows how to calculate it with cgroup v1 and v2 scripts, outlines common container memory issues such as the memory black‑hole, and presents troubleshooting steps using SysOM monitoring and Koordinator QoS to resolve high WorkingSet usage.
WorkingSet Concept in Kubernetes
In Kubernetes, the real‑time memory usage of a pod (Pod Memory) is represented by the WorkingSet (WSS) metric defined by cAdvisor. The kubelet also uses WorkingSet when making node‑pressure eviction decisions.
Official Definition
Reference: Kubernetes eviction signals
Calculate WorkingSet
The following scripts can be run on a node to compute WorkingSet for cgroup v1 and v2.
CGroupV1
#!/usr/bin/env bash
# This script reproduces what the kubelet does to calculate memory.available relative to root cgroup.
memory_capacity_in_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(cat /sys/fs/cgroup/memory/memory.stat | grep total_inactive_file | awk '{print $2}')
memory_working_set=${memory_usage_in_bytes}
if [ "$memory_working_set" -lt "$memory_total_inactive_file" ]; then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))
echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"
CGroupV2
#!/bin/bash
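# This script reproduces what the kubelet does to calculate memory.available relative to kubepods cgroup.
# The kubepods.slice path assumes the systemd cgroup driver; adjust it if the node uses a different layout.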
memory_capacity_in_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/kubepods.slice/memory.current)
# Match the exact key so that no other counter containing "inactive_file" is picked up.
memory_total_inactive_file=$(awk '$1 == "inactive_file" {print $2}' /sys/fs/cgroup/kubepods.slice/memory.stat)
memory_working_set=${memory_usage_in_bytes}
if [ "$memory_working_set" -lt "$memory_total_inactive_file" ]; then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))
echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"
On a node, WorkingSet equals the root cgroup memory usage minus the inactive file cache. The same logic applies to a pod’s container.
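To apply the same subtraction to a single pod or container, the calculation can be wrapped in a function that takes a cgroup directory. The following is a minimal sketch for cgroup v2 only; `cgroup_working_set` is a hypothetical helper name, not part of the kubelet or cAdvisor.

```shell
# Sketch: WorkingSet of one cgroup v2 directory.
# WorkingSet = memory.current - inactive_file, floored at 0.
# cgroup_working_set is a hypothetical helper, not a kubelet or cAdvisor API.
cgroup_working_set() {
    local dir="$1"
    local usage inactive
    usage=$(cat "$dir/memory.current") || return 1
    # Read only the exact "inactive_file" key from memory.stat.
    inactive=$(awk '$1 == "inactive_file" {print $2}' "$dir/memory.stat") || return 1
    inactive=${inactive:-0}
    if [ "$usage" -lt "$inactive" ]; then
        echo 0
    else
        echo $((usage - inactive))
    fi
}
```

Calling `cgroup_working_set /sys/fs/cgroup/kubepods.slice` should match the `memory.working_set` value printed by the CGroupV2 script. For a specific pod or container, pass its own cgroup directory; that path depends on the cgroup driver, the pod's QoS class, and the pod UID, so it has to be looked up on the node.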
Common User Issues
Host memory usage appears lower than aggregated pod memory (host ~40%, pods ~90%) because pod WorkingSet still counts active page cache and other caches, which host‑level tools usually treat as reclaimable.
Running top inside a pod shows smaller values than kubectl top pod because top reads host metrics, not container‑isolated ones.
“Memory black hole” where hidden caches (e.g., PageCache, Dirty Memory) cause WorkingSet spikes.
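To see why the first discrepancy arises, it helps to compute a host-side figure by hand and compare it with WorkingSet. The sketch below uses one common host-side definition, `MemTotal - MemAvailable` from `/proc/meminfo`; this is an assumption, since monitoring tools differ in how they define "used". `host_used_percent` is a hypothetical helper, and its optional argument exists only so it can be pointed at a sample file.

```shell
# Sketch: host memory usage as a percentage, using MemTotal - MemAvailable.
# MemAvailable already treats most page cache as reclaimable, whereas
# WorkingSet only subtracts inactive file pages.
# host_used_percent is a hypothetical helper name.
host_used_percent() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1
            # Integer percentage, truncated.
            printf "%d\n", (total - avail) * 100 / total
        }
    ' "$meminfo"
}
```

Running `host_used_percent` on a node and comparing the result with the WorkingSet-based numbers from the scripts above gives a rough idea of how much of the gap comes from cache that the host considers reclaimable.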
Diagnosing with SysOM
SysOM (System Observer Monitoring) provides kernel‑level container metrics, showing detailed pod memory composition such as Cache, InactiveFile, InactiveAnon, and Dirty Memory.
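Where SysOM is not available, a rough approximation of the same breakdown can be read directly from a cgroup's `memory.stat` file. The sketch below is not SysOM and does not reproduce its metrics exactly; it assumes cgroup v2 key names, and `cgroup_memory_breakdown` is a hypothetical helper.

```shell
# Sketch: print a few memory.stat counters (in bytes) for one cgroup v2 directory.
# Assumes cgroup v2 key names; cgroup v1 uses different names (e.g. "cache", "dirty").
# cgroup_memory_breakdown is a hypothetical helper name.
cgroup_memory_breakdown() {
    local stat_file="$1/memory.stat"
    awk '
        $1 == "anon" || $1 == "file" || $1 == "inactive_anon" ||
        $1 == "inactive_file" || $1 == "active_file" || $1 == "file_dirty" {
            # Print the value as a string to avoid integer formatting limits.
            printf "%-14s %s\n", $1, $2
        }
    ' "$stat_file"
}
```

For example, `cgroup_memory_breakdown /sys/fs/cgroup/kubepods.slice` lists anonymous memory, total file cache, the active and inactive file portions, and dirty file pages for all pods on the node. A large `active_file` or `file_dirty` value relative to `anon` suggests that cache, rather than application heap, is driving WorkingSet.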
Resolving High WorkingSet
Typical solutions include scaling resources, clearing page cache, and using Koordinator QoS for fine‑grained memory scheduling. Koordinator can set memory high‑watermarks, lock‑step reclamation, and differential guarantees for BestEffort pods.
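Of these options, clearing the page cache is the one that can be done by hand on a node. A minimal sketch follows, assuming root access and assuming it is acceptable for every workload on the node to lose its cached file pages, since the kernel interface used here is node-wide rather than per pod. `drop_page_cache` is a hypothetical helper, and its optional argument exists only so the function can be exercised against an ordinary file.

```shell
# Sketch: drop clean page cache on the whole node (requires root).
# Writing 1 to drop_caches frees page cache only; it does not free
# dentries and inodes, and it does not discard dirty pages.
# drop_page_cache is a hypothetical helper name.
drop_page_cache() {
    local target="${1:-/proc/sys/vm/drop_caches}"
    # Flush dirty pages first so that more of the cache becomes droppable.
    sync
    echo 1 > "$target"
}
```

This is a blunt, one-off measure: the cache refills as soon as the workload reads or writes files again, and performance may dip until it does. It is better suited to confirming that cache is the cause of a high WorkingSet than to serving as a standing fix.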
Step 1: Observe
Use SysOM’s Pod Memory Monitor to locate the memory component causing the increase.
Step 2: Optimize
For deep‑rooted consumption like PageCache, consider code changes (e.g., flushing Log4j/Logback appender) or rely on Koordinator’s background reclamation.
References: cAdvisor source code, ACK SysOM documentation, Koordinator memory QoS guide.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
