
Understanding Kubernetes WorkingSet: Metrics, Scripts, and Memory QoS Solutions

This article explains the WorkingSet metric in Kubernetes, shows how to calculate it with cgroup v1 and v2 scripts, outlines common container memory issues such as the memory black‑hole, and presents troubleshooting steps using SysOM monitoring and Koordinator QoS to resolve high WorkingSet usage.

Alibaba Cloud Observability

WorkingSet Concept in Kubernetes

In Kubernetes, the real-time memory usage of a pod (Pod Memory) is represented by the WorkingSet (WSS) metric defined by cAdvisor. The kubelet, not the scheduler, also uses WorkingSet when it evaluates the memory.available signal for node-pressure eviction decisions.

Official Definition

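The Kubernetes documentation defines the memory.available eviction signal as memory.available := node.status.capacity[memory] - node.stats.memory.workingSet, so a rising WorkingSet directly lowers the memory the kubelet considers available.

Reference: Kubernetes eviction signals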

Calculate WorkingSet

The following scripts can be run on a node to compute WorkingSet for cgroup v1 and v2.

CGroupV1

#!/usr/bin/env bash
# This script reproduces what the kubelet does to calculate memory.available relative to root cgroup.
memory_capacity_in_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(cat /sys/fs/cgroup/memory/memory.stat | grep total_inactive_file | awk '{print $2}')
memory_working_set=${memory_usage_in_bytes}
if [ "$memory_working_set" -lt "$memory_total_inactive_file" ]; then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))
echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"

CGroupV2

#!/bin/bash
# This script reproduces what the kubelet does to calculate memory.available relative to kubepods cgroup.
memory_capacity_in_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/kubepods.slice/memory.current)
memory_total_inactive_file=$(cat /sys/fs/cgroup/kubepods.slice/memory.stat | grep inactive_file | awk '{print $2}')
memory_working_set=${memory_usage_in_bytes}
if [ "$memory_working_set" -lt "$memory_total_inactive_file" ]; then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))
echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"

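In both scripts, WorkingSet equals the cgroup's memory usage minus its inactive file cache (total_inactive_file on cgroup v1, inactive_file on cgroup v2), floored at zero. The v1 script reads the root cgroup and the v2 script reads kubepods.slice. The same subtraction applies to the cgroup of a pod or of a single container.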
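To apply the same calculation to a single pod or container, the logic can be wrapped in a function that takes a cgroup directory. This is a minimal sketch, not kubelet or cAdvisor code: cgroup_working_set is a hypothetical helper name, and the directory you pass in is an assumption that depends on your cgroup driver and the pod's QoS class.

```shell
#!/usr/bin/env bash
# Sketch: compute WorkingSet (bytes) for one cgroup directory.
# cgroup_working_set is a hypothetical helper, not a Kubernetes API.
cgroup_working_set() {
    local dir="$1" usage inactive
    if [ -f "$dir/memory.current" ]; then
        # cgroup v2: usage is memory.current, cache field is inactive_file
        usage=$(cat "$dir/memory.current")
        inactive=$(awk '$1 == "inactive_file" {print $2}' "$dir/memory.stat")
    elif [ -f "$dir/memory.usage_in_bytes" ]; then
        # cgroup v1: usage is memory.usage_in_bytes, cache field is total_inactive_file
        usage=$(cat "$dir/memory.usage_in_bytes")
        inactive=$(awk '$1 == "total_inactive_file" {print $2}' "$dir/memory.stat")
    else
        echo "no memory controller files found in $dir" >&2
        return 1
    fi
    inactive=${inactive:-0}
    # Same floor-at-zero rule as the node-level scripts
    if [ "$usage" -lt "$inactive" ]; then
        echo 0
    else
        echo $((usage - inactive))
    fi
}
```

Call it as cgroup_working_set followed by the pod's or container's cgroup directory. The result should be close to what kubectl top pod reports, allowing for sampling delay.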

Common User Issues

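Host memory usage appears much lower than aggregated pod memory (for example, host at about 40% and pods at about 90%). Pod WorkingSet subtracts only the inactive file cache, so it still counts active page cache and other cached memory, while host-level tools usually report usage with caches excluded.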

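Running top inside a pod shows smaller values than kubectl top pod. top reads /proc, which is not cgroup-aware: its summary lines reflect the host rather than the container, and its per-process resident memory does not include the page cache and kernel memory charged to the container's cgroup, both of which WorkingSet counts.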
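One way to see this gap is to sum the resident set size of the processes in a container's cgroup and compare it with the cgroup's own usage. The sketch below assumes a leaf (container-level) cgroup whose cgroup.procs lists the processes. cgroup_rss_bytes is a hypothetical helper, and the optional second argument is only there so the function can be tried against a copy of /proc. Summing VmRSS counts shared pages more than once, so treat the result as a rough figure.

```shell
#!/usr/bin/env bash
# Sketch: sum VmRSS (in bytes) over all processes listed in a cgroup.
# cgroup_rss_bytes is a hypothetical helper name.
cgroup_rss_bytes() {
    local dir="$1" proc_root="${2:-/proc}" pid kb total=0
    while read -r pid; do
        # VmRSS is reported in kB; kernel threads and exited processes have none
        kb=$(awk '$1 == "VmRSS:" {print $2}' "$proc_root/$pid/status" 2>/dev/null)
        total=$((total + ${kb:-0} * 1024))
    done < "$dir/cgroup.procs"
    echo "$total"
}
```

If this sum is well below the cgroup's memory.current (v2) or memory.usage_in_bytes (v1), the difference is memory charged to the cgroup that process-level tools do not attribute to any process, typically file cache and kernel memory.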

A “memory black hole” appears: memory that is charged to the container but hidden from process-level tools (e.g., PageCache, Dirty Memory) causes WorkingSet spikes.

Diagnosing with SysOM

SysOM (System Observer Monitoring) provides kernel‑level container metrics, showing detailed pod memory composition such as Cache, InactiveFile, InactiveAnon, and Dirty Memory.

Resolving High WorkingSet

Typical solutions include scaling resources, clearing page cache, and using Koordinator QoS for fine‑grained memory scheduling. Koordinator can set memory high‑watermarks, lock‑step reclamation, and differential guarantees for BestEffort pods.
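For the cache-clearing option, a manual sketch looks like the following. Both functions need root on the node. drop_page_cache acts on the whole node, and reclaim_cgroup relies on the cgroup v2 memory.reclaim file, which exists only on newer kernels (Linux 5.19 and later). The function names and the optional path argument are illustrative assumptions. Dropping caches forces later reads to go back to disk, so this is a stopgap rather than a fix.

```shell
#!/usr/bin/env bash
# Node-wide: ask the kernel to drop clean page cache. Requires root.
# The optional argument exists only so the function can be pointed at a
# scratch file when trying it out.
drop_page_cache() {
    local target="${1:-/proc/sys/vm/drop_caches}"
    sync                 # write dirty pages back first so they can be dropped
    echo 1 > "$target"   # 1 = page cache; 2 = dentries and inodes; 3 = both
}

# Per-cgroup (cgroup v2 only): ask the kernel to reclaim up to the given
# number of bytes from one cgroup by writing to its memory.reclaim file.
reclaim_cgroup() {
    local dir="$1" bytes="$2"
    echo "$bytes" > "$dir/memory.reclaim"
}
```

On a real cgroup, the write to memory.reclaim can fail if the kernel cannot reclaim the requested amount, so check the exit status before assuming the memory was freed.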

Step 1: Observe

Use SysOM’s Pod Memory Monitor to locate the memory component causing the increase.
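Without SysOM, a rough version of the same breakdown can be read from the cgroup's memory.stat file. The sketch below assumes cgroup v2 field names. memory_breakdown is a hypothetical helper and does not reproduce SysOM's output, it only prints a few of the counters that usually explain a WorkingSet increase.

```shell
#!/usr/bin/env bash
# Sketch: print selected memory.stat counters of one cgroup in MiB.
# memory_breakdown is a hypothetical helper; field names follow cgroup v2.
memory_breakdown() {
    local stat_file="$1/memory.stat"
    awk '
        $1 ~ /^(anon|file|inactive_anon|active_anon|inactive_file|active_file|file_dirty|shmem|slab)$/ {
            # memory.stat values are in bytes; convert to whole MiB
            printf "%-14s %d MiB\n", $1, int($2 / 1048576)
        }' "$stat_file"
}
```

A large active_file or file_dirty value relative to anon suggests that file cache, rather than application heap, is driving the increase.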

Step 2: Optimize

For deep‑rooted consumption like PageCache, consider code changes (e.g., flushing Log4j/Logback appender) or rely on Koordinator’s background reclamation.

References: cAdvisor source code, ACK SysOM documentation, Koordinator memory QoS guide.
