
Why Kubernetes CPU Metrics Differ from Traditional VMs: A Deep Dive

This article compares CPU, memory, disk, and network monitoring metrics between traditional KVM servers and Kubernetes pods, explaining the underlying reasons for differences and offering guidance on interpreting the metrics for effective performance troubleshooting.

MaGe Linux Operations

Apr 22, 2020

Kubernetes adoption has been underway for some time, and the monitoring system now provides metrics and alert rules for Kubernetes pods.

Because Kubernetes and traditional physical/virtual machines run in completely different environments, their monitoring metrics differ. The platform tries to unify these differences, but users still raise questions about the Kubernetes metrics.

This article explains the differences between physical/virtual machines (referred to below as KVM) and Kubernetes from four perspectives (CPU, memory, disk, and network) to help users understand the underlying principles when using the monitoring product.

All data collected by the monitoring platform come from native Kubernetes metrics, so they are inevitably limited by the characteristics of Kubernetes native interfaces.

Overall, the size of the difference between Kubernetes and KVM depends on the component:

CPU differences are the most significant, dictated by Kubernetes' technical nature.

Memory has some differences but can largely be unified with the KVM stack.

Network and disk differences are minimal, requiring little extra understanding.

CPU

In KVM scenarios, users focus on CPU usage rate and CPU load:

High CPU load with low usage usually indicates a bottleneck in disk I/O.

High CPU usage with load far exceeding the number of cores shows severe CPU resource shortage.

In Kubernetes, users should watch CPU usage rate and CPU throttling time:

When CPU usage approaches or slightly exceeds 100% and throttling time is high, the pod does not have enough CPU and needs higher request or limit values; throttling itself is enforced against the limit.

The reasons for these differences are:

Kubernetes and KVM use different CPU isolation mechanisms.

Linux metric exposure differs from Kubernetes metric exposure.

The monitoring system provides two related metrics: CPU usage rate and CPU throttling time.

Figure: a throttled application whose CPU usage exceeds 100%.

CPU Usage

For an independent CPU core, time is divided into three parts: user code execution, kernel code execution, and idle time (HLT instruction on x86).

In KVM, CPU usage is calculated directly as (user time + kernel time) / total time.
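The KVM formula can be sketched in a few lines. This is a minimal illustration that follows the three-part model above (user, kernel, idle); the `CpuTimes` type and the sample numbers are hypothetical, and real Linux counters in /proc/stat carry more fields than these three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTimes:
    """Cumulative CPU time counters for one core, in clock ticks (hypothetical type)."""
    user: int
    system: int
    idle: int


def cpu_usage(prev: CpuTimes, curr: CpuTimes) -> float:
    """Return (user time + kernel time) / total time between two samples."""
    busy = (curr.user - prev.user) + (curr.system - prev.system)
    total = busy + (curr.idle - prev.idle)
    if total <= 0:
        # No time elapsed between samples, so no usage can be derived.
        return 0.0
    return busy / total


# Made-up samples: 400 busy ticks out of 1000 elapsed ticks.
prev = CpuTimes(user=1000, system=500, idle=8500)
curr = CpuTimes(user=1300, system=600, idle=9100)
print(f"CPU usage: {cpu_usage(prev, curr):.0%}")
```

Because the counters are cumulative, usage is always computed over an interval between two samples rather than from a single reading.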

In Kubernetes, a pod does not own a dedicated core, so the formula changes. A pod with a CPU limit of 4 can use up to 4 seconds of CPU per second, and a pod using 0.5 seconds per second is considered to be using 50% of a core.

Kubernetes does not natively expose a “usage rate” concept, but the monitoring system derives pod CPU usage as usage / limit.

Because of limited granularity in CPU limit enforcement and measurement error, CPU usage can spike above 100% under extreme load.
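The usage / limit derivation can be sketched as follows. This assumes the platform samples a cumulative "CPU seconds consumed" counter for the pod at two points in time; the function name, parameters, and numbers are illustrative and not the monitoring system's actual code.

```python
def pod_cpu_usage(cpu_seconds_prev: float,
                  cpu_seconds_curr: float,
                  interval_seconds: float,
                  cpu_limit_cores: float) -> float:
    """Return pod CPU usage as a fraction of its CPU limit.

    cpu_seconds_* are cumulative CPU seconds consumed by the pod,
    sampled interval_seconds apart (assumed inputs).
    """
    if interval_seconds <= 0 or cpu_limit_cores <= 0:
        raise ValueError("interval and limit must be positive")
    # CPU seconds consumed per wall-clock second, i.e. cores in use.
    cores_used = (cpu_seconds_curr - cpu_seconds_prev) / interval_seconds
    return cores_used / cpu_limit_cores


# A pod with a limit of 4 cores that burned 30 CPU seconds in 15 seconds
# is using 2 cores, or half of its limit.
print(pod_cpu_usage(100.0, 130.0, 15.0, 4.0))

# Measurement error can push the ratio slightly above 1.0 (over 100%).
print(pod_cpu_usage(100.0, 161.5, 15.0, 4.0))
```

The second call illustrates why a value a little over 100% is not necessarily a collection bug: the ratio is taken against the limit, and enforcement of the limit is only approximate over short windows.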

CPU Load

CPU load (the Linux load average) measures the average number of threads that are either runnable or in uninterruptible sleep (typically waiting on I/O). When CPU usage is low but load is high, the bottleneck is likely I/O, most often disk.
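The KVM rule of thumb for reading load together with usage can be written as a small classifier. This is only a sketch: the thresholds (30%, 80%, and "load more than twice the core count") are assumptions chosen for illustration, not values from the monitoring system.

```python
def diagnose_kvm_cpu(cpu_usage: float,
                     load_average: float,
                     cores: int,
                     low_usage: float = 0.3,
                     high_usage: float = 0.8,
                     high_load_per_core: float = 2.0) -> str:
    """Classify a host from CPU usage (0..1) and load average.

    All thresholds are hypothetical defaults; tune them per environment.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    load_per_core = load_average / cores
    if load_per_core < high_load_per_core:
        return "no clear bottleneck"
    if cpu_usage <= low_usage:
        # Many threads are waiting, but not for the CPU.
        return "likely I/O bottleneck"
    if cpu_usage >= high_usage:
        # Many threads are waiting and the CPU is already busy.
        return "CPU shortage"
    return "inconclusive"


print(diagnose_kvm_cpu(cpu_usage=0.15, load_average=12.0, cores=4))
print(diagnose_kvm_cpu(cpu_usage=0.95, load_average=16.0, cores=4))
```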

Kubernetes provides a cpu_load metric, but it only counts running threads, so it cannot reveal I/O‑bound bottlenecks. The metric is also disabled by default.

Kubernetes also offers a “CPU throttling time” metric, which captures how long a pod was held back by the CFS scheduler after exhausting its CPU quota. When throttling time is high, the pod’s CPU resources are insufficient.
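A rough way to turn throttling time into an alert condition is sketched below. It assumes the throttling metric is a cumulative counter of throttled seconds sampled at two points in time; the function names and both thresholds are hypothetical.

```python
def throttled_seconds_per_second(throttled_prev: float,
                                 throttled_curr: float,
                                 interval_seconds: float) -> float:
    """Return throttled time accumulated per wall-clock second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (throttled_curr - throttled_prev) / interval_seconds


def looks_cpu_starved(usage_ratio: float,
                      throttle_rate: float,
                      usage_threshold: float = 0.9,
                      throttle_threshold: float = 0.1) -> bool:
    """Flag a pod whose usage is near its limit AND that is being throttled.

    Both thresholds are illustrative assumptions, not platform defaults.
    """
    return usage_ratio >= usage_threshold and throttle_rate >= throttle_threshold


# 3 seconds of throttling accumulated over a 15-second window.
rate = throttled_seconds_per_second(10.0, 13.0, 15.0)
print(looks_cpu_starved(usage_ratio=0.98, throttle_rate=rate))
```

Requiring both conditions avoids flagging pods that are briefly throttled during a burst while their average usage stays well under the limit.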

Memory

Both KVM and Kubernetes use a memory usage metric, but they differ in what counts as used memory.

In KVM, used memory is computed as total minus available, because the impact of cache/buffer/slab memory varies by application.

Kubernetes has no equivalent of the available value, so the monitoring system primarily uses RSS as used memory.

The monitoring system provides several memory metrics, illustrated below:

Linux’s free command reports used, cache/buffer, and available columns. The monitoring system currently calculates used memory as total‑available and derives a usage rate from that.
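The total minus available calculation can be sketched by parsing text in the /proc/meminfo format, where the kernel publishes MemTotal and MemAvailable. The sample text and its values are made up so that the example runs anywhere, and the helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse '/proc/meminfo'-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_usage_rate(meminfo: dict) -> float:
    """Return (total - available) / total."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


# Made-up sample in the same layout as /proc/meminfo.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    8192000 kB
Buffers:          512000 kB
Cached:          6144000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"used: {memory_usage_rate(info):.0%}")
```

In this sample only about 6% of memory is strictly free, yet half is available, which is why the available-based rate is the more useful signal on a host with a large page cache.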

Kubernetes exposes three memory values:

MemUsed – similar to Linux’s used, includes cache.

WorkingSet – excludes cold cache data.

RSS – excludes cache entirely.

In practice, WorkingSet often appears high (around 90% for typical web apps), so the monitoring system prefers RSS for memory usage, while recommending users consider other metrics when diagnosing performance issues.
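The relationship between the three values can be illustrated with a short sketch. It assumes the common cAdvisor convention that the working set is total usage minus inactive file cache; the function name and byte counts are hypothetical, chosen to mirror the "WorkingSet around 90%" case described above.

```python
MIB = 1024 * 1024


def container_memory_views(usage_bytes: int,
                           inactive_file_bytes: int,
                           rss_bytes: int,
                           limit_bytes: int) -> dict:
    """Return the three memory readings as fractions of the memory limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    # Assumed convention: working set = usage minus cold (inactive) file cache.
    working_set = max(usage_bytes - inactive_file_bytes, 0)
    return {
        "mem_used": usage_bytes / limit_bytes,
        "working_set": working_set / limit_bytes,
        "rss": rss_bytes / limit_bytes,
    }


# Made-up web application: heavy page cache, moderate RSS.
views = container_memory_views(
    usage_bytes=1900 * MIB,
    inactive_file_bytes=100 * MIB,
    rss_bytes=800 * MIB,
    limit_bytes=2000 * MIB,
)
for name, ratio in views.items():
    print(f"{name}: {ratio:.0%}")
```

The same container reads as 95%, 90%, or 40% full depending on the metric, which is why an RSS-based usage rate should be checked against the other two values when diagnosing memory pressure.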

Disk / Network

Because container isolation is built on Linux kernel mechanisms (cgroups and namespaces), disk and network metrics differ little between Kubernetes and KVM. Web applications on KVM usually care about disk space usage, but Kubernetes pods typically have no persistent disk unless persistent volumes are used, so disk space is rarely a concern.

For disk, the focus is on write performance metrics, shown below:

Network monitoring is similar to KVM, focusing on traffic and packet loss:

Source: https://tech.kujiale.com/jian-kong-pod-shi-wo-men-zai-jian-kong-shi-yao/
