
Why CPU Limits Can Slow Down Your Kubernetes Pods – A Deep Dive into cgroups v2 and CFS

This article explains how Kubernetes CPU requests and limits interact with Linux namespaces, cgroups v2, and the Completely Fair Scheduler, showing why limits can cause throttling, increase latency, and mislead monitoring, and offers guidance on when to use or avoid CPU limits.

DevOps Coach

Background

In Kubernetes, configuring CPU requests and limits is a routine task, but the interaction between Kubernetes, the Linux kernel, and the container runtime (Docker, containerd, etc.) can significantly affect application performance, especially under high load.

Core Concepts

Pods run as processes on the host Linux node and rely on two kernel mechanisms for isolation and resource control: namespaces and cgroups.

CPU Requests: Used by the Kubernetes scheduler to decide pod placement and to provide a minimum guaranteed CPU share.

CPU Limits: Enforced by the kernel’s Completely Fair Scheduler (CFS) via a bandwidth-control mechanism. By default the CFS period is 100 ms, and within each period a container can use at most its quota of CPU time (limit × period).

If a container exhausts its quota within the period, it is throttled and must wait for the next period to continue.

Illustrative Example

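A single-threaded container with a CPU limit of 0.4 cores attempts a CPU-intensive task that needs 200 ms of CPU time. Because the limit allows only 40 ms per 100 ms period, the task runs for 40 ms and then waits 60 ms in each of the first four periods, and finishes in the fifth. It is throttled four times and takes 440 ms instead of 200 ms (2.2× slower).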
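The arithmetic in this example can be sketched as a small Python function. This is an illustration only, assuming a single-threaded, CPU-bound task that starts exactly at a period boundary; the function name and parameters are hypothetical and not part of any Kubernetes or kernel API.

```python
def throttled_wall_time_ms(cpu_needed_ms: int, limit_millicores: int,
                           period_ms: int = 100) -> int:
    """Wall-clock time for a single-threaded CPU-bound task under a CFS quota.

    Assumes the task starts at the beginning of a period (a simplification).
    """
    if cpu_needed_ms <= 0:
        return 0
    quota_ms = limit_millicores * period_ms // 1000  # CPU time allowed per period
    if quota_ms <= 0:
        raise ValueError("limit too small for this period")
    if quota_ms >= period_ms:
        # One thread can use at most one full core, so it is never throttled.
        return cpu_needed_ms
    # Periods that end with the quota exhausted and work still pending.
    throttled_periods = (cpu_needed_ms - 1) // quota_ms
    work_in_last_period = cpu_needed_ms - throttled_periods * quota_ms
    return throttled_periods * period_ms + work_in_last_period


print(throttled_wall_time_ms(200, 400))  # 440
```

With a 400m limit and 200 ms of work, the function reports four throttled periods plus 40 ms of work in the fifth, which matches the 440 ms figure.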

Consequences of Throttling

Liveness probes may fail.

JVM or .NET garbage-collection pauses can lengthen, potentially leading to OOM.

Heartbeat events may be lost.

Work queues can back up.

Monitoring dashboards often still show low average CPU usage, making the root cause hard to pinpoint.

Linux Perspective: CFS and cgroups v2

The kernel’s Completely Fair Scheduler (CFS) allocates CPU time. When Kubernetes schedules a container:

CPU requests are translated to a weight (cgroups v2: cpu.weight or cpu.weight.nice).

CPU limits, if set, are enforced via cpu.max, which relies on CFS bandwidth control.
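As a rough sketch of these two translations, the Python below converts millicores into the values written to cpu.weight and cpu.max. It assumes the linear shares-to-weight mapping historically used by the kubelet and runc (newer runtime releases may use a different curve) and the default 100 ms period; the function names are hypothetical.

```python
from typing import Optional


def cpu_request_to_weight(request_millicores: int) -> int:
    """Map a CPU request to a cgroups v2 cpu.weight (assumed linear mapping)."""
    # Step 1: millicores -> cgroups v1 cpu.shares (1 CPU = 1024 shares, minimum 2).
    shares = max(2, request_millicores * 1024 // 1000)
    shares = min(shares, 262144)
    # Step 2: scale shares in [2, 262144] to weight in [1, 10000].
    return 1 + (shares - 2) * 9999 // 262142


def cpu_limit_to_cpu_max(limit_millicores: Optional[int],
                         period_us: int = 100_000) -> str:
    """Render the "<quota> <period>" string stored in cpu.max."""
    if limit_millicores is None:
        return f"max {period_us}"  # no limit: unlimited quota
    quota_us = limit_millicores * period_us // 1000
    return f"{quota_us} {period_us}"


print(cpu_request_to_weight(1000))  # 39
print(cpu_limit_to_cpu_max(400))    # 40000 100000
```

Note the asymmetry: the weight only matters when CPUs are contended, while the quota in cpu.max is a hard ceiling that applies even on an otherwise idle node.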

cgroups v2 enables finer‑grained resource control but introduces subtle issues for multithreaded or bursty workloads.

To locate a pod’s cgroup configuration on the host, run cat /proc/<pid>/cgroup, strip the leading 0::/, and prepend /sys/fs/cgroup/.
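The path derivation can be sketched in Python as follows. The helpers work on text you have already read from /proc/<pid>/cgroup and from the cgroup's cpu.max file; the function names and the sample cgroup path are hypothetical, and the sketch assumes a pure cgroups v2 host mounted at /sys/fs/cgroup.

```python
from pathlib import PurePosixPath


def cgroup_dir_from_proc(proc_cgroup_text: str,
                         mount: str = "/sys/fs/cgroup") -> str:
    """Turn the contents of /proc/<pid>/cgroup into the cgroup's directory."""
    for line in proc_cgroup_text.splitlines():
        if line.startswith("0::"):  # the unified (v2) hierarchy entry
            relative = line[len("0::"):].lstrip("/")
            return str(PurePosixPath(mount) / relative)
    raise ValueError("no cgroups v2 entry (0::) found")


def parse_cpu_max(text: str):
    """Parse cpu.max into (quota_us or None, period_us); "max" means no limit."""
    quota, period = text.split()
    return (None if quota == "max" else int(quota)), int(period)


# Hypothetical sample; real pod paths are longer and runtime-specific.
sample = "0::/kubepods.slice/example.scope\n"
print(cgroup_dir_from_proc(sample))   # /sys/fs/cgroup/kubepods.slice/example.scope
print(parse_cpu_max("40000 100000"))  # (40000, 100000)
```

On a node, the same directory also holds cpu.weight and cpu.stat, so the result can be reused to read those files.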

Example: Multithreaded Workload Under Low CPU Limit

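Assume 10 CPU-intensive threads each need 50 ms of CPU time, on a node with at least 10 free cores. With a container limit of 2 cores, the total quota per 100 ms period is 200 ms, shared by all threads. All 10 threads run in parallel for the first 20 ms, consuming the full 200 ms, and are then throttled for the remaining 80 ms of the period. Each thread gets 20 ms in the first period, 20 ms in the second, and its final 10 ms in the third, so the task takes 210 ms instead of 50 ms, and the threads spend over 75% of that time throttled.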
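The period-by-period behavior in this example can be reproduced with a short simulation. It is a sketch under simplifying assumptions: all threads are identical, start at a period boundary, and each has a free core to run on. The function name is hypothetical.

```python
def multithreaded_wall_time_ms(threads: int, cpu_per_thread_ms: float,
                               limit_millicores: int,
                               period_ms: float = 100.0) -> float:
    """Simulate identical parallel threads sharing one CFS quota."""
    quota_ms = limit_millicores * period_ms / 1000  # shared CPU budget per period
    if quota_ms <= 0 or threads <= 0:
        raise ValueError("threads and limit must be positive")
    remaining = cpu_per_thread_ms  # CPU time each thread still needs
    elapsed = 0.0
    while True:
        # With all threads running at once, the quota lasts quota_ms / threads
        # of wall-clock time, and never longer than the period itself.
        run_ms = min(remaining, quota_ms / threads, period_ms)
        remaining -= run_ms
        if remaining <= 1e-9:
            return elapsed + run_ms
        elapsed += period_ms  # throttled (or still running) until the next period


wall = multithreaded_wall_time_ms(10, 50, 2000)
print(wall)  # 210.0
```

Running the same workload with 2 threads instead of 10 returns 50.0, because two threads cannot exhaust a 2-core quota; the throttling comes from parallelism exceeding the limit, not from the total amount of work.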

Why Throttling Can Appear Even When Usage Is Low

Common scenarios:

Periodic spikes (e.g., every 10–20 s) that exhaust the quota within a 100 ms CFS period.

Multithreaded bursty workloads such as API gateways or garbage collectors.

Monitoring windows that are too long, smoothing out short‑term throttling.

In such cases, a pod may spend 25–50 % of its time throttled while the dashboard shows <10 % CPU usage.
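To see how a long averaging window hides throttling, the sketch below replays a per-period CPU demand trace against a quota and reports both the average usage and the share of periods that ended throttled. The trace values are made up for illustration and are not meant to reproduce the percentages quoted above; the function name is hypothetical.

```python
def summarize_window(demand_ms_per_period, quota_ms, period_ms=100):
    """Return (average cores used, fraction of periods that ended throttled)."""
    backlog = 0.0      # CPU work requested but not yet executed
    used_total = 0.0
    throttled = 0
    for demand in demand_ms_per_period:
        backlog += demand
        used = min(backlog, quota_ms)  # at most the quota runs in one period
        backlog -= used
        used_total += used
        if backlog > 0:                # work left over: the cgroup was throttled
            throttled += 1
    periods = len(demand_ms_per_period)
    return used_total / (periods * period_ms), throttled / periods


# Hypothetical 10 s window: a 200 ms burst every 5 s, idle otherwise,
# under a 500m limit (50 ms quota per 100 ms period).
trace = [0] * 100
trace[0] = trace[50] = 200
avg_cores, throttled_fraction = summarize_window(trace, quota_ms=50)
print(avg_cores, throttled_fraction)  # 0.04 0.06
```

Averaged over the window, the container uses 0.04 cores, far below its 0.5-core limit, yet every burst is throttled for three consecutive periods.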

Community Opinions on Using CPU Limits

Tim Hockin (Kubernetes maintainer): Generally avoid CPU limits; rely on requests and autoscaling.

Various industry sources (Grafana, Buffer, NetData, SlimStack) recommend removing limits for critical workloads.

When to Set CPU Limits

Appropriate scenarios:

Pre‑production environments for regression and performance testing.

Multi‑tenant clusters with strict ResourceQuota.

When a Guaranteed QoS class is required for eviction protection or CPU pinning.
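The Guaranteed case is the one where limits are unavoidable, because the QoS class is derived from requests and limits. The sketch below is a simplified rendering of that rule in Python, assuming each container is described by a hypothetical dict with "requests" and "limits" maps; it compares quantity strings literally, so "500m" and "0.5" would not be treated as equal.

```python
def qos_class(containers) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit, if one is set.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs a limit on both resources, equal to the request.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pod = [{"requests": {"cpu": "500m", "memory": "256Mi"},
        "limits": {"cpu": "500m", "memory": "256Mi"}}]
print(qos_class(pod))  # Guaranteed
```

Dropping the CPU limit from that container moves the pod to Burstable, which is the trade-off to weigh when removing limits from a workload that relies on Guaranteed behavior.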

Should be avoided or used very loosely for:

Latency‑sensitive applications (API gateways, GC‑heavy runtimes).

Workloads with bursty or highly concurrent multithreading.

Environments where monitoring cannot capture short‑term throttling.

Observability: Metrics Beyond Default Dashboards

Key metrics to detect throttling:

container_cpu_cfs_throttled_periods_total – counts throttling periods (frequency).

container_cpu_cfs_throttled_seconds_total – total throttled time (severity).

Set Grafana panel intervals as short as your scrape interval allows: throttling happens at the granularity of the 100 ms CFS period, so long averaging windows hide it.
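One way to turn these counters into a signal is to compare two samples and compute the share of CFS periods that were throttled in between. The Python sketch below assumes you have already fetched the counter values; it also uses container_cpu_cfs_periods_total, a companion cAdvisor counter not named above, and the dict keys and sample numbers are hypothetical.

```python
def throttle_stats(earlier: dict, later: dict) -> dict:
    """Compare two counter samples and summarize throttling between them.

    Each sample is a dict with these (hypothetical) keys:
      periods            <- container_cpu_cfs_periods_total
      throttled_periods  <- container_cpu_cfs_throttled_periods_total
      throttled_seconds  <- container_cpu_cfs_throttled_seconds_total
    """
    periods = later["periods"] - earlier["periods"]
    throttled = later["throttled_periods"] - earlier["throttled_periods"]
    seconds = later["throttled_seconds"] - earlier["throttled_seconds"]
    ratio = throttled / periods if periods > 0 else 0.0
    return {"throttled_ratio": ratio, "throttled_seconds": seconds}


# Made-up samples taken one minute apart.
t0 = {"periods": 1000, "throttled_periods": 100, "throttled_seconds": 5.0}
t1 = {"periods": 1600, "throttled_periods": 250, "throttled_seconds": 12.5}
print(throttle_stats(t0, t1))  # {'throttled_ratio': 0.25, 'throttled_seconds': 7.5}
```

A ratio is easier to alert on than a raw counter, since it stays comparable across containers with different amounts of activity.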

Additional tools:

KEDA – event‑driven autoscaling.

VPA & HPA – resource optimization and auto‑scaling.

Karpenter (AWS) – dynamic node provisioning.

Conclusion

Kubernetes provides flexible CPU resource management, but misusing limits can dramatically degrade performance while metrics misleadingly indicate idle containers. Treat CPU limits as a safety valve, not a default setting; apply them only after thorough testing under realistic traffic and load.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
