How to Detect and Prevent OOM and CPU Throttling in Kubernetes

This article explains why memory OOM and CPU throttling are critical issues in Kubernetes, shows how limits and requests work, demonstrates monitoring techniques with Prometheus and cAdvisor, and provides practical best‑practice recommendations to avoid pod eviction and performance degradation.

ITPUB

Jun 22, 2024

Introduction

In Kubernetes, out‑of‑memory (OOM) kills and CPU throttling are the two most critical resource problems, especially for latency‑sensitive applications. Misconfigured limits can cause Redis clusters to fail or other services to become unstable, and they directly affect cloud costs.

Kubernetes OOM

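Each container in a pod needs memory to run. When a container exceeds its memory limit, the Linux OOM killer terminates a process in that container; the container exits with code 137 (128 + 9, the number of SIGKILL) and Kubernetes records the reason OOMKilled in the pod status. When the node itself runs out of memory, the kernel decides which process to kill using each process's oom_score_adj value, which the kubelet sets according to the pod's QoS class.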
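As a small illustration of where 137 comes from, the sketch below decodes an exit code using the common convention that a process killed by a signal reports 128 plus the signal number. The function name describe_exit_code is hypothetical, and the signal names depend on the platform the script runs on.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Explain an exit code using the 128 + signal-number convention."""
    if exit_code > 128:
        signum = exit_code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {exit_code}"


# 137 - 128 = 9, which is SIGKILL on Linux.
print(describe_exit_code(137))
```

Exit code 137 only shows that the process received SIGKILL; the OOMKilled reason in the pod status is what confirms the OOM killer was responsible.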

Three possible sources of memory limits apply:

Kubernetes limits set on the container.

Kubernetes ResourceQuota set on the namespace.

The actual physical memory of the node.

Memory Overcommitment

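When limits are higher than requests, the sum of limits on a node can exceed the node's capacity, because the scheduler places pods according to their requests, not their limits. This overcommitment is common; if containers collectively use more memory than they requested, the node may run out of memory and the kubelet evicts pods to free space.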
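To make overcommitment concrete, here is a minimal sketch that divides the sum of container memory limits by the memory a node can allocate. The function name and the node and container sizes are hypothetical, chosen only for illustration.

```python
def overcommit_ratio(limits_bytes, allocatable_bytes):
    """Return the sum of container memory limits divided by node allocatable memory."""
    if allocatable_bytes <= 0:
        raise ValueError("allocatable_bytes must be positive")
    return sum(limits_bytes) / allocatable_bytes


GIB = 1024 ** 3

# Hypothetical node: 8 GiB allocatable, three containers with 4 GiB limits each.
ratio = overcommit_ratio([4 * GIB, 4 * GIB, 4 * GIB], 8 * GIB)
print(f"limits are {ratio:.2f}x node allocatable memory")  # limits are 1.50x ...
```

A ratio above 1.0 means the node cannot honor every limit at the same time, so eviction or OOM kills become possible if usage grows toward the limits.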

Monitoring OOM

In the Prometheus ecosystem, the node_vmstat_oom_kill metric from node‑exporter counts the OOM kills that occur on a node. To anticipate OOM events, compare each container's current memory usage against its limit; the query below returns containers that use more than 80% of their memory limit:

(sum by (namespace,pod,container) (container_memory_working_set_bytes{container!=""}) / sum by (namespace,pod,container) (kube_pod_container_resource_limits{resource="memory"})) > 0.8
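The same 80% threshold check can be sketched outside Prometheus, for example when post-processing exported samples. The function name, the container names, and the numbers below are hypothetical; containers without a limit are skipped because there is nothing to compare against.

```python
def near_memory_limit(usage_bytes, limit_bytes, threshold=0.8):
    """Return {container: ratio} for containers whose usage/limit exceeds the threshold."""
    flagged = {}
    for name, used in usage_bytes.items():
        limit = limit_bytes.get(name)
        if not limit:
            # No limit set (or zero): the ratio is undefined, so skip it.
            continue
        ratio = used / limit
        if ratio > threshold:
            flagged[name] = ratio
    return flagged


# Hypothetical samples, in MiB for readability.
usage = {"redis": 900, "api": 200, "batch": 500}
limits = {"redis": 1000, "api": 1000}
print(near_memory_limit(usage, limits))  # {'redis': 0.9}
```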

Kubernetes CPU Throttling

CPU throttling slows a process down when it reaches its CPU limit. The same three sources apply as for memory: container limits, namespace ResourceQuota, and the node's actual CPU capacity.

Think of CPU as a highway: processes are cars of varying sizes, multiple lanes are CPU cores, and requests are dedicated lanes (e.g., bike lanes). When demand exceeds capacity, traffic congestion (throttling) occurs, slowing all processes without killing them.

CPU Shares in Kubernetes

Kubernetes expresses CPU requests as shares. Each CPU core corresponds to 1024 shares, and when CPU is contended the Linux CFS scheduler distributes CPU time in proportion to the shares each container holds. CPU limits are enforced separately through a CFS quota: if a container uses up its quota within a scheduling period, it is throttled until the next period begins. Unlike exceeding a memory limit, throttling does not kill the container.
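The arithmetic behind shares, and behind the CFS quota that enforces CPU limits, can be sketched as follows. This is a simplified model that assumes cgroup v1, the default CFS period of 100 ms, and a positive CPU limit; the function names and the minimum values are illustrative assumptions rather than an exact copy of the kubelet's code.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms
MIN_SHARES = 2           # assumed lower bound for cpu.shares
MIN_QUOTA_US = 1_000     # assumed lower bound for the CFS quota


def cpu_request_to_shares(millicores: int) -> int:
    """Convert a CPU request in millicores to shares (1 core = 1024 shares)."""
    return max(MIN_SHARES, millicores * 1024 // 1000)


def cpu_limit_to_quota_us(millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores to CPU time allowed per period, in microseconds."""
    return max(MIN_QUOTA_US, millicores * period_us // 1000)


print(cpu_request_to_shares(250))   # 256
print(cpu_limit_to_quota_us(500))   # 50000
```

Under these assumptions, a container limited to 500m may run for 50 ms in every 100 ms period; once it has used that time, it waits for the next period no matter how idle the node is.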

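You can check a container's CPU throttling statistics in /sys/fs/cgroup/cpu/cpu.stat (on cgroup v2 hosts, the file is cpu.stat in the container's cgroup directory under /sys/fs/cgroup).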
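A minimal sketch of reading that file: cpu.stat holds one "key value" pair per line, including nr_periods (elapsed enforcement periods) and nr_throttled (periods in which the cgroup was throttled). The sample text and its numbers are made up for illustration, and the helper names are hypothetical.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cpu.stat file into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats: dict) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Made-up sample in the cgroup v1 layout.
SAMPLE = """nr_periods 200
nr_throttled 50
throttled_time 1234567890
"""

stats = parse_cpu_stat(SAMPLE)
print(throttled_fraction(stats))  # 0.25
```

To read live values, pass the contents of the cpu.stat file to parse_cpu_stat instead of SAMPLE. The counters are cumulative since the cgroup was created, so comparing two readings taken some time apart shows recent throttling better than a single reading.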

Monitoring CPU Throttling

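cAdvisor exposes two metrics that Prometheus can scrape: container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. Their ratio gives the fraction of scheduling periods in which each container was throttled:

sum by (namespace,pod,container)(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by (namespace,pod,container)(rate(container_cpu_cfs_periods_total{container!=""}[5m]))

To see how close each container is to its CPU limit, compare its CPU usage with that limit:

(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))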

Best Practices

Mind Limits and Requests

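Set realistic limits to avoid unexpected throttling or OOM kills. Set realistic requests as well: when a node comes under memory pressure, the kubelet evicts pods whose usage exceeds their requests first, so overly low requests make a pod an early eviction target.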

Prepare for Eviction

When a node is under resource pressure, pods that use more than their requests are the first candidates for eviction. Use PriorityClass to protect critical workloads.
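The ordering can be sketched as a sort. This is a simplified model of the kubelet's ranking under memory pressure, assuming three criteria applied in order: whether usage exceeds the request, then pod priority (lower is evicted first), then how far usage exceeds the request. The PodUsage type, the pod names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int        # value of the pod's PriorityClass
    memory_request: int  # MiB
    memory_usage: int    # MiB


def eviction_order(pods):
    """Return pods sorted from most likely to least likely to be evicted."""
    def key(pod):
        over = pod.memory_usage - pod.memory_request
        # Pods over their request come first, then lower priority,
        # then the largest overshoot.
        return (0 if over > 0 else 1, pod.priority, -over)
    return sorted(pods, key=key)


pods = [
    PodUsage("critical-db", priority=1000, memory_request=100, memory_usage=300),
    PodUsage("web", priority=0, memory_request=100, memory_usage=150),
    PodUsage("batch", priority=0, memory_request=400, memory_usage=200),
]
print([p.name for p in eviction_order(pods)])  # ['web', 'critical-db', 'batch']
```

In this model the higher priority of critical-db places it after web even though it overshoots its request by more, while batch, which stays within its request, is ranked last.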

Throttling Is a Silent Enemy

CPU limits that are set too low can silently degrade performance, because a throttled container keeps running, only more slowly. Continuously monitor CPU usage and throttling at the container and namespace level to detect when a pod is approaching its limits.

Conclusion

Properly configuring limits and requests, understanding overcommitment, and actively monitoring OOM and CPU throttling metrics are essential to maintain stability and cost efficiency in Kubernetes clusters.
