How to Detect and Prevent OOM and CPU Throttling in Kubernetes
This article explains why out‑of‑memory (OOM) kills and CPU throttling are critical issues in Kubernetes, shows how limits and requests work, demonstrates monitoring techniques with Prometheus and cAdvisor, and provides practical best‑practice recommendations to avoid pod eviction and performance degradation.
Introduction
In Kubernetes, insufficient memory (OOM) and CPU throttling are the two most critical resource problems, especially for latency‑sensitive applications. Mis‑configured limits can cause Redis clusters to fail or other services to become unstable, and they directly affect cloud costs.
Kubernetes OOM
Each container in a pod needs memory to run. When a container exceeds its memory limit, the Linux OOM killer terminates a process in that container, and Kubernetes reports the container as OOMKilled with exit code 137 in the pod status. When the node itself runs short of memory, the kernel uses each process's oom_score_adj value, which the kubelet sets per container, to decide which processes to kill first.
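Exit code 137 follows the common convention of 128 plus the signal number, and signal 9 is SIGKILL. The sketch below decodes that convention; the function name describe_exit_code is hypothetical, and the signal names it prints assume a Linux or other POSIX host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name, e.g. 9 -> SIGKILL on POSIX systems.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(137))
```

An exit code of 137 alone only says the process received SIGKILL; the OOMKilled reason in the pod status is what ties it to memory.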
Three possible sources of memory limits apply:
Kubernetes limits set on the container.
Kubernetes ResourceQuota set on the namespace.
The actual physical memory of the node.
Memory Overcommitment
When limits are higher than requests, the total limits can exceed node capacity. This overcommitment is common; if containers collectively use more memory than requested, the node may run out of memory and evict pods to free space.
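As a rough illustration of the arithmetic, the sketch below sums requests and limits for three containers on one node. The container names, sizes, and the 2 GiB node are made-up values chosen only to show how requests can fit while limits overcommit.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Container:
    name: str
    memory_request: int  # bytes
    memory_limit: int    # bytes


def overcommit_ratio(containers, allocatable_bytes):
    """Return the sum of memory limits divided by the node's allocatable memory."""
    total_limits = sum(c.memory_limit for c in containers)
    return total_limits / allocatable_bytes


# Hypothetical workloads: every limit is higher than the matching request.
containers = [
    Container("api", 256 * MIB, 512 * MIB),
    Container("cache", 512 * MIB, 1024 * MIB),
    Container("worker", 256 * MIB, 1024 * MIB),
]
node_allocatable = 2048 * MIB

total_requests = sum(c.memory_request for c in containers)
ratio = overcommit_ratio(containers, node_allocatable)

print("requests fit on node:", total_requests <= node_allocatable)
print("limit overcommit ratio:", ratio)
```

A ratio above 1.0 means the containers are allowed, in total, to use more memory than the node has, so the node can run out even though every pod was scheduled successfully.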
Monitoring OOM
In the Prometheus ecosystem, the node_vmstat_oom_kill metric from node‑exporter indicates when an OOM kill occurs. To anticipate OOM events, you can compare current memory usage against the defined limits:
(sum by (namespace,pod,container) (container_memory_working_set_bytes{container!=""}) / sum by (namespace,pod,container) (kube_pod_container_resource_limits{resource="memory"})) > 0.8
This expression returns the containers that are using more than 80% of their memory limit.
Kubernetes CPU Throttling
CPU throttling slows a process when it approaches its resource limits. The same three sources apply as for memory: container limits, namespace ResourceQuota, and the node's actual CPU capacity.
Think of CPU as a highway: processes are cars of varying sizes, multiple lanes are CPU cores, and requests are dedicated lanes (e.g., bike lanes). When demand exceeds capacity, traffic congestion (throttling) occurs, slowing all processes without killing them.
CPU Shares in Kubernetes
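Kubernetes allocates CPU using shares. Each CPU core is divided into 1024 shares, and the Linux CFS scheduler distributes CPU time in proportion to the shares each container holds. If a container has a CPU limit and uses up the CPU time that limit allows within a scheduling period, the scheduler throttles it until the next period begins; unlike memory, throttling does not kill the pod.
You can check a container's throttling counters in its cgroup's cpu.stat file, for example /sys/fs/cgroup/cpu/cpu.stat on cgroup v1 (on cgroup v2 hosts the file is typically /sys/fs/cgroup/cpu.stat).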
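To make the share arithmetic concrete, here is a minimal sketch of how a CPU value in millicores can map to shares and to a CFS quota. It assumes cgroup v1 semantics, 1024 shares per core, a floor of 2 shares, and the default 100 ms CFS period; the function names are hypothetical and the real kubelet conversion may differ in detail, especially on cgroup v2.

```python
SHARES_PER_CPU = 1024
MILLI_CPU_PER_CPU = 1000
MIN_SHARES = 2            # assumed lower bound for cpu.shares
CFS_PERIOD_US = 100_000   # assumed default CFS period (100 ms)


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to CFS shares."""
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU)


def milli_cpu_to_quota_us(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores to CPU time allowed per period."""
    return milli_cpu * period_us // MILLI_CPU_PER_CPU


def split_under_contention(shares_by_container: dict, cores: float) -> dict:
    """Divide the available cores in proportion to shares when all are busy."""
    total = sum(shares_by_container.values())
    return {name: cores * s / total for name, s in shares_by_container.items()}


print(milli_cpu_to_shares(250))      # request of 250m
print(milli_cpu_to_quota_us(500))    # limit of 500m
print(split_under_contention({"a": 256, "b": 768}, cores=1))
```

Shares only matter when containers compete for CPU, whereas the quota caps a container even on an otherwise idle node, which is why throttling can appear while overall node utilization looks low.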
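The file holds plain "key value" lines, so the throttling ratio can be computed directly from it. The sketch below parses such text and divides nr_throttled by nr_periods; the helper names are hypothetical and the sample numbers are invented, not taken from a real container.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Return the fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Invented sample in the cgroup v1 layout; on a real host, read the file instead.
sample = """nr_periods 1000
nr_throttled 250
throttled_time 4200000000
"""

stats = parse_cpu_stat(sample)
print(throttled_ratio(stats))
```

The counters are cumulative since the cgroup was created, so comparing two readings taken some time apart gives a better picture of current throttling than a single snapshot.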
Monitoring CPU Throttling
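cAdvisor exposes two metrics that Prometheus can scrape: container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. Their ratio gives the fraction of CFS periods in which each container was throttled:
sum by (namespace,pod,container)(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by (namespace,pod,container)(rate(container_cpu_cfs_periods_total{container!=""}[5m]))
To see how close a container is running to its CPU limit, compare its usage against that limit:
(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
Best Practices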
Mind Limits and Requests
Set realistic limits to avoid unexpected throttling or OOM kills. Overly low requests make pods that exceed them candidates for eviction by the kubelet when the node comes under resource pressure.
Prepare for Eviction
Pods that use more than their requests are first candidates for eviction. Use PriorityClass to protect critical workloads.
Throttling Is a Silent Enemy
Unrealistic limits may silently degrade performance. Continuously monitor CPU usage at the container and namespace level to detect when a pod is approaching its limits.
Conclusion
Properly configuring limits and requests, understanding overcommitment, and actively monitoring OOM and CPU throttling metrics are essential to maintain stability and cost efficiency in Kubernetes clusters.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
