Why Does My Kubernetes Pod Get OOMKilled Before Reaching Its Memory Limit?
A pod in a Kubernetes cluster repeatedly restarted with exit code 137 despite staying well below its 6 Gi memory limit. The investigation uncovered how QoS classes, oom_score calculations, and node-level memory pressure decide which container gets killed.
Problem Description
Over a single day, one pod restarted eight times. Each time the container's exit code was 137 and the Reason field showed Error. The JVM logs contained no errors, yet the process was killed. The pod had a memory limit of 6 Gi (with the JVM's -Xmx and -Xms set to 3 Gi), and the node's memory usage was already at 99%.
Investigation Process
Initial Findings
Containers:
  container-prod--:
    Container ID:   --
    Image:          --
    Port:           8080/TCP
    State:          Running
      Started:      Fri, 05 Jan 2024 11:40:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
    Restart Count:  8
    Limits:
      cpu:     8
      memory:  6Gi
    Requests:
      cpu:     100m
      memory:  512Mi
Exit code 137 is 128 + 9: the process received SIGKILL (signal 9), most often as the result of an OOM kill.
The immediate hypothesis was that the pod exceeded its memory limit.
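Container exit codes above 128 follow the common 128 + N convention, where N is the number of the signal that terminated the process. A minimal Python sketch of that decoding, using only the standard signal module (the helper name describe_exit_code is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            # e.g. 137 - 128 = 9, which is SIGKILL
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(143))  # killed by SIGTERM
```

SIGKILL cannot be caught, which is consistent with the JVM logging nothing before it died.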
Further Analysis
Raising the pod's memory limit was not an option, because the node's own memory was almost exhausted. Moreover, monitoring showed the pod's memory usage peaked at about 4 Gi, well below the 6 Gi limit, yet the pod was still killed.
Key observations:
If a container is killed because it exceeds its own memory limit, the Reason field shows OOMKilled, not Error.
When a node runs out of memory, which pods are killed first depends on their Quality-of-Service (QoS) class.
Only this pod was repeatedly killed while the other services on the node kept running.
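The first two observations amount to a rough triage rule. The Python sketch below encodes that reasoning; the function name and result strings are illustrative assumptions. An exit code of 137 can also come from other SIGKILL sources (for example, a kill after a failed liveness probe's grace period), so treat the result as a first guess rather than a diagnosis.

```python
def triage_termination(reason: str, exit_code: int) -> str:
    """Rough first guess at why a container was killed (hypothetical helper)."""
    if reason == "OOMKilled":
        # The kill was attributed to the container's own memory limit.
        return "exceeded its own memory limit"
    if exit_code == 137:
        # SIGKILL, but not attributed to the container's limit:
        # look at the node, e.g. a node-level OOM kill under memory pressure.
        return "SIGKILL from outside the container limit; check node memory pressure"
    return "not a SIGKILL; check application logs"


print(triage_termination("Error", 137))
```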
Root‑Cause Discovery
The issue was traced to Kubernetes' QoS classification and to the oom_score that the Linux kernel's OOM killer uses to pick a victim. The kubelet feeds QoS into that score by setting each container's oom_score_adj.
Understanding Kubernetes QoS
Kubernetes assigns each pod to one of three QoS classes based on the relationship between requests and limits:
Guaranteed: Every container sets both CPU and memory requests and limits, and each request equals its limit.
Burstable: At least one container sets a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria.
BestEffort: No container sets any CPU or memory request or limit.
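The three rules can be expressed as a small classifier. This Python sketch is a simplified model, not the kubelet's actual code: it looks only at CPU and memory, takes quantities as plain integers (millicores and bytes) instead of Kubernetes quantity strings, ignores init containers, and mimics the API server defaulting an unset request to its limit.

```python
from typing import Dict, List

RESOURCES = ("cpu", "memory")
Container = Dict[str, Dict[str, int]]  # {"requests": {...}, "limits": {...}}


def qos_class(containers: List[Container]) -> str:
    """Simplified QoS classification that only looks at CPU and memory."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = {r: v for r, v in c.get("limits", {}).items() if r in RESOURCES}
        requests = {r: v for r, v in c.get("requests", {}).items() if r in RESOURCES}
        # Mimic API defaulting: an unset request takes the value of its limit.
        requests = {**limits, **requests}
        if requests:
            any_set = True
        for r in RESOURCES:
            # Guaranteed needs a limit for both resources, each equal to its request.
            if r not in limits or requests[r] != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


MiB, GiB = 1024 ** 2, 1024 ** 3
# The problem pod: CPU in millicores, memory in bytes.
problem_pod = [{
    "requests": {"cpu": 100, "memory": 512 * MiB},
    "limits": {"cpu": 8000, "memory": 6 * GiB},
}]
print(qos_class(problem_pod))  # Burstable
```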
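When a node runs out of memory, the kernel OOM killer kills the process with the highest oom_score. Because of the oom_score_adj values the kubelet assigns, this effectively proceeds in the order BestEffort → Burstable → Guaranteed. Within the same QoS class, pods are ordered by their effective oom_score, which is approximately:
oom_score ≈ (memory_used / node_memory) × 1000 + oom_score_adj
The adjustment value oom_score_adj is derived from the pod's QoS class: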
Guaranteed: -997
BestEffort: 1000
Burstable: min(max(2, 1000 - (1000 × memoryRequestBytes) / machineMemoryCapacityBytes), 999)
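To make the formulas concrete, here is a Python sketch that computes oom_score_adj per QoS class as written above, plus the approximate oom_score. The node size and function names are hypothetical, integer division is an assumption about rounding, and the exact value the kernel exposes in /proc/<pid>/oom_score may be scaled differently depending on kernel version.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def oom_score_adj(qos: str, memory_request: int, node_capacity: int) -> int:
    """oom_score_adj per QoS class, following the formulas above."""
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    # Burstable: min(max(2, 1000 - 1000 * request / capacity), 999)
    return min(max(2, 1000 - (1000 * memory_request) // node_capacity), 999)


def approx_oom_score(memory_used: int, node_capacity: int, adj: int) -> int:
    """Approximate oom_score = (memory_used / node_memory) * 1000 + oom_score_adj."""
    return (1000 * memory_used) // node_capacity + adj


node = 16 * GiB  # hypothetical node size
print(oom_score_adj("Burstable", 512 * MiB, node))  # 969
```

A Burstable container's adjustment shrinks as its request grows, so a larger request buys some protection even without reaching Guaranteed.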
Applying QoS to the Problem Pod
The pod specification was:
Limits:
  cpu:     8
  memory:  6Gi
Requests:
  cpu:     100m
  memory:  512Mi
Because the memory request (512 Mi) is far lower than the limit (6 Gi), the pod belongs to the Burstable class. All other services on the node were configured as Guaranteed, giving them a much lower oom_score_adj (-997). Consequently, when the node's memory reached 99% usage, the kernel OOM killer chose this Burstable pod's process, which had the highest oom_score, even though its actual usage (about 4 Gi) was below its limit.
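Plugging the pod's numbers into the formulas shows why it lost. The article does not state the node size, so this Python sketch assumes a hypothetical 16 GiB node and a hypothetical Guaranteed neighbour using the same 4 GiB; only the relative ordering is meaningful.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2
NODE = 16 * GiB  # hypothetical node capacity


def approx_oom_score(memory_used: int, adj: int) -> int:
    # oom_score ~= (memory_used / node_memory) * 1000 + oom_score_adj
    return (1000 * memory_used) // NODE + adj


# Burstable problem pod: 512 Mi request, about 4 Gi in use.
burstable_adj = min(max(2, 1000 - (1000 * 512 * MiB) // NODE), 999)
problem = approx_oom_score(4 * GiB, burstable_adj)

# Hypothetical Guaranteed neighbour using the same amount of memory.
neighbour = approx_oom_score(4 * GiB, -997)

print(problem, neighbour)  # 1219 -747
```

With identical usage, the roughly 1,966-point gap between the two adjustments decides the outcome on its own.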
Eviction Priority Among Burstable Pods
In this cluster, every Burstable pod used the same memory request (512 Mi), so they all received the same oom_score_adj and the comparison came down to memory_used / memory_request, which with equal requests is simply memory usage. The pod consuming the most memory (the problematic service) therefore obtained the highest oom_score and was killed repeatedly.
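With equal requests the adjustment term is identical, so the ranking reduces to memory usage. A Python sketch with hypothetical pod names and usage figures, on the same hypothetical 16 GiB node:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2
NODE = 16 * GiB      # hypothetical node capacity
REQUEST = 512 * MiB  # every Burstable pod requests 512 Mi


def burstable_score(memory_used: int) -> int:
    # Same request -> same oom_score_adj for every pod.
    adj = min(max(2, 1000 - (1000 * REQUEST) // NODE), 999)
    return (1000 * memory_used) // NODE + adj


# Hypothetical usage figures, in bytes.
pods = {"service-a": 1 * GiB, "service-b": 2 * GiB, "problem-service": 4 * GiB}

# Highest score first: this is the order in which the OOM killer would pick.
ranked = sorted(pods, key=lambda name: burstable_score(pods[name]), reverse=True)
print(ranked[0])  # problem-service
```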
Resolution
Increase the node’s physical memory or reduce overall node memory pressure.
For critical workloads, set requests equal to limits so the pod becomes Guaranteed, which assigns an oom_score_adj of -997 and dramatically lowers the likelihood of the pod being killed.
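One way to apply the second fix consistently is to derive requests from limits when generating manifests. This Python sketch works on a plain dict shaped like a container's resources block; the helper name make_guaranteed is hypothetical, and the 8 CPU / 6Gi values are simply the pod's current limits. Note that with requests equal to limits, the scheduler must find a node with the full 6 Gi and 8 CPUs unreserved.

```python
import copy


def make_guaranteed(resources: dict) -> dict:
    """Return a copy whose CPU and memory requests equal their limits."""
    out = copy.deepcopy(resources)
    limits = out.get("limits", {})
    for res in ("cpu", "memory"):
        if res not in limits:
            raise ValueError(f"a limit for {res} is required for Guaranteed QoS")
    # Overwrite CPU and memory requests with the limits; keep any other requests.
    out["requests"] = {**out.get("requests", {}),
                       "cpu": limits["cpu"], "memory": limits["memory"]}
    return out


current = {"limits": {"cpu": "8", "memory": "6Gi"},
           "requests": {"cpu": "100m", "memory": "512Mi"}}
print(make_guaranteed(current)["requests"])  # {'cpu': '8', 'memory': '6Gi'}
```

Every container in the pod needs this treatment; a single Burstable-style container keeps the whole pod out of the Guaranteed class.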
Key Takeaways
Kubernetes may kill a pod before it reaches its declared memory limit when the node is under memory pressure and the pod’s QoS class is lower than that of competing pods. Understanding QoS classification, the oom_score formula, and the impact of oom_score_adj is essential for designing resilient pod resource specifications and avoiding unexpected restarts.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, and tools, plus Git, databases, Raspberry Pi, and more.
