Why Does My Kubernetes Pod Get OOMKilled Before Hitting Memory Limits?
This article explains why a Kubernetes pod can be OOMKilled and repeatedly restarted even before reaching its memory limit, detailing the role of QoS classes, oom_score calculations, and node‑memory pressure in the eviction process, and offers practical mitigation steps.
Problem Description
In the afternoon the operations team reported that a pod restarted eight times in a day. The JVM logs showed no errors, so the process was being killed directly. The initial guess was an OOMKill because the pod hit its memory limit.
Both Xmx and Xms were set to 3 GiB, while the pod’s memory limit was 6 GiB, making the situation puzzling.
Investigation Process
Initial Diagnosis
The pod description retrieved from the operations team is shown below.
Containers:
  container-prod--:
    Container ID:   --
    Image:          --
    Image ID:       docker-pullable://--
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 05 Jan 2024 11:40:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 05 Jan 2024 11:27:38 +0800
      Finished:     Fri, 05 Jan 2024 11:39:58 +0800
    Ready:          True
    Restart Count:  8
    Limits:
      cpu:     8
      memory:  6Gi
    Requests:
      cpu:     100m
      memory:  512Mi

The combination of "Last State: Terminated" and "Exit Code: 137" indicates the container was killed by SIGKILL (137 = 128 + 9), which typically means it exceeded its memory limit and was OOMKilled. The immediate plan was to raise the pod’s memory limit to keep the service stable, and then investigate why the pod consumed so much off‑heap memory.
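The exit code can be decoded with a few lines of Python. This is a minimal sketch, assuming the common convention that a process terminated by signal N is reported with exit code 128 + N; the helper name `signal_from_exit_code` is hypothetical and not part of any Kubernetes tooling.

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the name of the signal encoded in an exit code, if any.

    Assumes the 128 + N convention for processes terminated by signal N.
    """
    if exit_code <= 128:
        return None  # normal exit or an application-defined error code
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


# Exit code taken from the pod description above; on Linux this is SIGKILL.
print(signal_from_exit_code(137))
```

SIGKILL cannot be caught by the process, which is consistent with the JVM logs showing no errors before the restart.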
Further Analysis
The operations team said the node’s memory was already at 99%, so they could not increase the pod’s limit. Monitoring showed the pod’s memory usage was only around 4 GiB before it was killed, far below the 6 GiB limit.
This raised the question: why was the pod killed before reaching its limit?
If a pod is killed for exceeding its own limit, the Reason field is "OOMKilled" rather than "Error", so the limit was not the trigger here.
The node itself, however, was out of memory, and when a node’s memory is exhausted, Kubernetes evicts pods to free resources.
Why was only this pod repeatedly evicted while other services were unaffected?
Puzzle Solved
A Google search led to an article titled “Why my pod gets OOMKill (exit code 137) without reaching threshold of requested memory”, whose author had run into the same situation:
Last State:     Terminated
  Reason:       Error
  Exit Code:    137

The root cause was Kubernetes’ QoS mechanism: when the node runs out of memory, pods are killed according to QoS priority.
What is k8s QoS?
QoS (Quality of Service) classifies pods based on their resource requests and limits, influencing eviction decisions when a node is under memory pressure.
| QoS class | Condition |
| --- | --- |
| Guaranteed | Every container sets both CPU and memory requests and limits, and each request equals its limit. |
| Burstable | The pod does not meet the Guaranteed conditions, and at least one container sets a request or limit. |
| BestEffort | No container sets any request or limit. |
When a node’s resources are exhausted, eviction follows the order BestEffort → Burstable → Guaranteed.
From the pod description, the limits are 8 CPU / 6Gi memory and the requests are 100m CPU / 512Mi memory. Requests and limits differ, so the pod falls into the Burstable class.
Limits:
  cpu:     8
  memory:  6Gi
Requests:
  cpu:     100m
  memory:  512Mi

Therefore, under node‑memory pressure, a Burstable pod is more likely to be evicted than a Guaranteed one, even if it has not reached its limit.
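The classification rules can be sketched in Python to check which class a given set of resources falls into. This is a simplified illustration, not the kubelet's implementation: the function name `qos_class` and the dictionary layout are assumptions, and quantities are compared as plain strings, so equivalent spellings such as "1" and "1000m" would be treated as different.

```python
from typing import Dict, List

# Assumed layout: {"requests": {"cpu": ..., "memory": ...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # Assumption: a request that is not set defaults to the limit.
        requests = {**limits, **container.get("requests", {})}
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Values taken from the pod description in this article.
article_pod = [{
    "limits": {"cpu": "8", "memory": "6Gi"},
    "requests": {"cpu": "100m", "memory": "512Mi"},
}]
print(qos_class(article_pod))
```

With the article's values the function reports Burstable, because neither the CPU nor the memory request matches its limit.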
Eviction priority when QoS is the same
All pods in the cluster were Burstable, so the next factor is the oom_score calculated for each container.
If the kubelet can't reclaim memory before a node experiences OOM, the oom_killer calculates an oom_score based on the percentage of memory the container uses on the node, adds the oom_score_adj, and kills the container with the highest score. This means containers in low QoS pods that consume a large amount of memory relative to their scheduling requests are killed first.
The oom_score is the sum of the pod’s memory‑usage‑percentage and its oom_score_adj, which is derived from the QoS class:
| QoS class | oom_score_adj |
| --- | --- |
| Guaranteed | -997 |
| BestEffort | 1000 |
| Burstable | min(max(2, 1000 - (1000 × memoryRequestBytes) / machineMemoryCapacityBytes), 999) |
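Consequently, among Burstable pods on the same node, a smaller memory request yields a higher oom_score_adj, and higher memory usage adds further to the score. The container whose usage most exceeds its request therefore gets the highest oom_score and is killed first.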
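The effect of the request on the score can be worked through numerically. The sketch below applies the Burstable formula from the table and adds a rough usage term. Several things here are assumptions rather than facts from the incident: the 64 GiB node capacity is hypothetical (the article does not state the node size), integer division is assumed, the usage share is expressed on the same 0 to 1000 scale as oom_score_adj, and the function names are illustrative only.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def burstable_oom_score_adj(request_bytes: int, capacity_bytes: int) -> int:
    """oom_score_adj for a Burstable container, per the formula in the table.

    Integer division is an assumption of this sketch.
    """
    raw = 1000 - (1000 * request_bytes) // capacity_bytes
    return min(max(2, raw), 999)


def approx_oom_score(usage_bytes: int, request_bytes: int, capacity_bytes: int) -> int:
    """Rough score: share of node memory in use (0 to 1000) plus oom_score_adj."""
    usage_share = (1000 * usage_bytes) // capacity_bytes
    return usage_share + burstable_oom_score_adj(request_bytes, capacity_bytes)


NODE_CAPACITY = 64 * GIB  # hypothetical; not taken from the article

# The pod from the article: 512Mi requested, about 4 GiB in use.
article_pod = approx_oom_score(4 * GIB, 512 * MIB, NODE_CAPACITY)
# A hypothetical neighbour using the same memory but requesting all of it.
neighbour = approx_oom_score(4 * GIB, 4 * GIB, NODE_CAPACITY)

print(article_pod, neighbour)
```

Both pods use the same amount of memory, yet the one with the small request ends up with the higher approximate score, which matches the behaviour described above.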
Summary
The pod was repeatedly restarted because the node’s memory was fully consumed, triggering a node‑pressure eviction. Kubernetes selected the pod with the highest oom_score, which was this Burstable pod: it requested only 512Mi of memory but actually used around 4 GiB. Two mitigations follow from this:
Expand the node’s memory to avoid eviction due to resource exhaustion.
For critical services, set request and limit to identical values so the pod’s QoS becomes Guaranteed, reducing the chance of being killed.
