Why Is My Kubernetes Pod OOMKilled Before Reaching Its Memory Limit?
A Kubernetes pod kept restarting with exit code 137 even though it never reached its memory limit. The real cause turned out to be node-level memory pressure combined with QoS-based eviction. This article walks through the diagnosis and shows how to prevent such OOMKill events.
Problem Description
One afternoon the operations team reported that a pod restarted eight times in a day. Kibana logs showed no JVM errors; the process was simply killed. The initial hypothesis was that the pod hit its memory limit and was OOM‑killed by Kubernetes.
<code>Containers:
  container-prod--:
    Container ID:   --
    Image:          --
    Image ID:       docker-pullable://--
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 05 Jan 2024 11:40:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 05 Jan 2024 11:27:38 +0800
      Finished:     Fri, 05 Jan 2024 11:39:58 +0800
    Ready:          True
    Restart Count:  8
    Limits:
      cpu:     8
      memory:  6Gi
    Requests:
      cpu:     100m
      memory:  512Mi
</code>The Last State shows Exit Code: 137, which means the process was killed by SIGKILL (exit codes above 128 encode a fatal signal: 137 = 128 + 9). Usually this happens when a pod exceeds its memory limit and is OOM-killed.
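The "128 + signal number" convention can be verified in a few lines; this is generic POSIX behavior, not anything specific to this cluster:

```python
import signal

exit_code = 137
sig_num = exit_code - 128  # exit codes above 128 encode the fatal signal number
print(sig_num, signal.Signals(sig_num).name)  # 9 SIGKILL
```

So exit code 137 always means SIGKILL, but SIGKILL alone does not tell you *who* sent it: the cgroup OOM killer, the kubelet, or an operator running `kill -9` all produce the same code.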
The immediate fix was to increase the pod’s memory limit in production, then investigate why the pod consumed so much off‑heap memory.
Further Analysis
The operations team later reported that the node’s memory was already at 99%, so the pod’s memory limit could not be increased.
Monitoring showed the pod’s memory usage was only about 4 GiB before it was killed, well below the 6 GiB limit.
Why was the pod killed before reaching its limit?
If a pod is killed because it reaches its memory limit, the pod description shows Reason: OOMKilled, not Reason: Error.
When the host node's memory is nearly exhausted, Kubernetes triggers node-pressure eviction, a protection mechanism that kills pods to reclaim resources.
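Node-pressure eviction is governed by thresholds in the kubelet configuration. A minimal sketch of the relevant fields (the threshold values here are illustrative, not this cluster's actual settings):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"    # evict immediately once free node memory drops below this
evictionSoft:
  memory.available: "1Gi"      # softer threshold, acted on after the grace period
evictionSoftGracePeriod:
  memory.available: "1m30s"
```

When free memory on the node falls below these thresholds, the kubelet starts evicting pods even though no individual pod has exceeded its own limit.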
Only this pod was repeatedly evicted while other services were unaffected.
Root Cause
Why my pod gets OOMKill (exit code 137) without reaching threshold of requested memory
The linked article described the same situation: the pod was killed with Exit Code: 137 and Reason: Error.
The author traced the cause to Kubernetes’ QoS (Quality of Service) mechanism. When a node runs out of resources, Kubernetes evicts pods based on QoS priority.
What is K8s QoS?
QoS classifies pods as BestEffort, Burstable, or Guaranteed according to their resource requests and limits. The classification directly influences eviction decisions when a node is under memory pressure.
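The classification rules can be sketched in a few lines of Python. This is a simplified model of the kubelet's logic, not the actual implementation; each container is represented as a plain dict:

```python
def qos_class(containers):
    """Simplified sketch of Kubernetes QoS classification.

    Each container is a dict like {"requests": {...}, "limits": {...}}.
    The real logic lives in the kubelet's qos package; this only
    mirrors the three rules.
    """
    # BestEffort: no container sets any requests or limits.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container sets cpu and memory limits,
    # with requests equal to limits.
    if all(
        c.get("limits")
        and c.get("requests") == c.get("limits")
        and set(c["limits"]) == {"cpu", "memory"}
        for c in containers
    ):
        return "Guaranteed"
    # Everything in between is Burstable.
    return "Burstable"

# The pod from the description: requests far below limits -> Burstable
pod = [{"requests": {"cpu": "100m", "memory": "512Mi"},
        "limits":   {"cpu": "8",    "memory": "6Gi"}}]
print(qos_class(pod))  # Burstable
```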
When the node's memory is exhausted, eviction follows the priority BestEffort → Burstable → Guaranteed.
From the pod description we see:
<code>Limits:
  cpu:     8
  memory:  6Gi
Requests:
  cpu:     100m
  memory:  512Mi
</code>Requests are set lower than limits, which matches the Burstable class. Therefore, under node-memory pressure, Kubernetes prefers to kill Burstable pods before Guaranteed ones.
Eviction priority when QoS is the same
All pods in the cluster were configured as Burstable, so the next factor is each pod's memory usage relative to its request. The pod consuming the highest proportion of memory relative to its request gets the highest oom_score and is killed first.
If the kubelet can’t reclaim memory before a node experiences OOM, the oom_killer calculates an oom_score based on the percentage of memory it’s using on the node, and then adds the oom_score_adj to get an effective oom_score for each container. It then kills the container with the highest score. This means that containers in low QoS pods that consume a large amount of memory relative to their scheduling requests are killed first.
The oom_score_adj value is derived from the pod's QoS class.
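The derivation can be sketched roughly as follows. The constants (-997 for Guaranteed, 1000 for BestEffort) come from the kubelet's qos package; the Burstable formula and clamping are a simplified reading of that code, so treat this as an approximation:

```python
GUARANTEED_ADJ = -997   # constants from the kubelet's qos policy
BEST_EFFORT_ADJ = 1000

def oom_score_adj(qos, memory_request_bytes=0, node_capacity_bytes=1):
    """Simplified sketch of how the kubelet assigns oom_score_adj."""
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "BestEffort":
        return BEST_EFFORT_ADJ
    # Burstable: the larger the memory request relative to node
    # capacity, the lower (safer) the adjustment.
    adj = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    return max(1000 + GUARANTEED_ADJ, min(adj, BEST_EFFORT_ADJ - 1))

GiB = 1024 ** 3
# A 512Mi request on a (hypothetical) 64 GiB node leaves the score
# close to 1000, so the kernel targets this pod early:
print(oom_score_adj("Burstable", 512 * 1024 ** 2, 64 * GiB))  # 993
```

This is why a Burstable pod with a tiny 512Mi request but multi-GiB actual usage is such an attractive target: its oom_score_adj stays high while its memory consumption inflates the base oom_score.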
Summary
Kubernetes node memory was fully utilized, triggering node‑pressure eviction.
Eviction selects the pod with the highest oom_score.
All pods were Burstable with identical memory requests (512 Mi), so the pod that used the most memory consistently received the highest oom_score and was repeatedly killed.
Solution
Expand the node’s memory or add more nodes to avoid memory‑pressure eviction.
For critical services, set request and limit to the same value (QoS = Guaranteed) to minimize the chance of being evicted.
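A fragment showing what that looks like in a pod spec; the names and image here are illustrative placeholders, not the production service:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-service      # illustrative name
spec:
  containers:
  - name: app
    image: example/app:latest # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 6Gi
      limits:                 # requests == limits for both cpu and memory
        cpu: "2"              # => QoS class becomes Guaranteed
        memory: 6Gi
```

The trade-off is that Guaranteed pods reserve their full request on the node, so setting requests this high reduces scheduling density; reserve it for services that genuinely cannot tolerate eviction.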
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.