
Why Is My Kubernetes Pod OOMKilled Before Reaching Its Memory Limit?

A Kubernetes pod repeatedly restarted with exit code 137 despite never reaching its memory limit. The culprit turned out to be node-level memory pressure combined with QoS-based eviction. This post walks through how the problem was diagnosed and how to prevent such OOMKill events.


Problem Description

One afternoon the operations team reported that a pod restarted eight times in a day. Kibana logs showed no JVM errors; the process was simply killed. The initial hypothesis was that the pod hit its memory limit and was OOM‑killed by Kubernetes.

<code>Containers:
  container-prod--:
    Container ID:   --
    Image:          --
    Image ID:       docker-pullable://--
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 05 Jan 2024 11:40:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 05 Jan 2024 11:27:38 +0800
      Finished:     Fri, 05 Jan 2024 11:39:58 +0800
    Ready:          True
    Restart Count: 8
    Limits:
      cpu:     8
      memory:  6Gi
    Requests:
      cpu:        100m
      memory:     512Mi
</code>

The Last State shows Exit Code: 137, which indicates the pod process was killed by SIGKILL (137 = 128 + signal number 9). Usually this happens when the pod exceeds its memory limit and is OOM-killed.
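Exit codes above 128 follow the POSIX convention of 128 plus the number of the fatal signal, so 137 decodes to signal 9, SIGKILL. A quick sanity check with Python's standard library:

```python
import signal

# Container exit codes above 128 encode 128 + the fatal signal number.
exit_code = 137
sig = signal.Signals(exit_code - 128)
print(sig.name)  # SIGKILL
```

This is a general Unix convention, not anything Kubernetes-specific; the same decoding applies to any containerized process.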

The immediate fix was to increase the pod’s memory limit in production, then investigate why the pod consumed so much off‑heap memory.

Further Analysis

The operations team later reported that the node’s memory was already at 99%, so the pod’s memory limit could not be increased.

Monitoring showed the pod’s memory usage was only about 4 GiB before it was killed, well below the 6 GiB limit.

Why was the pod killed before reaching its limit?

If a pod is killed because it reaches its memory limit, the description shows Reason: OOMKilled, not Error.

When the host node’s memory is exhausted, Kubernetes triggers a protection mechanism that evicts pods to free resources.
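The thresholds that trigger node-pressure eviction are configurable on the kubelet (the default hard threshold for memory is memory.available below 100Mi). As an illustration only, with example values rather than recommendations, a KubeletConfiguration might set them like this:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"    # evict immediately when free memory drops below this
  nodefs.available: "10%"
evictionSoft:
  memory.available: "500Mi"    # evict after the grace period below
evictionSoftGracePeriod:
  memory.available: "1m30s"
```

A node sitting at 99% memory utilization is effectively living right at these thresholds, which is why pods on it keep getting evicted.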

Only this pod was repeatedly evicted while other services were unaffected.

Root Cause

Why my pod gets OOMKill (exit code 137) without reaching threshold of requested memory

The linked article described the same situation: the pod was killed with Exit Code: 137 and Reason: Error.

The author traced the cause to Kubernetes’ QoS (Quality of Service) mechanism. When a node runs out of resources, Kubernetes evicts pods based on QoS priority.

What is K8s QoS?

QoS classifies pods as BestEffort, Burstable, or Guaranteed according to their resource requests and limits. The classification directly influences eviction decisions when a node is under memory pressure.

When the node’s memory is exhausted, eviction follows the priority BestEffort → Burstable → Guaranteed.
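The classification rules can be sketched as a small function. This is a simplification of the real kubelet logic, which examines every container and every resource in the pod, but it captures the three cases:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod.

    requests/limits map resource names ("cpu", "memory") to values.
    """
    if not requests and not limits:
        return "BestEffort"            # nothing requested at all
    if (set(requests) == set(limits) == {"cpu", "memory"}
            and all(requests[r] == limits[r] for r in requests)):
        return "Guaranteed"            # requests == limits for cpu and memory
    return "Burstable"                 # anything in between

# The pod from the description: requests far below limits -> Burstable.
print(qos_class({"cpu": "100m", "memory": "512Mi"},
                {"cpu": "8", "memory": "6Gi"}))   # Burstable
```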

From the pod description we see:

<code>Limits:
  cpu:     8
  memory:  6Gi
Requests:
  cpu:        100m
  memory:     512Mi
</code>

This matches the Burstable class. Therefore, under node‑memory pressure, Kubernetes prefers to kill Burstable pods before Guaranteed ones.

Eviction priority when QoS is the same

All pods in the cluster were configured as Burstable, so the next factor is the pod’s memory usage relative to its request. The pod that consumes the highest proportion of memory gets the highest oom_score and is evicted first.

If the kubelet can’t reclaim memory before a node experiences OOM, the oom_killer calculates an oom_score based on the percentage of memory it’s using on the node, and then adds the oom_score_adj to get an effective oom_score for each container. It then kills the container with the highest score. This means that containers in low QoS pods that consume a large amount of memory relative to their scheduling requests are killed first.

The oom_score_adj value is derived from the pod’s QoS class.
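Per the Kubernetes node-pressure eviction documentation, the kubelet assigns oom_score_adj roughly as follows: Guaranteed pods get -997, BestEffort pods get 1000, and Burstable pods get a value between 2 and 999 that shrinks as the memory request grows relative to node capacity. A sketch of that formula (the example node size is an assumption for illustration):

```python
def oom_score_adj(qos: str, mem_request_bytes: int = 0,
                  node_capacity_bytes: int = 1) -> int:
    """oom_score_adj as assigned by the kubelet, per the k8s docs."""
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    # Burstable: a bigger request relative to node capacity -> lower score
    # (less likely to be OOM-killed), clamped to the range [2, 999].
    adj = 1000 - (1000 * mem_request_bytes) // node_capacity_bytes
    return min(max(2, adj), 999)

GiB = 1024 ** 3
# A 512Mi request on a hypothetical 16Gi node: 1000 - floor(1000/32) = 969.
print(oom_score_adj("Burstable", 512 * 1024 ** 2, 16 * GiB))  # 969
```

A tiny 512Mi request leaves the score close to the maximum, which is exactly why this pod kept losing the eviction lottery: it reserved little but used a lot.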

Summary

Kubernetes node memory was fully utilized, triggering node‑pressure eviction.

Eviction selects the pod with the highest oom_score.

All pods were Burstable with identical memory requests (512 Mi), so the pod that used the most memory consistently received the highest oom_score and was repeatedly killed.

Solution

Expand the node’s memory or add more nodes to avoid memory‑pressure eviction.

For critical services, set request and limit to the same value (QoS = Guaranteed) to minimize the chance of being evicted.
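For example (the pod name and image below are illustrative), a spec lands in the Guaranteed class when requests and limits are identical for every resource of every container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-service      # illustrative name
spec:
  containers:
  - name: app
    image: example/app:latest # illustrative image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"              # requests == limits for every resource
        memory: 4Gi           # => QoS class: Guaranteed
```

The trade-off is that Guaranteed pods reserve their full limit up front, so this only helps if the node actually has the capacity to back those reservations.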

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.
