Why Does My Kubernetes Pod Get OOMKilled Before Hitting Memory Limits?

This article explains why a Kubernetes pod can be OOMKilled and repeatedly restarted even before reaching its memory limit, detailing the role of QoS classes, oom_score calculations, and node‑memory pressure in the eviction process, and offers practical mitigation steps.

MaGe Linux Operations

Jan 17, 2024

Problem Description

In the afternoon the operations team reported that a pod restarted eight times in a day. The JVM logs showed no errors, so the process was being killed directly. The initial guess was an OOMKill because the pod hit its memory limit.

Both Xmx and Xms were set to 3 GiB, while the pod’s memory limit was 6 GiB, making the situation puzzling.

Investigation Process

Initial Diagnosis

The pod description retrieved from the operations team is shown below.

Containers:
  container-prod--:
    Container ID:   --
    Image:          --
    Image ID:       docker-pullable://--
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 05 Jan 2024 11:40:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 05 Jan 2024 11:27:38 +0800
      Finished:     Fri, 05 Jan 2024 11:39:58 +0800
    Ready:          True
    Restart Count:  8
    Limits:
      cpu:     8
      memory:  6Gi
    Requests:
      cpu:        100m
      memory:     512Mi

A Last State of "Terminated" with Exit Code 137 (128 + 9) means the container was killed by SIGKILL, which usually points to an OOM kill after exceeding the memory limit. The immediate plan was to raise the pod’s memory limit to keep the service stable, then investigate why the pod consumed so much off‑heap memory.
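Exit codes above 128 conventionally encode 128 plus the number of the signal that ended the process, which is how 137 maps to signal 9. The arithmetic can be sketched in Python as follows; `decode_exit_code` is a hypothetical helper name used only for illustration, not part of any Kubernetes tooling.

```python
import signal


def decode_exit_code(exit_code: int) -> str:
    """Map a container exit code to a signal name when it follows 128 + N."""
    if exit_code > 128:
        try:
            # Convention: exit code = 128 + signal number.
            return signal.Signals(exit_code - 128).name
        except ValueError:
            # The remainder is not a known signal on this platform.
            pass
    return f"exit status {exit_code}"


print(decode_exit_code(137))
```

Note that the exit code only says the process received SIGKILL; it does not say who sent it or why.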

Further Analysis

The operations team said the node’s memory was already at 99 %, so they could not increase the pod’s limit. Monitoring showed the pod’s memory usage was only around 4 GiB before being killed, far below the 6 GiB limit.

This raised the question: why was the pod killed before reaching its limit?

If a pod is killed for exceeding its limit, the Reason field is "OOMKilled" rather than "Error".

When the node’s memory is exhausted, Kubernetes evicts pods to free resources.

Why was only this pod repeatedly evicted while other services were unaffected?

Puzzle Solved

A Google search turned up an article titled “Why my pod gets OOMKill (exit code 137) without reaching threshold of requested memory”, whose author had run into the same situation.

Last State:     Terminated
Reason:       Error
Exit Code:    137

The root cause was Kubernetes’ QoS mechanism: when the node runs out of memory, pods are killed according to QoS priority.

What is k8s QoS?

QoS (Quality of Service) classifies pods based on their resource requests and limits, influencing eviction decisions when a node is under memory pressure.

QoS class    Condition
Guaranteed   Every container sets cpu and memory requests and limits, and each request equals its limit.
Burstable    The pod does not meet the Guaranteed conditions, and at least one container sets a request or limit.
BestEffort   No container sets any request or limit.

When a node’s resources are exhausted, eviction follows the order BestEffort → Burstable → Guaranteed.

From the pod description we see the limits are 8 cpu / 6 Gi memory and requests are 100 m cpu / 512 Mi memory, which matches the Burstable class.

Limits:
  cpu:     8
  memory:  6Gi
Requests:
  cpu:        100m
  memory:     512Mi

Therefore, under node‑memory pressure, a Burstable pod is more likely to be evicted than a Guaranteed one, even if it has not reached its limit.
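The classification rules can be sketched as a small Python function. This is a simplified illustration rather than the kubelet's implementation: `qos_class` is a hypothetical name, quantities are compared as plain strings (so "1" and "1000m" would not match), and a missing request is assumed to default to the limit.

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # Assumption: an unset request falls back to the limit.
        requests = {**limits, **c.get("requests", {})}
        if requests or limits:
            any_set = True
        for res in ("cpu", "memory"):
            # Guaranteed needs a limit for each resource, equal to its request.
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


article_pod = [{
    "limits": {"cpu": "8", "memory": "6Gi"},
    "requests": {"cpu": "100m", "memory": "512Mi"},
}]
print(qos_class(article_pod))
```

With the values from the pod description, the function returns Burstable; setting the requests to the same values as the limits would return Guaranteed.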

Eviction priority when QoS is the same

All pods in the cluster were Burstable, so the next factor is the oom_score calculated for each container.

If the kubelet can't reclaim memory before a node experiences OOM, the oom_killer calculates an oom_score based on the percentage of memory the container uses on the node, adds the oom_score_adj, and kills the container with the highest score. This means containers in low QoS pods that consume a large amount of memory relative to their scheduling requests are killed first.

The oom_score is the sum of the pod’s memory‑usage‑percentage and its oom_score_adj, which is derived from the QoS class:

QoS class    oom_score_adj
Guaranteed   -997
BestEffort   1000
Burstable    min(max(2, 1000 - (1000 × memoryRequestBytes) / machineMemoryCapacityBytes), 999)
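To get a feel for the Burstable formula, here is a minimal Python sketch. The 32 GiB node capacity is a hypothetical value (the article does not state the node size), the function name is illustrative, and the use of integer division is an assumption about how the value is rounded.

```python
MiB, GiB = 2**20, 2**30

GUARANTEED_ADJ = -997
BEST_EFFORT_ADJ = 1000


def oom_score_adj(qos: str, memory_request_bytes: int = 0,
                  node_capacity_bytes: int = 1) -> int:
    """Return the oom_score_adj for a container, following the table above."""
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "BestEffort":
        return BEST_EFFORT_ADJ
    # Burstable: a larger request relative to node capacity gives a lower value.
    raw = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    return min(max(2, raw), 999)


node = 32 * GiB  # hypothetical node capacity
print(oom_score_adj("Burstable", 512 * MiB, node))  # 985
print(oom_score_adj("Burstable", 6 * GiB, node))    # 813
```

Under these assumptions, the 512 Mi request yields 985, close to the BestEffort value of 1000, while requesting the full 6 Gi would lower it to 813.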

Consequently, among Burstable pods, one that requests little memory (and so receives a high oom_score_adj) yet actually uses a lot ends up with the highest oom_score and is killed first.
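The selection can be illustrated with a simplified Python model in which the score is the container's share of node memory on a 0 to 1000 scale plus its oom_score_adj. The container names, the 32 GiB node, and the usage figures are hypothetical; the adjustment values are what the Burstable formula gives for 512 MiB and 4 GiB requests on such a node. The kernel's real accounting is more detailed than this.

```python
from dataclasses import dataclass
from typing import List

GiB = 2**30


@dataclass
class Container:
    name: str
    usage_bytes: int
    oom_score_adj: int


def effective_score(c: Container, node_capacity_bytes: int) -> int:
    # Memory share on a 0..1000 scale plus the adjustment (simplified model).
    return (1000 * c.usage_bytes) // node_capacity_bytes + c.oom_score_adj


def pick_victim(containers: List[Container], node_capacity_bytes: int) -> Container:
    """Return the container with the highest effective score."""
    return max(containers, key=lambda c: effective_score(c, node_capacity_bytes))


node = 32 * GiB  # hypothetical node capacity
containers = [
    Container("small-request", 4 * GiB, 985),  # adj for a 512 MiB request
    Container("right-sized", 4 * GiB, 875),    # adj for a 4 GiB request
]
print(pick_victim(containers, node).name)
```

Both containers use the same amount of memory here, so the one with the smaller request scores higher and is chosen, which matches the behaviour observed with the 512 Mi request.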

Summary

The pod was repeatedly restarted because the node’s memory was fully consumed, so containers were killed under node memory pressure. The victim was the one with the highest oom_score: this Burstable pod, which requested only 512 Mi but actually used about 4 GiB.

Expand the node’s memory to avoid eviction due to resource exhaustion.

For critical services, set request and limit to identical values so the pod’s QoS becomes Guaranteed, reducing the chance of being killed.


