
Why Does My Kubernetes Pod Get OOMKilled Before Reaching Its Memory Limit?

A pod in a Kubernetes cluster repeatedly restarted with exit code 137 despite staying well below its 6 Gi memory limit, prompting an investigation that uncovered the role of QoS classes, oom_score calculations, and node‑level memory pressure in the eviction process.

Liangxu Linux

Problem Description

Over a single day, a pod restarted eight times. Its Exit Code was 137 and the Reason field showed Error. The JVM logs contained no errors, yet the process was killed. The pod had a memory limit of 6 Gi (with JVM -Xmx and -Xms both set to 3 Gi), and the node's memory usage was already at 99%.

Investigation Process

Initial Findings

Containers:
  container-prod--:
    Container ID: --
    Image: --
    Port: 8080/TCP
    State: Running
    Started: Fri, 05 Jan 2024 11:40:01 +0800
    Last State: Terminated
    Reason: Error
    Exit Code: 137
    Restart Count: 8
    Limits:
      cpu: 8
      memory: 6Gi
    Requests:
      cpu: 100m
      memory: 512Mi

Exit Code 137 means the process received SIGKILL, typically caused by an OOM kill.
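
The mapping from 137 to SIGKILL follows the common convention that a process terminated by a signal reports 128 plus the signal number. The helper below is a small illustrative sketch (the function name decode_exit_code is hypothetical, and the signal names assume a Linux host):

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


# 137 - 128 = 9, which is SIGKILL on Linux.
print(decode_exit_code(137))
```

SIGKILL cannot be caught by the process, which is consistent with the JVM logs showing no error before the restart.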

The immediate hypothesis was that the pod exceeded its memory limit.

Further Analysis

Raising the pod's memory limit was not an option, because the node's own memory was almost exhausted. Monitoring showed the pod's memory usage peaked at ~4 Gi, well below the 6 Gi limit, yet the pod was still terminated.

Key observations:

If a pod is killed because it exceeds its limit, the Reason field is OOMKilled, not Error.

When a node runs out of memory, Kubernetes evicts pods based on Quality‑of‑Service (QoS) priority.

Only this pod was repeatedly evicted while other services remained running.

Root‑Cause Discovery

The issue was traced to Kubernetes' QoS classification and the oom_score calculation used by the Linux kernel's OOM killer, which the kubelet influences by setting oom_score_adj on each container.

Understanding Kubernetes QoS

Kubernetes assigns each pod to one of three QoS classes based on the relationship between requests and limits:

Guaranteed: Every container sets identical CPU and memory request and limit values.

Burstable: At least one container defines a request or limit, but the pod does not satisfy the Guaranteed criteria.

BestEffort: No container defines any request or limit.
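
The three rules can be expressed as a short classifier. This is a simplified sketch, not the kubelet's implementation: the function name qos_class and the dictionary layout are assumptions for illustration, only cpu and memory are considered, and quantities are compared as plain strings (so "1" and "1000m" would be treated as different here even though Kubernetes considers them equal).

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults an unset request to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


problem_pod = [{
    "limits": {"cpu": "8", "memory": "6Gi"},
    "requests": {"cpu": "100m", "memory": "512Mi"},
}]
print(qos_class(problem_pod))
```

Feeding in the resource values from the kubectl output above yields Burstable, because the requests differ from the limits.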

When a node is under memory pressure, eviction proceeds in the order BestEffort → Burstable → Guaranteed. Within the same QoS class, pods are ordered by their effective oom_score:

oom_score = (memory_used / node_memory) * 1000 + oom_score_adj

The adjustment value oom_score_adj is derived from the pod’s QoS:

Guaranteed: -997

BestEffort: 1000

Burstable: min(max(2, 1000 - (1000 × memoryRequestBytes) / machineMemoryCapacityBytes), 999)
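
To make the three rules concrete, the sketch below evaluates them for a 512 Mi request. The node capacity of 32 GiB is an assumption chosen for illustration (the article does not state the node size), and the function name oom_score_adj is hypothetical. Integer division is used on the assumption that the adjustment is computed as a whole number.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def oom_score_adj(qos: str, memory_request_bytes: int, node_capacity_bytes: int) -> int:
    """Return the oom_score_adj for a container, following the rules listed above."""
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    # Burstable: the larger the request relative to the node, the lower the value.
    adj = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    return min(max(2, adj), 999)


NODE_CAPACITY = 32 * GIB  # assumed node size, not taken from the article

print(oom_score_adj("Burstable", 512 * MIB, NODE_CAPACITY))  # small request
print(oom_score_adj("Burstable", 6 * GIB, NODE_CAPACITY))    # request raised to 6 Gi
print(oom_score_adj("Guaranteed", 6 * GIB, NODE_CAPACITY))
```

Under these assumptions the 512 Mi request gives 985, close to the BestEffort ceiling, while a 6 Gi request would give 813 and a Guaranteed pod gets -997.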

Applying QoS to the Problem Pod

The pod specification was:

Limits:
  cpu: 8
  memory: 6Gi
Requests:
  cpu: 100m
  memory: 512Mi

Because the memory request (512 Mi) is far lower than the limit (6 Gi), the pod belongs to the Burstable class. All other services on the node were configured as Guaranteed, giving them a much lower oom_score_adj. Consequently, when the node's memory reached 99% usage, the Burstable pod with the highest oom_score was selected for termination, even though its actual usage (≈4 Gi) was below its limit.
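
A worked comparison shows how large the gap is. The sketch applies the oom_score formula given earlier to the problem pod (about 4 Gi used) and to a hypothetical Guaranteed neighbor using twice as much memory. The 32 GiB node size and the neighbor's usage are assumptions; 985 is the Burstable adjustment that a 512 Mi request yields on a node of that size. The result is the raw value of the article's formula, not a reproduction of the kernel's exact accounting.

```python
GIB = 1024 ** 3


def effective_oom_score(memory_used: int, node_memory: int, oom_score_adj: int) -> int:
    """oom_score = (memory_used / node_memory) * 1000 + oom_score_adj"""
    return (1000 * memory_used) // node_memory + oom_score_adj


NODE_MEMORY = 32 * GIB  # assumed node size, not taken from the article

# Problem pod: Burstable, ~4 Gi used, adjustment 985 for a 512 Mi request.
burstable_score = effective_oom_score(4 * GIB, NODE_MEMORY, 985)

# Hypothetical Guaranteed neighbor using 8 Gi, adjustment -997.
guaranteed_score = effective_oom_score(8 * GIB, NODE_MEMORY, -997)

print(burstable_score, guaranteed_score)
```

With these numbers the Burstable pod scores 1110 and the Guaranteed pod scores -747, so the Guaranteed pod ranks far lower despite using double the memory.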

Eviction Priority Among Burstable Pods

In this cluster every Burstable pod used the same memory request (512 Mi), so they all received the same oom_score_adj. The comparison therefore came down to the ratio memory_used / memory_request. The pod that consumed the largest proportion of memory (the problematic service) obtained the highest oom_score and was evicted repeatedly.
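
When every Burstable pod shares the same request, ranking by that ratio is equivalent to ranking by memory used. A minimal sketch with made-up service names and usage figures (only the 512 Mi request and the roughly 4 Gi usage of the problem service come from the article):

```python
MIB = 1024 ** 2

MEMORY_REQUEST = 512 * MIB  # identical for every Burstable pod in this cluster

# Hypothetical usage figures; only the last one reflects the article (~4 Gi).
memory_used = {
    "svc-a": 700 * MIB,
    "svc-b": 1500 * MIB,
    "problem-svc": 4096 * MIB,
}

# Highest used/request ratio first: that pod is the first candidate to be killed.
ranked = sorted(
    memory_used,
    key=lambda name: memory_used[name] / MEMORY_REQUEST,
    reverse=True,
)

for name in ranked:
    print(name, round(memory_used[name] / MEMORY_REQUEST, 2))
```

The problem service uses 8 times its request in this sketch, which puts it at the top of the list on every pass.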

Resolution

Increase the node’s physical memory or reduce overall node memory pressure.

For critical workloads, set requests and limits to identical values so the pod becomes Guaranteed, which assigns an oom_score_adj of -997 and dramatically lowers eviction likelihood.
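
As an illustration of the second option, the sketch below builds a container resources section in which requests mirror limits, using the limits from the original pod (cpu 8, memory 6Gi). The helper name guaranteed_resources is hypothetical, and the values should be sized to what the workload and node can actually sustain, since the scheduler reserves the full request on the node.

```python
import json


def guaranteed_resources(cpu: str, memory: str) -> dict:
    """Build a resources section whose requests equal its limits."""
    values = {"cpu": cpu, "memory": memory}
    return {"requests": dict(values), "limits": dict(values)}


resources = guaranteed_resources("8", "6Gi")

# Print the fragment as JSON so it can be compared against the pod spec.
print(json.dumps({"resources": resources}, indent=2))
```

Every container in the pod needs the same treatment; a single container with mismatched values keeps the whole pod in the Burstable class.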

Key Takeaways

Kubernetes may kill a pod before it reaches its declared memory limit when the node is under memory pressure and the pod’s QoS class is lower than that of competing pods. Understanding QoS classification, the oom_score formula, and the impact of oom_score_adj is essential for designing resilient pod resource specifications and avoiding unexpected restarts.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
