Mastering Kubernetes Resource Quotas and Pod Limits to Prevent Cluster Overload
This guide explains why resource limits are essential in Kubernetes and how to configure namespace-level ResourceQuota and Pod-level Requests/Limits, then walks through a practical case study with YAML examples showing how to keep a single service from exhausting cluster CPU and memory.
In Kubernetes, the stability of applications and the efficient utilization of cluster resources are tightly coupled. Without proper control, a single service can consume all CPU and memory, causing other workloads to fail.
Why Apply Resource Limits?
Unrestricted pods may quickly exhaust node resources, leading to OOM kills, node crashes, or service outages during traffic spikes. Resource limits ensure fair distribution and system stability.
Namespace‑Level ResourceQuota
Think of a Namespace as a department; a ResourceQuota acts as the department’s budget, capping total CPU, memory, and pod counts.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "10"      # max total CPU requests
    requests.memory: 20Gi   # max total memory requests
    limits.cpu: "20"        # max total CPU limits
    limits.memory: 40Gi     # max total memory limits
    pods: "50"              # max number of pods

Restricts wasteful resource use in development environments.
Guarantees sufficient resources for critical services.
Prevents teams from monopolizing the cluster.
Once the quota is in place, the API server rejects any new Pod whose creation would push the namespace totals past the hard caps (a 403 Forbidden "exceeded quota" error), so overruns are blocked at admission time rather than discovered at runtime.
Pod‑Level Requests and Limits
Each Pod should declare the minimum resources it needs (Requests) and the maximum it may use (Limits); in a namespace whose ResourceQuota covers compute resources, these fields become mandatory. The scheduler places the Pod based on its Requests, while Limits act as a ceiling that triggers CPU throttling or OOM termination when exceeded.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: "500m"       # 0.5 core requested
          memory: "512Mi"   # 512 MiB requested
        limits:
          cpu: "1"          # up to 1 core
          memory: "1Gi"     # up to 1 GiB

The scheduler uses Requests to select a suitable node.
Limits enforce a hard cap: exceeding the CPU limit throttles the container, while exceeding the memory limit gets it OOMKilled.
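To see the memory ceiling in action, a throwaway test Pod along these lines (a sketch based on the common stress-tool pattern; the polinux/stress image and the exact figures are illustrative assumptions, not part of this guide's demo) deliberately allocates more memory than its limit permits:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo            # hypothetical name
spec:
  restartPolicy: Never      # fail once instead of crash-looping
  containers:
    - name: hog
      image: polinux/stress # assumption: any image bundling the stress tool works
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]  # try to allocate 250M
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"   # allocating 250M blows past this cap, so the container is OOMKilled

Shortly after starting, the container is terminated with reason OOMKilled: memory limits are enforced by the kernel killing the process, not by gradual throttling.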
Practical Case: Stopping a “Resource‑Hungry Snake”
A team deployed a Node.js service without Limits; under high concurrency it consumed 8 GB of memory and crashed the node.
Set a ResourceQuota on the namespace to cap total memory at 16Gi.
Define requests.memory=512Mi and limits.memory=1Gi in the Pod spec.
Combine with a Horizontal Pod Autoscaler (HPA) for automatic scaling, as sketched below.
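A minimal HPA sketch for that last step, using the stable autoscaling/v2 API (Kubernetes 1.23+). It assumes the Node.js service runs as a Deployment named node-api in the dev namespace; the name, replica bounds, and 70% target are illustrative assumptions, not details from the original incident:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: node-api-hpa        # hypothetical name
  namespace: dev
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: node-api          # assumed Deployment running the Node.js service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU usage exceeds 70% of the Pods' CPU requests

Note that HPA computes utilization relative to each container's Requests, so accurate Requests (as in the demo-pod example) are a prerequisite for sensible autoscaling; with hard Limits in place, extra load translates into more replicas instead of one ever-growing container.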
These steps prevent a single container from running away, keep other services healthy, and enable scaling based on demand rather than uncontrolled resource grabs.
Conclusion
Namespace ResourceQuota controls resource distribution at the organizational level.
Pod Requests/Limits safeguard stability at the application level.
Pairing them with HPA and proper scheduling achieves efficient, reliable cluster utilization.
Best practice: first define departmental budgets (ResourceQuota), then set application‑level caps (Requests/Limits), and finally enable automated scaling.