Mastering Kubernetes Pod Resource Requests, Limits, and QoS
Learn how to properly configure CPU and Memory requests and limits for Kubernetes Pods, understand QoS classes, manage namespace quotas with LimitRange and ResourceQuota, and monitor resource usage using Prometheus queries and Grafana dashboards to ensure stable, efficient cluster operations.
1. Overview
CPU and Memory requests matter because, if they are not set, Kubernetes assumes the Pod needs few resources and may schedule it onto any node, which can cause resource starvation when the cluster comes under pressure.
When resources become scarce, the node may evict Pods; critical Pods (e.g., data storage, login, balance query) must be protected. Kubernetes achieves this through resource quotas, over‑provisioning, and QoS classes that give higher‑priority Pods better guarantees.
The cluster’s compute resources include CPU, GPU, and Memory, but most workloads only need CPU and Memory, which are the focus here.
CPU and Memory are specified per container via <code>resources.requests</code> and <code>resources.limits</code>. The scheduler uses the request values to find a node with sufficient capacity; if none exists, scheduling fails.
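A minimal sketch of these fields in a Pod spec (all names and values here are illustrative, not from the original article):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25       # hypothetical image
    resources:
      requests:             # reserved capacity; used for scheduling decisions
        cpu: "500m"         # 0.5 core
        memory: "512Mi"
      limits:               # hard ceiling at runtime
        cpu: "600m"         # 500m * 1.2, the 20 % overhead rule described later
        memory: "614Mi"     # 512Mi * 1.2, rounded
```

Because requests are set but differ from limits, this Pod would receive the Burstable QoS class.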
2. Pod Resource Usage Guidelines
Pod CPU and Memory usage is dynamic and depends on load; it is expressed as a range (e.g., 0.1‑1 CPU, 500 Mi‑1 Gi memory). The two key parameters are:
Requests: the amount of resources reserved so the container can run normally; the scheduler uses this value for placement.
Limits: the maximum resources a container may consume. CPU is a compressible resource, so a container exceeding its CPU limit is merely throttled; Memory is incompressible, so exceeding the memory limit is a hard violation.
If a Memory limit is set too low, the container can be killed when it exceeds the limit. Conversely, omitting limits makes the Pod’s usage elastic but less predictable.
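To see the memory limit behaving as a hard cap, a deliberately under-provisioned Pod can be used. A sketch assuming the commonly used <code>polinux/stress</code> image (the image and its arguments are assumptions, not from the original article):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                  # hypothetical name
spec:
  containers:
  - name: stress
    image: polinux/stress         # assumption: a memory-stress test image
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "300M", "--vm-hang", "1"]  # tries to allocate ~300M
    resources:
      requests:
        memory: "100Mi"
      limits:
        memory: "100Mi"           # below actual usage: the container gets OOM-killed
```

<code>kubectl describe pod oom-demo</code> would then show the container terminated with reason OOMKilled.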
When many Pods exist, manually setting all four parameters (CPU request/limit, Memory request/limit) is impractical. Kubernetes provides <code>LimitRange</code> to supply default and allowed per-container values, and <code>ResourceQuota</code> to cap total usage per namespace.
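A LimitRange can supply those defaults so individual workloads need not set all four values themselves. A minimal sketch (name, namespace, and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits        # hypothetical name
  namespace: staging          # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:           # applied as requests when a container omits them
      cpu: "500m"
      memory: "500Mi"
    default:                  # applied as limits when a container omits them
      cpu: "600m"
      memory: "600Mi"
    max:                      # upper bound any single container may set
      cpu: "2"
      memory: "2Gi"
```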
CPU Rules
Units are millicores (m), where 10 m = 0.01 core and 1 core = 1000 m.
Requests are estimated from actual business usage.
Limits are calculated as <code>Requests * 1.2</code> (i.e., a 20 % overhead).
Memory Rules
Units are mebibytes (Mi), where 1024 Mi = 1 Gi.
Requests are estimated from actual usage.
Limits follow the same 20 % overhead formula: <code>Requests * 1.2</code>.
3. Namespace Resource Management Standards
Business requests and limits should not exceed 80 % of the total namespace quota to leave headroom for rolling updates.
3.1 Multi‑Tenant Resource Strategy
Use <code>ResourceQuota</code> to limit the resources a project team can consume.
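As a sketch, a ResourceQuota matching the <code>mem-cpu-demo</code> values shown in the inspection example of section 4.3 might look like this (the field values come from that example; applying it to a different namespace is a choice, not a requirement):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
  namespace: cloudchain--staging   # namespace from the inspection example
spec:
  hard:                            # aggregate caps across all Pods in the namespace
    requests.cpu: "250m"
    requests.memory: 250Mi
    limits.cpu: "500m"
    limits.memory: 500Mi
```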
3.2 Resource Change Process
4. Resource Monitoring and Inspection
4.1 Resource Usage Monitoring
Namespace Requests usage rate
<code>sum (kube_resourcequota{type="used",resource="requests.cpu"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.cpu"}) by (resource,namespace) * 100
sum (kube_resourcequota{type="used",resource="requests.memory"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.memory"}) by (resource,namespace) * 100</code>
Namespace Limits usage rate
<code>sum (kube_resourcequota{type="used",resource="limits.cpu"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.cpu"}) by (resource,namespace) * 100
sum (kube_resourcequota{type="used",resource="limits.memory"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.memory"}) by (resource,namespace) * 100</code>
4.2 Viewing via Grafana
CPU request rate
<code>sum (kube_resourcequota{type="used",resource="requests.cpu",namespace=~"$NameSpace"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.cpu",namespace=~"$NameSpace"}) by (resource,namespace)</code>
Memory request rate
<code>sum (kube_resourcequota{type="used",resource="requests.memory",namespace=~"$NameSpace"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.memory",namespace=~"$NameSpace"}) by (resource,namespace)</code>
CPU limit rate
<code>sum (kube_resourcequota{type="used",resource="limits.cpu"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.cpu"}) by (resource,namespace)</code>
Memory limit rate
<code>sum (kube_resourcequota{type="used",resource="limits.memory"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.memory"}) by (resource,namespace)</code>
4.3 In‑Cluster Resource Inspection
Check resource usage
<code>[root@k8s-dev-slave04 yaml]# kubectl describe resourcequotas -n cloudchain--staging
Name: mem-cpu-demo
Namespace: cloudchain--staging
Resource Used Hard
-------- ---- ----
limits.cpu 200m 500m
limits.memory 200Mi 500Mi
requests.cpu 150m 250m
requests.memory 150Mi 250Mi</code>
Check events for quota violations
<code>[root@kevin ~]# kubectl get event -n default
LAST SEEN TYPE REASON OBJECT MESSAGE
46m Warning FailedCreate replicaset/hpatest-57965d8c84 Error creating: pods "hpatest-57965d8c84-s78x6" is forbidden: exceeded quota: mem-cpu-demo, requested: limits.cpu=400m,limits.memory=400Mi, used: limits.cpu=200m,limits.memory=200Mi, limited: limits.cpu=500m,limits.memory=500Mi
... (additional similar lines) ...</code>
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.