
Optimize Kubernetes Resource Use with Requests, Limits, and Scheduling

This article explains common causes of resource waste in Kubernetes clusters, such as over‑provisioned requests and fluctuating workloads, and provides practical methods—including proper request/limit settings, ResourceQuota and LimitRange policies, node affinity, taints and tolerations, and HPA—to improve overall resource utilization and cluster stability.


1. Improving Resource Utilization

1.1 Resource Waste Scenarios

1. Over 50% waste due to resource reservation

Kubernetes uses the Request field to reserve CPU and memory for a container, guaranteeing a minimum amount of resources that cannot be preempted by other containers. If the Request is set too low, the workload may lack resources under high load; therefore users often set Requests high to ensure reliability.

In most periods, however, actual workload load is not high. In a typical real‑world case, the reserved CPU (Request) far exceeds actual CPU usage, leaving the reserved capacity idle and unavailable to other workloads.

To address this, set Requests based on actual load and cap unbounded resource requests, using the ResourceQuota and LimitRange mechanisms described later in this article.

2. Workload peak‑valley patterns cause obvious waste

Most services exhibit peak‑valley patterns (e.g., bus systems busy by day, quiet at night; games peak on Friday evenings, dip on Sunday). Fixed Requests lead to low utilization during valleys.

Dynamic replica scaling (e.g., Kubernetes HPA) can handle these fluctuations.
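As a minimal sketch (the Deployment name web-app, the replica bounds, and the 60% CPU target are assumptions, not values from this article), an autoscaling/v2 HorizontalPodAutoscaler that absorbs peak‑valley traffic might look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app          # assumed Deployment name
  minReplicas: 2           # floor during off-peak valleys
  maxReplicas: 10          # ceiling during traffic peaks
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out when average CPU exceeds 60% of Requests
```

Note that HPA utilization is computed against the container Requests, which is one more reason to keep Requests close to real load.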

3. Different workload types have varying resource needs

Online services require high performance during the day, while offline batch jobs can run during valleys. CPU‑intensive workloads consume more CPU; memory‑intensive workloads consume more memory.

Mixing offline and online workloads with appropriate affinity, taints, and tolerations improves overall utilization.

1.2 Methods to Improve Resource Utilization

There are two broad approaches: using native Kubernetes capabilities for manual resource partitioning and limiting, and combining them with business‑specific automation. This section focuses on the native Kubernetes methods.

1.2.1 How to Partition and Limit Resources

Imagine you manage a cluster shared by four business units. To improve overall utilization, you need to cap each unit's resource usage and set sensible defaults.

Ideally, each workload sets appropriate Request (minimum guaranteed) and Limit (maximum allowed). In practice, users often forget or set them excessively high.

Example values:

CPU: Request 0.25, Limit 0.5

Memory: Request 256 MiB, Limit 1024 MiB
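Expressed in a Pod spec, those example values look like the following sketch (the container name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app          # placeholder name
spec:
  containers:
    - name: app
      image: nginx:1.25      # placeholder image
      resources:
        requests:
          cpu: 250m          # 0.25 core guaranteed
          memory: 256Mi
        limits:
          cpu: 500m          # 0.5 core cap
          memory: 1024Mi
```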

For finer‑grained control, use namespace‑level ResourceQuota and LimitRange.

1.2.2 Using ResourceQuota

ResourceQuota limits the total resources a namespace can consume (CPU, memory, storage, object counts). It helps isolate projects and prevent a single namespace from exhausting cluster resources.

Compute resources: sum of all container Requests and Limits

Storage resources: total PVC storage requests

Object counts: total number of PVC, Service, ConfigMap, Deployment, etc.

Typical scenarios:

Allocate separate namespaces for different teams and set quotas per namespace

Set upper limits to improve cluster stability and avoid resource hogging
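A sketch of a per‑team ResourceQuota covering all three categories above (the namespace name and the concrete values are assumptions):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a              # assumed team namespace
spec:
  hard:
    requests.cpu: "10"           # sum of all container CPU Requests
    requests.memory: 20Gi
    limits.cpu: "20"             # sum of all container CPU Limits
    limits.memory: 40Gi
    requests.storage: 100Gi      # total PVC storage requests
    persistentvolumeclaims: "10" # object-count quotas
    services: "20"
    configmaps: "50"
```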

A script (using the kubectl-view-allocations plugin) can generate initial ResourceQuota YAML files for each namespace, inflating each namespace's current total Request/Limit by 30%.

<code>wget https://my-repo-1255440668.cos.ap-chengdu.myqcloud.com/ops/ResourceQuota-install.tar.gz
tar -xvf ResourceQuota-install.tar.gz
cd ResourceQuota-install && bash install.sh</code>

After execution, a resourceQuota directory containing per‑namespace ResourceQuota.yaml files is created; adjust them as needed and apply with kubectl apply.

Note: If a namespace's total Request/Limit exceeds its ResourceQuota, new Pods cannot be created. Pods must specify requests.cpu, requests.memory, limits.cpu, and limits.memory.

1.2.3 Using LimitRange

LimitRange sets default and min/max values for individual containers within a namespace, preventing users from creating pods with too small or too large resource specifications.

Compute resources: define CPU and memory ranges

Storage resources: define PVC size ranges

Ratio settings: control Request‑to‑Limit ratios

Default values: automatically apply when a pod omits explicit settings

Typical use cases:

Provide default Request/Limit values to avoid user omission and protect QoS

Set different defaults per namespace based on workload characteristics

Enforce upper and lower bounds to keep pods healthy while limiting over‑consumption
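A LimitRange covering defaults, bounds, and ratio control might be sketched as follows (namespace and values are assumptions):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: team-a          # assumed namespace
spec:
  limits:
    - type: Container
      default:               # applied as the Limit when a container omits one
        cpu: 500m
        memory: 1Gi
      defaultRequest:        # applied as the Request when a container omits one
        cpu: 250m
        memory: 256Mi
      min:                   # reject containers requesting less than this
        cpu: 100m
        memory: 128Mi
      max:                   # reject containers requesting more than this
        cpu: "2"
        memory: 4Gi
      maxLimitRequestRatio:  # Limit may be at most 4x the Request
        cpu: "4"
```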

1.2.4 Scheduling Strategies

Kubernetes scheduling finds the most suitable node for each Pod. Proper scheduling policies, combined with business characteristics, can greatly improve cluster resource utilization.

1.2.4.1 Node Affinity

If a CPU‑intensive workload lands on a memory‑focused node, CPU resources may be wasted. By labeling nodes (e.g., cpu-intensive=true) and adding matching affinity rules to Pods, the scheduler places workloads on appropriate nodes, improving utilization.
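As a sketch, a Pod that must land on nodes carrying the cpu-intensive=true label could declare (Pod name, image, and command are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-heavy-job        # placeholder name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cpu-intensive
                operator: In
                values: ["true"]
  containers:
    - name: worker
      image: busybox:1.36    # placeholder image
      command: ["sleep", "3600"]
```

For a softer preference rather than a hard requirement, preferredDuringSchedulingIgnoredDuringExecution can be used instead.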

1.2.4.2 Taints and Tolerations

Taints mark nodes as unsuitable for Pods unless the Pod explicitly tolerates the taint. Tolerations allow Pods to run on tainted nodes. This mechanism can be used for node exclusivity, special hardware (e.g., GPUs), or handling node failures.

<code>kubectl taint nodes nodename dedicated=groupName:NoSchedule</code>

Pods with matching tolerations can then be scheduled onto those nodes. For special‑hardware nodes, taints such as the following can be applied:

<code>kubectl taint nodes nodename special=true:NoSchedule
kubectl taint nodes nodename special=true:PreferNoSchedule</code>
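As a sketch, a Pod tolerating the dedicated=groupName:NoSchedule taint shown earlier would carry:

```yaml
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"
```

The toleration only permits scheduling onto the tainted nodes; to make those nodes truly exclusive to the group, combine the taint with a matching node affinity rule.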

For node‑failure scenarios, a toleration such as the following lets a Pod survive temporary network partitions:

<code>tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000</code>

Improving cluster stability involves many techniques; resource utilization is just one of them. More methods will be shared in future articles.

Tags: kubernetes, Resource Management, Scheduling, LimitRange, ResourceQuota, Node Affinity
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
