
Understanding CPU Requests and Limits in Kubernetes

This article explains how Kubernetes uses CPU requests and limits to schedule pods and allocate CPU proportionally, shows how to calculate a minimal request unit, and offers practical guidelines for choosing request and limit values from workload characteristics and monitoring data.


In Kubernetes you control the amount of CPU a pod can consume by defining two resource specifications: requests and limits. Requests tell the scheduler how much CPU to assume the pod needs, while limits define the hard ceiling on CPU usage.
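Both fields are set per container in the pod spec. A minimal sketch (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo          # illustrative name
spec:
  containers:
    - name: app
      image: nginx        # any workload image
      resources:
        requests:
          cpu: 100m       # scheduler reserves 0.1 vCPU on the node
        limits:
          cpu: 200m       # hard ceiling of 0.2 vCPU
```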

The scheduler uses requests to decide which node a pod will be placed on, and the kernel uses them to distribute a node's CPU among containers. For example, on a single‑CPU node, if Container A requests 0.1 vCPU and Container B requests 0.2 vCPU, either container can still burst to 100 % of the CPU while the other is idle; when both are fully loaded, however, the allocation follows the request ratio (1:2), giving Container A ~0.33 vCPU and Container B ~0.67 vCPU.
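That proportional split can be checked with a small calculation (a sketch of CFS-shares-style weighting, not the kernel's actual algorithm):

```python
def fair_share(requests, capacity):
    """Split `capacity` vCPU among fully loaded containers
    in proportion to their CPU requests."""
    total = sum(requests.values())
    return {name: capacity * r / total for name, r in requests.items()}

# Single-CPU node, requests of 0.1 and 0.2 vCPU:
shares = fair_share({"A": 0.1, "B": 0.2}, capacity=1.0)
# A -> ~0.33 vCPU, B -> ~0.67 vCPU
```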

Requests are useful for setting a baseline ("give me at least X CPU") and for establishing relative relationships between pods, but they do not enforce a hard cap. To enforce one you must set limits, which the kernel implements as a period (default 100 000 µs) and a quota (e.g., 10 000 µs), meaning the container may run for 0.01 s in every 0.1 s window, i.e. one tenth of a core, commonly written as "100m".
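Assuming the default 100 ms period, converting a millicore limit into its CFS quota is one line of arithmetic:

```python
PERIOD_US = 100_000  # default CFS period: 100 ms

def cfs_quota_us(limit_millicores: int, period_us: int = PERIOD_US) -> int:
    """CPU runtime budget (in µs) per period for a given limit."""
    return period_us * limit_millicores // 1000

# A "100m" limit allows 10 ms (10 000 µs) of CPU time every 100 ms:
cfs_quota_us(100)
```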

A simple (though not precise) method to calculate a minimal request unit is:

REQUEST = NODE_CORES * 1000 / MAX_NUM_PODS_PER_NODE

For a node with 1 vCPU and a maximum of 10 pods, the minimal unit is 1 × 1000 / 10 = 100m (100 millicores; note the CPU unit is m, not the memory unit Mi). You can assign this unit or multiples of it to your containers.
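The formula is simple enough to sanity-check in code (values from the example above):

```python
def minimal_request_millicores(node_cores: float, max_pods_per_node: int) -> int:
    """Rough minimal request unit, in millicores."""
    return round(node_cores * 1000 / max_pods_per_node)

minimal_request_millicores(1, 10)   # -> 100, i.e. "100m"
minimal_request_millicores(4, 30)   # -> 133 on a 4-vCPU node capped at 30 pods
```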

If you know that Pod A should receive twice the CPU of Pod B, you could set:

Request A: 2 units

Request B: 1 unit

When both pods try to use 100 % CPU, they will be allocated CPU proportionally according to the 2:1 weight.
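In manifest terms, each unit maps directly onto the container's cpu request (fragments only; assuming the 100m unit computed above):

```yaml
# one unit
resources:
  requests:
    cpu: 100m
---
# two units
resources:
  requests:
    cpu: 200m
```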

A more robust approach is to monitor actual CPU usage and set requests based on the average consumption, optionally using the Vertical Pod Autoscaler (VPA) to recommend request values.

When defining limits, take into account any inherent hard limits of the application (e.g., a single‑threaded app cannot use more than one core) and set the limit to the 99th percentile of observed usage plus 30‑50 % headroom.
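That heuristic can be written down directly (the sample p99 value is illustrative; measure your own workload):

```python
def suggest_cpu_limit(p99_millicores: float, headroom: float = 0.4) -> int:
    """Limit = observed p99 usage plus 30-50 % headroom (default 40 %)."""
    return round(p99_millicores * (1 + headroom))

suggest_cpu_limit(250)                # -> 350, i.e. a 350m limit
suggest_cpu_limit(100, headroom=0.5)  # -> 150
```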

Setting CPU requests is generally regarded as a best practice in Kubernetes because it helps the scheduler place pods efficiently. Setting CPU limits is more controversial, but many practitioners recommend using them to prevent noisy‑neighbor problems.

For further reading see:

learnk8s.io – Setting CPU & Memory Limits & Requests

Understanding Resource Limits in Kubernetes – CPU Time

Docker CPU Resource Limits

Written by

System Architect Go

Programming, architecture, application development, message queues, middleware, databases, containerization, big data, image processing, machine learning, AI, personal growth.
