Understanding Kubernetes CPU Requests vs Limits: The Secrets of Overselling
This article explains how Kubernetes uses CPU requests and limits to implement overselling, detailing the underlying Linux cgroup mechanisms, bandwidth throttling, weight‑based scheduling, and practical configuration tips for SREs to balance guaranteed resources with maximum usage.
1. Container Cloud Technology Stack
Kubernetes (K8s) is the dominant container orchestration platform, supporting Docker and other engines. The stack relies on Linux kernel features such as cgroups, namespaces, and rootfs, with cgroups handling CPU resource limits.
2. Container CPU Resource Control
2.1 CPU Bandwidth Limiting
CPU limits are enforced via the cpu cgroup using cpu.cfs_period_us (time period) and cpu.cfs_quota_us (allowed CPU time). Example for limiting a process to two cores:
# cd /sys/fs/cgroup/cpu,cpuacct
# mkdir test
# cd test
# echo 100000 > cpu.cfs_period_us   # period: 100 ms
# echo 200000 > cpu.cfs_quota_us    # quota: 200 ms
# echo $pid > cgroup.procs
This configuration allows 200 ms of CPU time every 100 ms, effectively capping usage at two cores. The kernel scheduler uses a periodic timer (period_timer) to refill the runtime quota:
sched_cfs_period_timer
  -> do_sched_cfs_period_timer
    -> __refill_cfs_bandwidth_runtime
The refill function sets cfs_b->runtime = cfs_b->quota, granting the configured CPU time for the next period.
When a task group exhausts its runtime, the scheduler invokes throttle_cfs_rq, removing the task group from the run queue until more time is allocated.
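The refill-and-throttle cycle described above can be sketched as a toy model. This is not kernel code: the function name simulate_bandwidth and its parameters are made up for illustration, and it assumes demand is constant per period and that unmet demand does not carry over.

```python
def simulate_bandwidth(period_us, quota_us, demand_us_per_period, periods):
    """Toy model of CFS bandwidth control (hypothetical helper, not a kernel API).

    Returns (total_run_us, throttled_periods) for a task group that wants
    `demand_us_per_period` microseconds of CPU time in every period.
    """
    total_run_us = 0
    throttled_periods = 0
    for _ in range(periods):
        # Refill at the start of each period: runtime = quota.
        runtime_us = quota_us
        # The group can only run until its runtime is used up.
        total_run_us += min(demand_us_per_period, runtime_us)
        if demand_us_per_period > runtime_us:
            # Runtime exhausted: the group is throttled until the next refill.
            throttled_periods += 1
    return total_run_us, throttled_periods


# A group that wants 3.5 cores but has quota 200 ms per 100 ms period.
run_us, throttled = simulate_bandwidth(100_000, 200_000, 350_000, 10)
print(run_us / (10 * 100_000))  # average cores actually used
print(throttled)                # periods in which the group was throttled
```

In this model the group averages exactly two cores and is throttled in every period, which matches the quota/period ratio of the cgroup example in this section.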
2.2 CPU Weight Allocation
Beyond bandwidth limiting, the kernel can allocate CPU proportionally using weights. In cgroup v1 this is set via cpu.shares, and in cgroup v2 via cpu.weight or cpu.weight.nice. The weight is stored in the task group's scheduling entity:
struct task_group {
    ...
    unsigned long shares;
};
struct sched_entity {
    struct load_weight load;
    ...
};
struct load_weight {
    unsigned long weight;
};
The fair scheduler scales each entity's virtual runtime (vruntime) by its weight:
vruntime = (actual_runtime * ((NICE_0_LOAD * 2^32) / weight)) >> 32
A higher weight makes vruntime grow more slowly for the same actual runtime, so the entity is picked more often and receives more CPU time. Example: on an 8-core machine where the competing containers have a total weight of 8192, a container with weight 512 receives 0.5 core, weight 1024 receives 1 core, and weight 2048 receives 2 cores.
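The formula and the 8-core example can be checked with a few lines of arithmetic. This is a sketch only: it assumes NICE_0_LOAD is 1024 (the unscaled nice-0 weight), it assumes every container is busy so the weights fully determine the split, and the helper names vruntime_delta and cpu_share are invented for illustration.

```python
NICE_0_LOAD = 1024  # assumption: unscaled weight of a nice-0 task


def vruntime_delta(actual_runtime_ns, weight):
    """Fixed-point form of the article's formula:
    (actual_runtime * ((NICE_0_LOAD * 2^32) / weight)) >> 32
    """
    inv_weight = (NICE_0_LOAD << 32) // weight
    return (actual_runtime_ns * inv_weight) >> 32


def cpu_share(weight, total_weight, cores):
    """Cores an entity receives when all entities compete for CPU."""
    return cores * weight / total_weight


# Same 1 ms of real runtime, different weights.
for w in (512, 1024, 2048):
    print(w, vruntime_delta(1_000_000, w), cpu_share(w, 8192, 8))
```

With weight 1024 the virtual runtime advances at the same rate as real time, with weight 2048 at half that rate, and with weight 512 at twice that rate, which is why the heavier entity ends up with proportionally more CPU.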
3. Requests and Limits Semantics in Kubernetes
A typical pod spec defines resources as follows:
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "0.5"
Limits translate to cgroup period + quota, setting an upper bound on CPU time. Kubernetes does not enforce that the sum of all containers' limits on a node stays within the node’s core count, allowing overselling.
Requests are implemented via cgroup weight, which guarantees a minimum share of CPU when the node is contended. In cgroup v1 each requested core maps to a cpu.shares value of 1024; in cgroup v2 each core maps to a cpu.weight of roughly 39. The scheduler places a pod only on a node where the sum of all requests stays within the node's allocatable CPU, which is at most its total logical cores.
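The mapping from a pod spec to cgroup values can be sketched as follows. The constants and formulas are assumptions modelled on the conversions commonly described for the kubelet (milli-CPU to shares and quota) and for container runtimes (shares to cgroup v2 weight); newer runtime versions may use a different shares-to-weight mapping, so verify against the versions you run. The function names are illustrative, not real APIs.

```python
MIN_SHARES = 2            # assumption: lowest cpu.shares value used
MAX_SHARES = 262144       # assumption: highest cpu.shares value used
QUOTA_PERIOD_US = 100_000  # assumption: default 100 ms CFS period
MIN_QUOTA_US = 1_000       # assumption: smallest quota ever written


def milli_cpu_to_shares(milli_cpu):
    """requests.cpu in milli-cores -> cgroup v1 cpu.shares (1 core = 1024)."""
    shares = milli_cpu * 1024 // 1000
    return max(MIN_SHARES, min(shares, MAX_SHARES))


def milli_cpu_to_quota(milli_cpu, period_us=QUOTA_PERIOD_US):
    """limits.cpu in milli-cores (> 0) -> cpu.cfs_quota_us for one period."""
    return max(milli_cpu * period_us // 1000, MIN_QUOTA_US)


def shares_to_weight(shares):
    """cgroup v1 cpu.shares [2, 262144] -> cgroup v2 cpu.weight [1, 10000]."""
    return 1 + ((shares - 2) * 9999) // 262142


# The cpu-demo pod: requests 0.5 core (500m), limits 1 core (1000m).
shares = milli_cpu_to_shares(500)
print(shares, shares_to_weight(shares), milli_cpu_to_quota(1000))
print(shares_to_weight(1024))  # one full core under cgroup v2
```

Under these assumptions one requested core gives shares 1024 and weight 39, consistent with the figures quoted in the text, and the 1-core limit becomes a quota of 100000 µs per 100000 µs period.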
Thus a container’s usable CPU range is [requests, limits]. For example, on an 8‑core node with four identical containers each requesting 2 cores (weight 2048) and limiting to 3 cores (quota 300 ms per 100 ms), the total requests equal 8 cores while total limits equal 12 cores, achieving a 1.5× oversell.
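The arithmetic in this example can be reproduced with a small helper. This is a sketch: node_cpu_summary and its dictionary keys are hypothetical names, and CPU values are given in whole cores for simplicity.

```python
def node_cpu_summary(node_cores, containers):
    """Summarize CPU requests and limits for the containers on one node.

    `containers` is a list of dicts with "request" and "limit" in cores
    (hypothetical structure used only for this illustration).
    """
    total_requests = sum(c["request"] for c in containers)
    total_limits = sum(c["limit"] for c in containers)
    return {
        # Requests must fit, otherwise the scheduler would not place the pods.
        "requests_fit": total_requests <= node_cores,
        "total_requests": total_requests,
        "total_limits": total_limits,
        # Limits may exceed the core count; this ratio is the oversell factor.
        "oversell_ratio": total_limits / node_cores,
    }


# Four identical containers on an 8-core node: request 2 cores, limit 3 cores.
containers = [{"request": 2, "limit": 3} for _ in range(4)]
print(node_cpu_summary(8, containers))
```

For the four-container case this reports 8 cores of requests, 12 cores of limits, and an oversell ratio of 1.5.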
4. Summary
1) Kubernetes uses both limits (hard cap) and requests (guaranteed weight) to efficiently allocate CPU resources and enable overselling.
2) When a user asks for an 8‑core container, the “8 cores” refers to the limits.cpu value. It is a cap on CPU time rather than a fixed set of cores, so the container's threads may run on any physical core.
3) As an SRE configuring oversell, ensure the sum of requests on a node does not exceed its core count, and set limits to a multiple (e.g., 1.5×) of requests to achieve the desired oversell ratio.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "12"
      requests:
        cpu: "8"
This pod appears as a 12‑core container to users while internally it is oversold at a 1.5× factor.
IT Services Circle
