
When Kubernetes CPU Limits Fail: Better Alternatives and Best Practices

This article explains how Kubernetes CPU requests and limits work, why limits can throttle performance, compares language‑specific behaviors, and presents alternative strategies such as relying on requests with Horizontal Pod Autoscaling for more efficient and cost‑effective scaling.

Open Source Linux

May 11, 2023

Kubernetes CPU Requests

Requests serve two purposes in Kubernetes: they inform the scheduler of the amount of CPU needed for a pod to be placed, and they guarantee that the pod will receive at least that amount of CPU time.

Scheduling Availability

The scheduler filters out nodes that cannot satisfy the requested CPU and memory; if no node has enough resources, the pod remains pending.
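The filtering step can be pictured with a minimal sketch. This is an illustration only, not the scheduler's actual code: the `Node` class, its field names, and the sample nodes are hypothetical, and the real scheduler applies many more filters and a scoring phase.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mcpu: int     # CPU not yet claimed by other pods' requests, in mCPU
    free_mem_mib: int  # memory not yet claimed by other pods' requests, in MiB


def feasible_nodes(nodes, req_mcpu, req_mem_mib):
    """Return the names of nodes whose unclaimed capacity covers the pod's requests."""
    return [
        n.name
        for n in nodes
        if n.free_mcpu >= req_mcpu and n.free_mem_mib >= req_mem_mib
    ]


# Hypothetical cluster state.
nodes = [
    Node("node-a", free_mcpu=500, free_mem_mib=4096),
    Node("node-b", free_mcpu=2000, free_mem_mib=1024),
    Node("node-c", free_mcpu=1500, free_mem_mib=2048),
]

print(feasible_nodes(nodes, req_mcpu=1000, req_mem_mib=2048))  # ['node-c']
print(feasible_nodes(nodes, req_mcpu=4000, req_mem_mib=2048))  # [] -> pod stays pending
```

Note that the comparison is against requests already placed on the node, not against live usage, which is why an idle but fully requested node still rejects new pods.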

Guaranteed CPU Allocation

Once a node is selected, Kubernetes sets Linux CPU shares (cgroup feature) to approximate the requested milli‑CPU (mCPU) value. The share ratio determines each cgroup's priority on the CPU.

Kubernetes uses a fixed formula to translate the mCPU value into Linux shares; on cgroup v1, 1000 mCPU corresponds to 1024 shares.

A pod can request at most the number of logical CPUs actually available on the node.

Thus, shares act as a minimum CPU guarantee when containers compete for the CPU. They are not a cap: a container may use more than its request whenever the node has idle CPU.
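As a rough sketch of how shares turn into a guarantee, the snippet below converts requests to cgroup v1 shares (1000 mCPU to 1024 shares, with a floor of 2) and splits a fully contended node in proportion to them. The function names and the sample workloads are assumptions for illustration; the sketch ignores the nested cgroup hierarchy, and cgroup v2 uses `cpu.weight` with a different scale.

```python
def mcpu_to_shares(mcpu: int) -> int:
    """cgroup v1 conversion: 1000 mCPU maps to 1024 shares, never below 2."""
    return max(2, mcpu * 1024 // 1000)


def cpu_under_contention(requests_mcpu: dict, node_mcpu: int) -> dict:
    """CPU (in mCPU) each container receives when every container is busy at once.

    The kernel divides CPU time in proportion to shares, so this is the
    worst case for each container, not a ceiling.
    """
    shares = {name: mcpu_to_shares(m) for name, m in requests_mcpu.items()}
    total = sum(shares.values())
    return {name: node_mcpu * s / total for name, s in shares.items()}


# Hypothetical node with 4 logical CPUs and two containers.
split = cpu_under_contention({"api": 500, "worker": 1500}, node_mcpu=4000)
print(split)  # {'api': 1000.0, 'worker': 3000.0}
```

Both containers end up with at least what they requested because the sum of requests on a node never exceeds its capacity; the unrequested remainder is shared out in the same ratio.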

If none of a pod's containers set any CPU or memory requests or limits, the pod is treated as BestEffort QoS and is among the first to be evicted when the node comes under resource pressure.
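The QoS classification rules can be sketched as follows. This is a simplified illustration, not the kubelet's implementation: the dictionary layout is an assumption, and quantities are compared as plain strings, so `"1"` and `"1000m"` would wrongly count as different here.

```python
def qos_class(containers):
    """Classify a pod from its containers' CPU and memory requests and limits.

    containers: list of dicts such as
        {"requests": {"cpu": "500m"}, "limits": {"cpu": "1", "memory": "1Gi"}}
    """
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))                                # BestEffort
print(qos_class([{"requests": {"cpu": "500m"}}]))     # Burstable
print(qos_class([{"limits": {"cpu": "1", "memory": "1Gi"}}]))  # Guaranteed
```

A pod that sets requests but no limits, the configuration this article leans toward, lands in the Burstable class.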

Kubernetes CPU Limits

Limits define the maximum CPU a container can consume, but they do not guarantee that amount; they rely on the availability of resources at runtime.

Limits protect other workloads from contention but can cause severe throttling if set too low. Requests use cgroup CPU shares, while limits use cgroup CPU quotas.

Accounting period: the time window (default 100 ms) after which the quota resets.

Quota: the amount of CPU time (in microseconds) a cgroup may use during each accounting period. With the default 100 ms period, a quota of 100 ms corresponds to 1000 mCPU.

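For a single‑threaded workload, a limit of 1 CPU allows 100 ms of CPU time in every 100 ms period, so the thread is never throttled. A multithreaded workload under the same limit can be throttled heavily because all of its threads draw on the same quota: four threads running in parallel use up the 100 ms in about 25 ms of wall‑clock time and then sit idle for the remaining 75 ms of the period.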
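The quota arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch: the function names are hypothetical, and it assumes every thread stays busy on its own core for the whole period, which is the worst case.

```python
PERIOD_US = 100_000  # default accounting period: 100 ms


def quota_us(limit_mcpu: int, period_us: int = PERIOD_US) -> int:
    """CPU time (microseconds) the cgroup may consume in one period."""
    return limit_mcpu * period_us // 1000


def throttled_ms_per_period(limit_mcpu: int, busy_threads: int,
                            period_us: int = PERIOD_US) -> float:
    """Wall-clock milliseconds per period during which the cgroup is throttled."""
    # N busy threads burn the shared quota N times faster than one thread.
    runnable_us = quota_us(limit_mcpu, period_us) / busy_threads
    return max(0.0, period_us - runnable_us) / 1000


print(quota_us(1000))                    # 100000
print(throttled_ms_per_period(1000, 1))  # 0.0
print(throttled_ms_per_period(1000, 4))  # 75.0
print(throttled_ms_per_period(500, 1))   # 50.0
```

In the four-thread case the application is stalled for three quarters of every period even though the node may have idle cores, which is the latency penalty that makes low limits costly.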

Programming Language Landscape

Node.js

Node.js runs JavaScript on a single thread (unless worker_threads are used), so it is a good candidate for scaling out across many pods rather than scaling up a single pod. Its libuv thread pool, which handles work such as file system operations, defaults to four threads; the UV_THREADPOOL_SIZE environment variable can adjust this.

Python

Python is also typically single‑threaded, making it suitable for pod‑level scaling. Multiprocessing libraries size their pools from the number of CPUs the host reports rather than from the container's limit, and at the time of writing the only way to change this is to configure the library directly.
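One way to configure the library directly is to derive the pool size from the container's CPU limit. The sketch below is an assumption-laden example, not a standard API: it reads the cgroup v2 `cpu.max` file at its usual mount point, the helper names are hypothetical, and on cgroup v1 nodes the file does not exist, so the code falls back to the host CPU count.

```python
import math
import os


def cpus_from_cpu_max(text: str):
    """Parse cgroup v2 cpu.max: '<quota> <period>' or 'max <period>'.

    Returns the limit in CPUs, or None when no limit is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_count(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Pick a pool size from the CPU limit, falling back to the host CPU count."""
    host = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as f:
            limit = cpus_from_cpu_max(f.read())
    except (OSError, ValueError):
        limit = None  # file missing or unreadable: treat as unlimited
    if limit is None:
        return host
    return max(1, min(host, math.ceil(limit)))


print(cpus_from_cpu_max("200000 100000"))  # 2.0
print(worker_count())
```

The result can then be passed explicitly, for example `multiprocessing.Pool(processes=worker_count())`, so the number of worker processes matches the quota instead of the node size.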

Java

The JVM now detects container/cgroup limits automatically on Linux x64, adjusting memory and processor usage. If detection is unavailable, the -XX:ActiveProcessorCount flag can manually set the number of CPU cores.

.NET/C#

.NET provides automatic container/cgroup detection similar to Java.

Golang

Set the GOMAXPROCS environment variable to match the CPU limit, or use the automaxprocs package to adjust it automatically.

Do You Really Need Limits?

In many cases, using only requests together with Horizontal Pod Autoscaling (HPA) and Cluster Autoscaler provides sufficient elasticity without the performance penalties of limits.

Alternatives to Limits

When the workload increases, HPA adds pods; if the cluster lacks capacity, the Cluster Autoscaler provisions new nodes. When load decreases, HPA removes pods and the Cluster Autoscaler can remove underused nodes, eliminating the need for hard limits.
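The scaling decision itself follows a simple ratio rule, sketched below. This shows only the core formula from the HPA documentation; the function name and default bounds are assumptions, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count by the ratio of observed to target utilization.

    For CPU, utilization is measured as a percentage of the pods' CPU
    requests, which is why accurate requests matter for autoscaling.
    """
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_utilization=90, target_utilization=60))   # 6
print(desired_replicas(4, current_utilization=30, target_utilization=60))   # 2
print(desired_replicas(4, current_utilization=200, target_utilization=50))  # 10 (capped)
```

Because the metric is relative to requests, a pod running without limits can burst above 100% utilization, and that burst is exactly the signal that triggers a scale-out.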

Conclusion

Understanding Kubernetes CPU requests and limits lets you configure workloads appropriately: align thread counts with limits, rely on requests for minimum guarantees, and prefer autoscaling mechanisms over restrictive limits for better performance and cost efficiency.

Author: JASON UMIKER Source: https://sysdig.com/blog/kubernetes-cpu-requests-limits-autoscaling/
