How Kubernetes Resource Limits Work: From CPU Time to Throttling Metrics
This article explains the mechanics of Kubernetes CPU resource limits, how to interpret limits as time slices, the Linux accounting system behind them, and which Prometheus metrics can be used to set proper limits and diagnose CPU throttling issues.
1. Understanding Limits
When configuring limits, you tell the Linux node how long a container may run in a given period, protecting other workloads from excessive CPU consumption. The limit’s "core" is not a physical core but the total CPU time allocated to a set of processes or threads before the container is paused.
The Kubernetes scheduler reasons in cores when it places pods on nodes, but the container runtime treats the CPU limit as an amount of CPU time.
2. Limits as Time
Consider a single‑threaded app that needs 1 second of CPU time per transaction. Setting
resources:
  limits:
    cpu: 1000m
gives the app 1000 ms (1 CPU‑second) per period, allowing it to run a full transaction without throttling.
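Because 1000 ms equals 1 CPU‑second, the app can execute one full CPU‑second each second. In this simplified view, that one‑second window is the "period" used to measure time blocks. The kernel enforces the same ratio over a shorter period (100 ms by default), so 1000m works out to 100 ms of CPU time in every 100 ms period.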
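The conversion from a limit to CPU time can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the 100 ms default assumes the kernel's standard CFS period.

```python
def cpu_time_per_period_ms(limit_millicores: int, period_ms: float = 100.0) -> float:
    """CPU time (in ms) a container may consume during one period.

    period_ms defaults to 100 ms, the usual CFS period (an assumption;
    nodes can be configured differently).
    """
    return limit_millicores * period_ms / 1000.0


print(cpu_time_per_period_ms(1000))                    # 100.0 ms per 100 ms period
print(cpu_time_per_period_ms(1000, period_ms=1000.0))  # 1000.0 ms per 1 s window
print(cpu_time_per_period_ms(250))                     # 25.0 ms per 100 ms period
```

The second call reproduces the one‑second view used in the text: the ratio is the same, only the window changes.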
3. Linux Accounting System
Limits are implemented via an accounting system that tracks the total vCPU time a container uses in a fixed period. With the default 100 ms period and 5 ms slices, the Linux kernel splits each period into 20 slices.
For example, an allocation of half a period (50 ms) uses 10 slices; the accounting system resets the slices at the start of each new period.
The cgroup setting cpu.cfs_period_us defines the period length in microseconds, while cpu.cfs_quota_us defines the allowed CPU time within that period (on cgroup v2, both values live in cpu.max). cAdvisor exposes these values to Prometheus as container_spec_cpu_period and container_spec_cpu_quota.
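As a rough sketch of how the two values relate, the quota divided by the period gives the limit in cores. The helper name below is hypothetical, and treating a negative quota as "no limit" follows the cgroup v1 convention of writing -1 for an unlimited group.

```python
from typing import Optional


def limit_in_cores(quota_us: int, period_us: int = 100_000) -> Optional[float]:
    """Translate a CFS quota/period pair into a CPU limit expressed in cores.

    Returns None when no limit is set (quota of -1 in cgroup v1).
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if quota_us < 0:
        return None  # unlimited: the container is never throttled by CFS quota
    return quota_us / period_us


print(limit_in_cores(50_000))    # 0.5  (a 500m limit)
print(limit_in_cores(200_000))   # 2.0  (a 2000m limit)
print(limit_in_cores(-1))        # None (no limit configured)
```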
4. Multi‑Threaded Containers
Containers often run hundreds of threads. The accounting system tracks the vCPU time each thread consumes, on whichever vCPU it runs, and adds that usage to the container’s shared ledger.
Metrics such as container_cpu_usage_seconds_total show the total vCPU seconds used by a container’s threads. As long as the combined usage within a period stays below the quota, the threads keep running; once the quota is used up, the container is throttled for the rest of that period.
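Since container_cpu_usage_seconds_total is a counter, the useful number is its change over a time window. The sketch below (function name and sample values are illustrative assumptions) turns two counter samples into the average number of vCPUs the container's threads kept busy, which is essentially what a PromQL rate() over the metric reports.

```python
def average_cores_used(usage_start_s: float, usage_end_s: float, window_s: float) -> float:
    """Average vCPUs in use between two samples of a CPU usage counter."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return (usage_end_s - usage_start_s) / window_s


# Hypothetical samples: the counter grew by 90 CPU-seconds in 60 wall-clock seconds.
print(average_cores_used(1200.0, 1290.0, 60.0))  # 1.5
```

A result above the configured limit is not possible for long, so a value that sits close to the limit is a hint that throttling may be occurring.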
5. Global Accounting
When a CPU needs to run a thread, it first checks whether the container’s global quota has at least a 5 ms slice. If not, the thread is throttled until the next period.
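The slice mechanism can be approximated with a toy model. The code below is a deliberate simplification, not the kernel's algorithm: it hands out a shared quota in 5 ms slices, round-robin, and marks any thread that still wants CPU when the quota runs out as throttled. All names are hypothetical.

```python
def simulate_period(quota_ms, demands_ms, slice_ms=5.0):
    """Distribute one period's quota across threads in fixed-size slices.

    quota_ms   -- CPU time the container may use this period
    demands_ms -- CPU time each thread would like to use this period
    Returns (ran_ms, throttled): time each thread ran, and whether it was cut off.
    """
    remaining = quota_ms
    ran_ms = [0.0] * len(demands_ms)
    progress = True
    while remaining > 0 and progress:
        progress = False
        for i, demand in enumerate(demands_ms):
            need = demand - ran_ms[i]
            if need <= 0 or remaining <= 0:
                continue
            grant = min(slice_ms, need, remaining)  # at most one slice per turn
            ran_ms[i] += grant
            remaining -= grant
            progress = True
    throttled = [ran < demand for ran, demand in zip(ran_ms, demands_ms)]
    return ran_ms, throttled


ran, throttled = simulate_period(40, [100, 100, 100, 100])
print(ran)        # [10.0, 10.0, 10.0, 10.0]
print(throttled)  # [True, True, True, True]
```

With a 40 ms quota and four threads that each want 100 ms, every thread gets two slices and then waits for the next period.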
6. Real‑World Scenario
Assume four threads that each need 100 ms of CPU time per task, for a total of 400 ms per 100 ms period (4000 milliCPU, or four full vCPUs). A limit of 400m grants only 40 ms per period (40 % of one period), so the workload is throttled: latency rises sharply while vCPUs on the node sit idle.
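A back-of-the-envelope estimate makes the cost of a tight limit concrete. The sketch below assumes a 100 ms period, work split evenly across threads, and threads that can all run in parallel; the function name is hypothetical and the model ignores scheduling overhead, so treat the numbers as indicative only.

```python
import math


def wall_clock_ms(cpu_work_ms, threads, limit_millicores, period_ms=100.0):
    """Estimate how long a batch of CPU work takes under a CPU limit."""
    if cpu_work_ms <= 0:
        return 0.0
    quota_ms = limit_millicores * period_ms / 1000.0
    # CPU time usable per period: capped by the quota and by thread parallelism.
    per_period_ms = min(quota_ms, threads * period_ms)
    if per_period_ms <= 0:
        raise ValueError("limit must be positive")
    # Periods whose whole budget is consumed before the final, partial one.
    full_periods = math.ceil(cpu_work_ms / per_period_ms) - 1
    leftover_ms = cpu_work_ms - full_periods * per_period_ms
    return full_periods * period_ms + leftover_ms / threads


# Four threads, 100 ms of CPU each (400 ms of CPU work in total).
print(wall_clock_ms(400, 4, 4000))  # 100.0  no throttling
print(wall_clock_ms(400, 4, 2000))  # 150.0  throttled for half of the first period
print(wall_clock_ms(400, 4, 400))   # 910.0  throttled in every period but the last
```

In this simplified model the threads burn through the quota early in each period and then sit throttled until the next one starts, which is where the added latency comes from.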
7. Common Metrics for Limits
container_cpu_cfs_throttled_periods_total shows throttled periods, while container_cpu_cfs_periods_total shows total periods. In the example, two‑thirds of periods were throttled.
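The ratio of the two counters is simple to compute once their increases over the same window are known. The following sketch uses made-up sample values and a hypothetical function name.

```python
def throttled_ratio(throttled_periods_delta: float, total_periods_delta: float) -> float:
    """Fraction of enforcement periods in which the container was throttled.

    Both arguments are increases of the respective counters over the same window.
    """
    if total_periods_delta <= 0:
        return 0.0  # no periods elapsed, so nothing could be throttled
    return throttled_periods_delta / total_periods_delta


# Hypothetical window: 300 periods elapsed, 200 of them throttled.
ratio = throttled_ratio(200, 300)
print(f"{ratio:.0%} of periods throttled")  # 67% of periods throttled
```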
8. Determining Needed Limits
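The metric container_cpu_cfs_throttled_seconds_total accumulates the time a container spent throttled, in seconds, adding up the throttled 5 ms slices. Because there are ten 100 ms periods in a second, dividing its per‑second rate by 10 gives the throttled time per period. To size an increase, multiply the extra time needed per period, in milliseconds, by 10 to get milliCPU (e.g., 200 ms × 10 = 2000m).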
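The arithmetic can be written out as two small helpers. This is a sketch under the assumption of the default 100 ms period (ten periods per second); the function names are hypothetical, and the result is a starting point for tuning rather than an exact answer.

```python
def throttled_ms_per_period(throttled_seconds_rate: float, periods_per_second: int = 10) -> float:
    """Convert a per-second rate of throttled time into throttled ms per period."""
    return throttled_seconds_rate * 1000.0 / periods_per_second


def extra_millicores(extra_ms_per_period: float, period_ms: float = 100.0) -> float:
    """MilliCPU to add to the limit to gain extra_ms_per_period of CPU time."""
    return extra_ms_per_period * 1000.0 / period_ms


# Hypothetical reading: the container is throttled 2 s per second across its threads.
per_period = throttled_ms_per_period(2.0)
print(per_period)                    # 200.0 ms throttled in each 100 ms period
print(extra_millicores(per_period))  # 2000.0 milliCPU to add to the limit
```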
topk(3, max by (pod, container)(rate(container_cpu_cfs_throttled_seconds_total{image!="",instance="$instance"}[$__rate_interval]))) / 10
9. Alerting
Alerts can be based on CPU throttling time or throttling ratio, for example:
# Alert when throttling time exceeds 1 s
rate(container_cpu_cfs_throttled_seconds_total{namespace=~"wordpress-.*"}[1m]) > 1
# Alert when throttled periods exceed 50 % of total
sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container,pod,namespace) / sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container,pod,namespace) * 100 > 50
10. Summary
The article explains how Kubernetes limits work, which metrics can be used to set appropriate values, and how to diagnose throttling problems. Limits that are set too tight can lead to idle vCPU and increased latency, while realistic limits based on workload characteristics improve resource utilization.
MaGe Linux Operations
