
Why Kubernetes HPA Ignores High CPU Usage and How Tolerance Affects Scaling

This article explains the internal architecture and source-code flow of the Kubernetes Horizontal Pod Autoscaler: how components such as HorizontalController and ReplicaCalculator compute the desired replica count, why the default 10% tolerance can prevent scaling even when CPU exceeds the target, and how behavior policies and scaling limits influence HPA decisions.


Overview of HPA

Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on observed CPU utilization or custom metrics. When the utilization exceeds the defined target, HPA should scale out, but sometimes it does not.

Architecture and Core Components

The implementation resides in k8s.io/kubernetes/pkg/controller/podautoscaler and consists of four main components:

HorizontalController: the main controller that watches HPA and Pod resources and coordinates scaling.

ReplicaCalculator: contains the core logic for calculating the desired replica count.

MetricsClient: fetches metric data such as CPU, memory, or custom metrics.

ScaleClient: updates the replica count of workloads such as Deployments or ReplicaSets.

Source Entry – HPA Controller Startup

The HPA controller is initialized during the startup of cmd/kube-controller-manager. In controllermanager.go the Run() function calls NewControllerDescriptors(), which registers the HPA controller descriptor. The actual start occurs in autoscaling.go via startHPAControllerWithMetricsClient():

func NewControllerDescriptors() map[string]*ControllerDescriptor {
    ...
    register(newHorizontalPodAutoscalerControllerDescriptor())
    ...
}

func newHorizontalPodAutoscalerControllerDescriptor() *ControllerDescriptor {
    return &ControllerDescriptor{
        name:     names.HorizontalPodAutoscalerController,
        initFunc: startHorizontalPodAutoscalerControllerWithRESTClient,
    }
}

func startHorizontalPodAutoscalerControllerWithRESTClient(ctx context.Context, controllerContext ControllerContext, controllerName string) (controller.Interface, bool, error) {
    // ... builds the metrics client, then:
    return startHPAControllerWithMetricsClient(ctx, controllerContext, metricsClient)
}

Core Logic of the Controller

The main reconciliation flow is located in k8s.io/kubernetes/pkg/controller/podautoscaler and follows the chain:

Run() -> worker() -> processNextWorkItem() -> reconcileKey() -> reconcileAutoscaler()

reconcileAutoscaler performs the following steps:

Record reconciliation metrics with a.monitor.ObserveReconciliationResult.

Deep‑copy HPA objects to avoid mutating shared cache.

Parse the target API version and obtain REST mappings.

Retrieve the Scale sub‑resource for the target workload.

Compute the desired replica count from metrics via computeReplicasForMetrics.

Apply tolerance checks and, if needed, normalize the replica count using either normalizeDesiredReplicas or normalizeDesiredReplicasWithBehaviors.

Update the Scale resource with retry logic (retry.RetryOnConflict).

Record status conditions and events.
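The Scale update step relies on optimistic concurrency: if another writer bumped the object's resourceVersion, the write fails with a conflict and is simply retried. Below is a minimal standalone sketch of that retry pattern in plain Go; retryOnConflict and errConflict are our stand-ins for client-go's retry.RetryOnConflict and a Conflict API error, not the real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a Kubernetes "Conflict" error (stale resourceVersion).
var errConflict = errors.New("conflict: object was modified")

// retryOnConflict mimics retry.RetryOnConflict: re-run fn as long as it
// returns a conflict error, up to maxAttempts attempts.
func retryOnConflict(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

func main() {
	attempts := 0
	err := retryOnConflict(5, func() error {
		attempts++
		if attempts < 3 {
			return errConflict // the first two writes race with another updater
		}
		return nil // the third attempt persists the new replica count
	})
	fmt.Println(attempts, err == nil)
}
```

Any non-conflict error (including success) stops the loop immediately; only conflicts are worth retrying, because re-reading the object and re-applying the change usually succeeds.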

Metric Calculation and Tolerance

computeReplicasForMetrics validates selectors, extracts the current and desired replica numbers, and iterates over each metric specification:

for i, metricSpec := range metricSpecs {
    replicaCountProposal, metricNameProposal, timestampProposal, condition, err := a.computeReplicasForMetric(
        ctx, hpa, metricSpec, specReplicas, statusReplicas, selector, &statuses[i])
    ...
}

For object metrics the controller calls a.computeStatusForObjectMetric, which builds a MetricStatus and then uses a.tolerancesForHpa to obtain the configured tolerance (default 0.1, i.e., ±10%). If the usage ratio falls within this range, no scaling occurs.

// usageRatio = actual / target
if tolerances.isWithin(usageRatio) {
    return currentReplicas, timestamp, nil
}
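To put concrete numbers on the tolerance gate, here is a standalone sketch; withinTolerance is our simplification of tolerances.isWithin, assuming the default 0.1 value.

```go
package main

import (
	"fmt"
	"math"
)

const defaultTolerance = 0.1 // HPA's default 10% band

// withinTolerance reports whether the usage ratio is close enough to 1.0
// for the controller to keep the current replica count.
func withinTolerance(usageRatio float64) bool {
	return math.Abs(1.0-usageRatio) <= defaultTolerance
}

func main() {
	// target 50% CPU, observed 54%: ratio 1.08 is inside the band -> no scaling
	fmt.Println(withinTolerance(54.0 / 50.0))
	// target 50% CPU, observed 60%: ratio 1.20 is outside the band -> scale out
	fmt.Println(withinTolerance(60.0 / 50.0))
}
```

This is the answer to the question in the title: a workload sitting at 54% against a 50% target never scales, because its 8% deviation falls inside the 10% tolerance.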

When the usage ratio exceeds the tolerance, the replica count is recomputed. For absolute targets the formula is:

usageRatio := float64(usage) / float64(targetUsage)
replicaCountFloat := usageRatio * float64(readyPodCount)
replicaCount = int32(math.Ceil(replicaCountFloat))
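Worked through with hypothetical numbers (5 ready pods averaging 75% CPU against a 50% target), the absolute-target formula above yields:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the absolute-target formula quoted above:
// scale the ready pod count by the usage ratio and round up.
func desiredReplicas(usage, targetUsage float64, readyPodCount int) int32 {
	usageRatio := usage / targetUsage
	return int32(math.Ceil(usageRatio * float64(readyPodCount)))
}

func main() {
	// ratio = 75/50 = 1.5; 1.5 * 5 pods = 7.5; ceil -> 8 replicas
	fmt.Println(desiredReplicas(75, 50, 5))
}
```

The math.Ceil matters: rounding up guarantees the per-pod utilization after scaling lands at or below the target rather than just near it.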

For average‑value targets the controller uses:

usageRatio := float64(usage) / (float64(targetAverageUsage) * float64(replicaCount))
if !tolerances.isWithin(usageRatio) {
    replicaCount = int32(math.Ceil(float64(usage) / float64(targetAverageUsage)))
}
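With hypothetical numbers (900m total CPU across 4 pods against a 200m per-pod average target), the average-value path plays out as follows; averageValueReplicas is our condensed rendering of the two lines quoted above, with the default 0.1 tolerance inlined.

```go
package main

import (
	"fmt"
	"math"
)

const tolerance = 0.1 // HPA's default

// averageValueReplicas applies the average-value formula quoted above.
func averageValueReplicas(usage, targetAverageUsage float64, currentReplicas int32) int32 {
	usageRatio := usage / (targetAverageUsage * float64(currentReplicas))
	if math.Abs(1.0-usageRatio) <= tolerance {
		return currentReplicas // within tolerance: keep the current count
	}
	return int32(math.Ceil(usage / targetAverageUsage))
}

func main() {
	// ratio = 900 / (200*4) = 1.125 -> outside the band; ceil(900/200) = 5
	fmt.Println(averageValueReplicas(900, 200, 4))
	// ratio = 820 / (200*4) = 1.025 -> inside the band; stay at 4
	fmt.Println(averageValueReplicas(820, 200, 4))
}
```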

Normalization and Constraints

The functions normalizeDesiredReplicas and normalizeDesiredReplicasWithBehaviors enforce limits such as minimum/maximum replicas, scaling rate caps, and custom behavior policies. The scaling‑up limit is calculated as:

func calculateScaleUpLimit(currentReplicas int32) int32 {
    return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

With scaleUpLimitFactor = 2.0 and scaleUpLimitMinimum = 4, the controller caps the new replica count to the larger of twice the current replicas or four.
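Evaluating the function for a few replica counts makes the cap concrete (a standalone rendering of the source function quoted above, with the constants filled in):

```go
package main

import (
	"fmt"
	"math"
)

const (
	scaleUpLimitFactor  = 2.0 // at most double the current replicas per step
	scaleUpLimitMinimum = 4.0 // but never cap below four
)

// calculateScaleUpLimit mirrors the source function quoted above.
func calculateScaleUpLimit(currentReplicas int32) int32 {
	return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

func main() {
	fmt.Println(calculateScaleUpLimit(1))  // max(2, 4)  -> 4
	fmt.Println(calculateScaleUpLimit(3))  // max(6, 4)  -> 6
	fmt.Println(calculateScaleUpLimit(10)) // max(20, 4) -> 20
}
```

The minimum of four lets tiny deployments react quickly, while the doubling factor keeps large deployments from exploding in a single reconciliation.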

Practical Recommendations

To use HPA effectively:

Monitor HPA status with kubectl describe hpa and watch the Conditions and Current Metrics fields.

Set realistic target utilization values (e.g., ≤75%).

Configure spec.behavior to fine‑tune scaling cadence.

Correlate logs, events, and status conditions (ScalingActive, ScalingLimited) to diagnose why scaling may be suppressed.

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
