Why Your HPA Isn’t Scaling: 3 Common Misconceptions and How to Fix Them
This article examines three frequent misunderstandings about the Kubernetes Horizontal Pod Autoscaler (HPA): scaling dead zones, misread utilization calculations, and apparent scaling lag. It also walks through HPA's inner workings, metric sources, calculation methods, and behavior configuration to help you avoid these pitfalls.
Introduction
Elasticity is one of the key advantages of cloud computing. In cloud-native environments, Kubernetes provides the Horizontal Pod Autoscaler (HPA), which can scale applications based on real-time metrics.
However, HPA’s actual behavior often differs from intuition, leading to three common misconceptions that EDAS users encounter.
Misconception 1: HPA Has a Scaling Dead Zone
Symptom: When Request equals Limit and the target utilization exceeds 90%, scaling does not occur.
Root Cause: HPA applies a tolerance (default 10%) to the ratio of current to target metric value; if the ratio falls within 0.9–1.1, the scaling action is skipped. With a 90% target, actual utilization between 81% and 99% is therefore ignored. Moreover, when Request = Limit, utilization can never exceed 100%, so any target above roughly 91% (since 1.1 × 91% > 100%) creates a true dead zone in which scale-up can never trigger.
Avoidance Guide: When Request = Limit, avoid setting an overly high target utilization; leave a buffer to handle traffic spikes and account for the inherent scaling delay.
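A minimal sketch of the arithmetic behind the dead zone. The 10% tolerance and the ceil-based formula mirror HPA's documented behavior; the function name and values are illustrative:

```python
import math

TOLERANCE = 0.10  # kube-controller-manager default (--horizontal-pod-autoscaler-tolerance)

def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """Core HPA formula with the tolerance band applied."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # within tolerance: no scaling action
    return math.ceil(current_replicas * ratio)

# Target 90%: utilization from 81% to 99% falls inside the tolerance band.
print(desired_replicas(4, 0.98, 0.90))  # 4 - ignored, even at 98% usage
print(desired_replicas(4, 1.00, 0.90))  # 5 - triggers only at 100% usage
# Target 95% with Request = Limit: even 100% usage gives ratio 1.05 < 1.1,
# so scale-up can never fire - a true dead zone.
print(desired_replicas(4, 1.00, 0.95))  # 4
```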
Misconception 2: Misunderstanding Utilization Calculation, Scaling Mismatched with Expected Usage
Symptom: With Limit > Request and a 50% target, scaling occurs before usage reaches 50% of the Limit.
Root Cause: HPA calculates utilization against Request, not Limit. With Limit > Request, a 50% target is reached at half the Request (not half the Limit), and computed utilization can exceed 100%.
Avoidance Guide: For critical workloads, set Request = Limit to guarantee exclusive resources. For shareable workloads, avoid overly high targets; otherwise, pods may be killed under resource pressure, causing service disruption.
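To see why scaling fires early when Limit > Request, note that HPA divides usage by the Request. A tiny illustration (the resource values are hypothetical):

```python
def utilization(usage_mcores: int, request_mcores: int) -> float:
    """HPA utilization is usage relative to the Request, not the Limit."""
    return usage_mcores / request_mcores

# Pod with request=200m, limit=1000m, target utilization 50%:
# scale-up triggers once usage passes ~100m (50% of the request),
# which is only 10% of the limit.
print(f"{utilization(100, 200):.0%}")  # 50%
print(f"{utilization(400, 200):.0%}")  # 200% - above 100% is normal when limit > request
```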
Misconception 3: Scaling Appears Lagging Compared to Expectations
Symptom: During a sudden metric surge, HPA does not scale immediately and may scale in multiple steps, ending with a different replica count than expected.
Root Cause: HPA’s design includes both a behavior policy and a tolerance. The behavior policy limits the scaling rate, preventing instantaneous jumps, while the tolerance ignores minor metric fluctuations; compounded over several scaling cycles, these mechanisms can leave the final replica count different from a naive one-shot calculation.
Avoidance Guide: Understand HPA’s operation and configure a reasonable behavior policy.
HPA Working Mechanism
Before debunking misconceptions, it’s useful to outline HPA’s workflow.
1. Watch HPA resources; any creation or configuration change triggers the controller.
2. Fetch metric data from the Metrics API, which can be served by three types of metric servers:
Kubernetes Metrics Server – container-level CPU/memory metrics.
Custom metrics server – custom metrics from in-cluster resources.
External metrics server – metrics from outside the cluster.
3. Compute the desired replica count for each metric and take the maximum as the overall desired replica count.
4. Adjust the target workload accordingly.
Steps 2–4 run roughly every 15 seconds; the interval can be changed via the kube-controller-manager (KCM) flag --horizontal-pod-autoscaler-sync-period.
01 Data Sources
HPA supports five metric sources and three metric server types:
Resource – pod‑level CPU/Memory usage.
ContainerResource – container‑level CPU/Memory usage.
Object – a metric describing a single in-cluster object (for example, an Ingress).
Pods – custom metrics averaged across the target workload’s pods.
External – metrics from outside the cluster.
In self‑managed Kubernetes, these metric servers must be installed separately and run outside the KCM.
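As a concrete example, a minimal HPA manifest using the Resource source might look like the sketch below; the workload names and numbers are illustrative:

```yaml
# Minimal HPA tracking container CPU via the Resource metric source.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the containers' CPU Request
```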
02 Metric Calculation Methods
HPA offers three target value types:
Value – absolute total.
AverageValue – total divided by current replica count.
Utilization – average value divided by Request.
Utilization is computed against Request, so if the target containers do not declare a Request, HPA cannot evaluate the metric and will not scale on it.
All metric sources support the AverageValue type.
For a single metric, the desired replica count is desiredReplicas = ceil(currentReplicas × currentMetricValue ÷ targetMetricValue), subject to a tolerance band (by default 0.9–1.1× the target). Small fluctuations within this band are ignored to prevent constant scaling.
When multiple metrics are configured, the final desired replica count is the maximum of the individual calculations.
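The two rules above (per-metric formula with tolerance, then the maximum across metrics) can be sketched in Python; the function names are illustrative, not part of any Kubernetes API:

```python
import math

TOLERANCE = 0.10  # default tolerance band around the target

def desired_for_metric(current_replicas: int, current_value: float, target_value: float) -> int:
    """desiredReplicas = ceil(currentReplicas * current / target), with tolerance."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    return math.ceil(current_replicas * ratio)

def desired_overall(current_replicas: int, metrics: list[tuple[float, float]]) -> int:
    """metrics: (current_value, target_value) pairs; HPA takes the maximum."""
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)

# CPU at 75% vs a 50% target wants 6 replicas; QPS at 800 vs 750 is
# within tolerance and wants 4. The overall result is the maximum: 6.
print(desired_overall(4, [(0.75, 0.50), (800, 750)]))  # 6
```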
03 Scaling Behavior
Metrics can fluctuate frequently and sharply. To avoid unwanted scaling (especially rapid down-scaling) during such noise, HPA provides a behavior configuration (introduced with autoscaling/v2beta2 in Kubernetes 1.18) that controls how scaling actions are applied.
Behavior consists of three components:
Stabilization window – looks at recent desired replica counts and selects an extreme value (minimum for scaling up, maximum for scaling down) to ensure stability.
Scale‑up/scale‑down policies – define step size, limits, and periods to bound how quickly replicas can change.
Selection policy – determines whether to take the maximum, minimum, or disable the policy among multiple step‑size rules.
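Putting the three components together, a behavior block might look like the following sketch; all values are illustrative and should be tuned to your traffic pattern:

```yaml
# Illustrative HPA behavior block (spec.behavior in autoscaling/v2).
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60   # default is 0: a short window filters spikes
    policies:
    - type: Percent
      value: 100         # at most double the replicas...
      periodSeconds: 60  # ...per 60-second period
    selectPolicy: Max    # take the most permissive matching policy
  scaleDown:
    stabilizationWindowSeconds: 300  # default: look back 5 minutes
    policies:
    - type: Pods
      value: 1           # remove at most one pod...
      periodSeconds: 60  # ...per minute
```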
Review and Summary
HPA’s architecture makes it a reactive system; scaling lag is inevitable.
Utilization is calculated against Request, so exceeding 100% is normal; high targets require careful resource planning.
The tolerance concept mitigates metric noise but can create dead zones that operators must avoid.
HPA can use various metric types; appropriate metric servers must be deployed (e.g., EDAS provides microservice RT and QPS metrics).
Even without explicit configuration, HPA applies default scaling behavior; the default up‑scale stabilization window is zero, so setting a short window can filter out spikes.
One HPA can monitor multiple metrics, but avoid attaching multiple HPAs to the same workload to prevent oscillations.
In cloud‑native environments, elasticity options are richer and can be customized to business needs, leveraging PaaS platforms and cloud provider capabilities for cost‑effective scaling.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.