Mastering Kubernetes HPA: How It Works, Real‑World Setup, and Troubleshooting

The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically scales pod replicas based on metrics such as CPU, memory, or custom indicators. This guide explains its core principles, common configuration pitfalls, step-by-step troubleshooting commands, and advanced considerations such as API versions, stabilization windows, and integration with the Cluster Autoscaler.

Ray's Galactic Tech
What is the Horizontal Pod Autoscaler (HPA)

The HPA is a Kubernetes controller that automatically adjusts the number of pod replicas based on observed metrics (CPU, memory, or custom metrics). By default it re-evaluates those metrics every 15 seconds and computes the desired replica count with the formula:

desiredReplicas = ceil(currentReplicas × (currentMetricValue ÷ targetValue))

The result is clamped between minReplicas and maxReplicas. The HPA performs horizontal scaling only (it adds or removes pods) and relies on metrics-server for CPU and memory, or on a custom metrics adapter such as the Prometheus Adapter.
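The formula and the clamping step can be checked by hand with a few lines of Python. This is a minimal sketch of the arithmetic only: the function name and the sample numbers are made up for illustration, and the real controller also accounts for details such as unready pods and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Apply the HPA formula, then clamp to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Multiply before dividing so exact ratios are not disturbed by
    # floating-point rounding ahead of the ceiling.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 3 pods averaging 80% CPU against a 50% target: ceil(3 * 80 / 50) = ceil(4.8)
print(desired_replicas(3, 80, 50, min_replicas=1, max_replicas=10))   # 5
# Load drops to 20% on 5 pods: ceil(5 * 20 / 50) = 2
print(desired_replicas(5, 20, 50, min_replicas=1, max_replicas=10))   # 2
# 4 pods at 200%: the raw result of 16 is clamped to maxReplicas
print(desired_replicas(4, 200, 50, min_replicas=1, max_replicas=10))  # 10
```

The third call shows why an HPA pinned at maxReplicas looks "stuck": the formula asks for more pods, but the clamp wins.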

Why HPA May Appear Configured but Not Scale

1. Verify HPA status and events

kubectl get hpa <HPA_NAME> -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE>

Check that the target Deployment or StatefulSet is correct, that the metrics source is reachable, and that the Events section shows no metric-fetch or calculation failures (for example, events with reasons such as FailedGetResourceMetric or FailedComputeMetricsReplicas).

2. Ensure the Metrics API is healthy

kubectl get apiservices | grep metrics
kubectl get pods -n kube-system | grep metrics-server
kubectl logs -f -n kube-system <METRICS_SERVER_POD>

For custom metrics you can query the API directly:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

Typical failures include certificate errors, kubelet access problems, or mis‑configured adapters.

3. Pods must define resources.requests

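For resource metrics with a Utilization target, the HPA expresses usage as a percentage of each container's resources.requests. Without requests for the scaled resource, the controller cannot compute utilization and will not scale on that metric. (Targets of type AverageValue compare raw usage and do not need requests.)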

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 1Gi
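To see why a missing request stops the calculation, here is a simplified sketch of how a utilization percentage can be derived from usage and requests. The function name and the usage figures are hypothetical; the 250m request matches the snippet above, and the real controller applies further rules (for example around unready pods) that are left out here.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """Utilization across pods: total usage divided by total requests."""
    if not request_millicores:
        raise ValueError("need at least one pod")
    if len(usage_millicores) != len(request_millicores):
        raise ValueError("need one usage and one request value per pod")
    if any(r is None or r <= 0 for r in request_millicores):
        # Without a request there is no denominator for the percentage.
        raise ValueError("every pod needs a CPU request to compute utilization")
    return 100 * sum(usage_millicores) // sum(request_millicores)


# Three pods that each request 250m CPU and use 200m, 225m, and 250m.
print(cpu_utilization_percent([200, 225, 250], [250, 250, 250]))  # 90
```

Passing `None` for any pod's request raises an error in this sketch, which mirrors the situation where the HPA reports that it cannot compute utilization.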

4. Adjust the stabilization window (scale‑down delay)

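By default the HPA uses a 300-second (5-minute) scale-down stabilization window: it acts on the highest replica recommendation seen during that window, which avoids thrashing when load fluctuates. The window can be shortened in the autoscaling/v2 behavior field: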

behavior:
  scaleDown:
    stabilizationWindowSeconds: 60
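The effect of the window can be illustrated with a small simulation. This is a simplified sketch that assumes the controller keeps a history of recommendations and scales down only to the highest one inside the window; the function name, timestamps, and replica counts are illustrative.

```python
def stabilized_scale_down(recommendations, now, window_seconds):
    """Return the highest recommendation made within the last window_seconds.

    recommendations: list of (timestamp_seconds, replicas) tuples.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - window_seconds <= ts <= now]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)


# Load fell quickly: the recommendation went from 10 to 6 to 4 pods.
history = [(0, 10), (30, 6), (90, 4)]
print(stabilized_scale_down(history, now=90, window_seconds=300))  # 10
print(stabilized_scale_down(history, now=90, window_seconds=60))   # 6
```

With the default 300-second window the workload stays at 10 replicas at t=90, even though the latest recommendation is 4. With a 60-second window the older peak has aged out and the HPA can already drop to 6.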

5. Check replica boundaries

If the current replica count has reached maxReplicas, no further scale‑up will occur.

If it has reached minReplicas, scaling down stops.

6. Confirm the Deployment can actually create pods

Even when HPA issues a scale‑up, the Deployment may fail due to image‑pull errors, insufficient node resources, or other pod‑creation problems.

kubectl describe deployment <DEPLOYMENT_NAME> -n <NAMESPACE>

Advanced Considerations

API version: autoscaling/v1 supports only CPU. autoscaling/v2 (GA since Kubernetes 1.23) supports CPU, memory, and custom metrics, and adds configurable scaling policies. Prefer v2.

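Metrics-server sampling: metrics-server scrapes kubelets at the interval set by its --metric-resolution flag (1 minute in the setup described here; other versions and manifests use shorter intervals such as 15 seconds), while the HPA polls every 15 seconds by default. Metric values can therefore lag real load by up to one scrape interval.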

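Load-testing tools: Simple busy-loop scripts often do not generate measurable CPU load. Use a dedicated stress container, for example:

kubectl run stress --image=alpine/stress -- stress --cpu 2 --timeout 300s

Alternatively, use an HTTP load generator such as hey or ab to send realistic traffic to the target service. Either way, the load has to land on the pods of the HPA's scale target, since only those pods are measured.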

Cooldown: By default, scale-up is applied immediately (no stabilization window), while scale-down respects the stabilization window (300 seconds by default).

Node resource shortage: The HPA can raise the replica count, but if no node has enough free capacity the new pods stay Pending. Combine the HPA with the Cluster Autoscaler to add nodes automatically.

Application suitability: HPA works best for stateless services (web, API). Stateful workloads (databases, caches) may need additional architectural support.

Metric selection: Choose metrics that reflect actual load: CPU or memory for compute-intensive workloads, QPS or latency for web services, queue length for background job processors. Incorrect metrics lead to ineffective scaling.
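Several of the points above (autoscaling/v2, a CPU utilization target, replica bounds, and a shortened scale-down window) come together in a single HPA object. The sketch below builds such a manifest as JSON, which kubectl accepts alongside YAML. The names web-hpa and web, the default namespace, and all numeric values are placeholders chosen for illustration, not settings taken from the original setup.

```python
import json


def build_hpa_manifest(name, namespace, deployment, min_replicas,
                       max_replicas, cpu_target_percent,
                       scale_down_window_seconds=300):
    """Return an autoscaling/v2 HorizontalPodAutoscaler as a plain dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The workload whose replica count the HPA manages.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization relative to resources.requests.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
            "behavior": {
                "scaleDown": {
                    "stabilizationWindowSeconds": scale_down_window_seconds,
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; adjust them to the workload being scaled.
    manifest = build_hpa_manifest("web-hpa", "default", "web",
                                  min_replicas=2, max_replicas=10,
                                  cpu_target_percent=60,
                                  scale_down_window_seconds=60)
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as hpa_manifest.py (a hypothetical file name), the output can be piped straight into the cluster with `python hpa_manifest.py | kubectl apply -f -`.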

Production Checklist

Run kubectl get hpa and kubectl describe hpa; inspect the Events for errors.

Verify that metrics-server (or a custom adapter) is healthy and its pods are running.

Ensure every pod template defines resources.requests for CPU (and memory if used).

Confirm the current replica count is not already at maxReplicas or minReplicas.

Generate real load with a load‑testing tool to validate that metrics increase.

Allow the stabilization window before expecting a scale‑down.

Check that the Deployment can start pods without image‑pull or resource errors.

Make sure the cluster has enough nodes; enable Cluster Autoscaler if necessary.

Use autoscaling/v2 to avoid the limitations of v1.

Select appropriate scaling metrics (CPU, memory, QPS, queue length, etc.).

Summary

HPA is a metric‑driven horizontal scaling controller that depends on metrics-server or custom adapters. Common reasons for non‑functioning HPA include missing or mis‑configured metrics‑server, pods without resources.requests, low metric values, replica limits, the default stabilization window, or Deployment failures. Advanced issues involve API version choice, metric latency, node scarcity (requiring Cluster Autoscaler), and selecting suitable metrics for the workload type.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
