Mastering Kubernetes HPA: How It Works, Real‑World Setup, and Troubleshooting
The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically scales pod replicas based on metrics such as CPU, memory, or custom indicators. This guide explains its core principles, configuration pitfalls, step‑by‑step troubleshooting commands, and advanced considerations such as API versions, stabilization windows, and integration with the Cluster Autoscaler.
What is the Horizontal Pod Autoscaler (HPA)
The HPA is a Kubernetes controller that automatically adjusts the number of pod replicas based on observed metrics (CPU, memory, or custom metrics). Every 15 seconds (the default sync period) it evaluates the current load and computes the desired replica count with the formula:
desiredReplicas = ceil[currentReplicas × (currentMetricValue ÷ targetValue)]
The result is clamped between minReplicas and maxReplicas. HPA performs horizontal scaling only (it adds or removes pods). It relies on metrics-server for CPU and memory, or on a custom metrics adapter such as the Prometheus Adapter for other metrics.
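As a worked example of the formula, the sketch below computes the desired replica count in Python. The function name and parameters are illustrative and not part of any Kubernetes API. The tolerance argument is an addition not covered above: it mirrors the controller's default 10% tolerance, under which a ratio close to 1.0 triggers no change. Pass tolerance=0 to get the bare formula.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the HPA replica calculation (illustrative, not the real controller)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result between minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU against a 60% target.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

With 4 replicas at 90% average CPU against a 60% target, the ratio is 1.5 and the result is 6 replicas, or 5 if maxReplicas is 5.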
Why HPA May Appear Configured but Not Scale
1. Verify HPA status and events
kubectl get hpa <HPA_NAME> -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE>
Check that the target Deployment or StatefulSet is correct, that the metrics source is reachable, and that the Events section does not contain errors such as “cannot fetch metrics” or “calculation failed”.
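The conditions that kubectl describe prints are also available in machine-readable form under status.conditions. The following is a minimal sketch, assuming the output of kubectl get hpa <HPA_NAME> -n <NAMESPACE> -o json has already been parsed into a dict. The function name is hypothetical, and the sample object is hand-written for illustration, not captured from a real cluster.

```python
def diagnose_hpa(hpa):
    """Return human-readable findings from an HPA object's status.conditions."""
    findings = []
    for cond in hpa.get("status", {}).get("conditions", []):
        ctype = cond.get("type")
        status = cond.get("status")
        detail = f"{cond.get('reason', '')}: {cond.get('message', '')}"
        if ctype in ("AbleToScale", "ScalingActive") and status != "True":
            # The controller cannot act or cannot compute a recommendation.
            findings.append(f"{ctype} is {status} ({detail})")
        elif ctype == "ScalingLimited" and status == "True":
            # The recommendation was capped by minReplicas or maxReplicas.
            findings.append(f"ScalingLimited ({detail})")
    return findings


# Hand-written sample for illustration only.
sample = {
    "status": {
        "conditions": [
            {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
            {"type": "ScalingActive", "status": "False",
             "reason": "FailedGetResourceMetric",
             "message": "unable to get metrics for resource cpu"},
            {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange"},
        ]
    }
}

for finding in diagnose_hpa(sample):
    print(finding)
```

To run it against a live object, the dict could be loaded with json.load(sys.stdin) while piping the kubectl output into the script.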
2. Ensure the Metrics API is healthy
kubectl get apiservices | grep metrics
kubectl get pods -n kube-system | grep metrics-server
kubectl logs -f -n kube-system <METRICS_SERVER_POD>
For custom metrics you can query the API directly:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
Typical failures include certificate errors, kubelet access problems, or mis‑configured adapters.
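The same health check can be scripted. The sketch below assumes the resource metrics APIService (registered as v1beta1.metrics.k8s.io) has been fetched with kubectl get apiservice v1beta1.metrics.k8s.io -o json and parsed into a dict. The function name and the sample object are illustrative assumptions.

```python
def metrics_api_available(apiservice):
    """Return (available, message) based on the APIService 'Available' condition."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("status") == "True", cond.get("message", "")
    return False, "no Available condition reported"


# Hand-written sample for illustration only.
sample_apiservice = {
    "metadata": {"name": "v1beta1.metrics.k8s.io"},
    "status": {
        "conditions": [
            {"type": "Available", "status": "False",
             "reason": "FailedDiscoveryCheck",
             "message": "failing or missing response from the backing service"},
        ]
    },
}

ok, message = metrics_api_available(sample_apiservice)
print("available" if ok else f"unavailable: {message}")
```

An unavailable APIService usually points back at the metrics-server pod or its certificates, which is where the logs command above becomes useful.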
3. Pods must define resources.requests
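For utilization targets, HPA expresses usage as a percentage of the containers' resource requests. If the target pods have no request for the metric's resource, the controller cannot compute utilization and reports an error instead of scaling.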
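To make the dependency on requests concrete, here is a rough sketch of the percentage calculation: total usage divided by total requests across the target pods. The function names, pod names, and values are hypothetical, and the quantity parser handles only plain CPU values such as 250m or 0.5.

```python
def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as '250m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_utilization_percent(usage_by_pod, request_by_pod):
    """Integer utilization: summed usage as a percentage of summed requests."""
    total_usage = 0
    total_request = 0
    for pod, usage in usage_by_pod.items():
        request = request_by_pod.get(pod)
        if request is None:
            # Without a request there is no denominator for the percentage.
            raise ValueError(f"missing cpu request for pod {pod}")
        total_usage += parse_cpu_millicores(usage)
        total_request += parse_cpu_millicores(request)
    if total_request == 0:
        raise ValueError("no cpu requests to compare against")
    return total_usage * 100 // total_request


usage = {"web-1": "125m", "web-2": "250m"}
requests = {"web-1": "250m", "web-2": "250m"}
print(cpu_utilization_percent(usage, requests))
```

The pod template fragment that follows sets requests for both CPU and memory.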
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 1Gi
4. Adjust the stabilization window (scale‑down delay)
By default HPA waits 5 minutes before scaling down to avoid thrashing. The window can be shortened:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60
5. Check replica boundaries
If the current replica count has reached maxReplicas, no further scale‑up will occur.
If it has reached minReplicas, scaling down stops.
6. Confirm the Deployment can actually create pods
Even when HPA issues a scale‑up, the Deployment may fail due to image‑pull errors, insufficient node resources, or other pod‑creation problems.
kubectl describe deployment <DEPLOYMENT_NAME>
Advanced Considerations
API version: autoscaling/v1 supports only CPU. autoscaling/v2 (GA from Kubernetes 1.23) supports CPU, memory, and custom metrics with multiple scaling policies. Prefer v2.
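Metrics‑server sampling: metrics-server scrapes kubelets at the interval set by its --metric-resolution flag (60 seconds in older releases, 15 seconds in the manifests shipped with recent ones), while HPA polls every 15 seconds. Metric values can therefore lag real load by up to one scrape interval.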
Load‑testing tools: Simple busy‑loop scripts often do not generate measurable CPU load. Use a proper stress container, e.g.
kubectl run stress --image=alpine/stress -- stress --cpu 2 --timeout 300s
or an HTTP load generator such as hey or ab to generate realistic traffic.
Cooling mechanism: Scale‑up is immediate; scale‑down respects the stabilization window (default 5 minutes).
Node resource shortage: HPA can increase pod count, but if the cluster lacks free nodes the pods remain Pending. Combine HPA with the Cluster Autoscaler to add nodes automatically.
Application suitability: HPA works best for stateless services (web, API). Stateful workloads (databases, caches) may need additional architectural support.
Metric selection: Choose metrics that reflect actual load: CPU/memory for compute‑intensive workloads, QPS/latency for web services, queue length for background job processors. Incorrect metrics lead to ineffective scaling.
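The cooling mechanism described above can be sketched as a small simulation. During scale‑down, the controller looks at the recommendations computed within the stabilization window and acts on the highest one, so a brief dip in load does not remove pods. The class below is a simplified model under that assumption; its name and the timestamps are illustrative, and it ignores scaling policies and rate limits.

```python
from collections import deque


class ScaleDownStabilizer:
    """Simplified model of the HPA scale-down stabilization window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replicas)

    def stabilize(self, now, recommendation, current_replicas):
        self._history.append((now, recommendation))
        # Forget recommendations that have aged out of the window.
        while self._history and self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        if recommendation >= current_replicas:
            # Scale-up (or no change) passes straight through in this model.
            return recommendation
        # Scale-down: act on the highest recommendation seen in the window.
        highest = max(replicas for _, replicas in self._history)
        return min(current_replicas, highest)


stabilizer = ScaleDownStabilizer(window_seconds=300)
for t, recommended in [(0, 10), (15, 4), (300, 4), (315, 4)]:
    print(t, stabilizer.stabilize(t, recommended, current_replicas=10))
```

In this run the recommendation drops to 4 at t=15, but the replica count stays at 10 until the last recommendation of 10 leaves the 300-second window, which happens at t=315.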
Production Checklist
Run kubectl get hpa and kubectl describe hpa; inspect the Events for errors.
Verify that metrics-server (or a custom adapter) is healthy and its pods are running.
Ensure every pod template defines resources.requests for CPU (and memory if used).
Confirm the current replica count is not already at maxReplicas or minReplicas.
Generate real load with a load‑testing tool to validate that metrics increase.
Allow the stabilization window before expecting a scale‑down.
Check that the Deployment can start pods without image‑pull or resource errors.
Make sure the cluster has enough nodes; enable Cluster Autoscaler if necessary.
Use autoscaling/v2 to avoid the limitations of v1.
Select appropriate scaling metrics (CPU, memory, QPS, queue length, etc.).
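Several checklist items (autoscaling/v2, explicit replica bounds, a CPU utilization target, and a chosen scale‑down window) come together in the HPA manifest itself. The sketch below builds one as a Python dict and prints it as JSON, which kubectl apply -f - accepts as well as YAML. The helper name, the Deployment name web, and the namespace default are placeholders chosen for illustration.

```python
import json


def build_hpa(name, namespace, target_name, min_replicas, max_replicas,
              cpu_target_percent, scale_down_window_seconds=300):
    """Build an autoscaling/v2 HPA manifest targeting a Deployment."""
    if min_replicas > max_replicas:
        raise ValueError("minReplicas must not exceed maxReplicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
            "behavior": {
                "scaleDown": {
                    "stabilizationWindowSeconds": scale_down_window_seconds,
                },
            },
        },
    }


# Placeholder names; replace with the real workload before applying.
manifest = build_hpa("web", "default", "web", min_replicas=2, max_replicas=10,
                     cpu_target_percent=60, scale_down_window_seconds=60)
print(json.dumps(manifest, indent=2))
```

Saved as a script, the output could be piped into kubectl apply -f - once the placeholder names match a real Deployment.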
Summary
HPA is a metric‑driven horizontal scaling controller that depends on metrics-server or custom adapters. Common reasons for non‑functioning HPA include missing or mis‑configured metrics‑server, pods without resources.requests, low metric values, replica limits, the default stabilization window, or Deployment failures. Advanced issues involve API version choice, metric latency, node scarcity (requiring Cluster Autoscaler), and selecting suitable metrics for the workload type.
Ray's Galactic Tech
