Master Kubernetes HPA: Auto-Scale Pods Efficiently with Real-World Examples
This guide explains what Kubernetes Horizontal Pod Autoscaler (HPA) is, how it works, its key features, and provides step‑by‑step configuration, verification, and scaling policy details with practical code examples for cloud‑native applications.
What is Kubernetes HPA?
Kubernetes Horizontal Pod Autoscaler (HPA) is an automatic scaling mechanism that adjusts the number of Pods based on actual workload, ensuring stable performance while optimizing resource usage and avoiding waste.
How HPA Works
Monitoring: HPA periodically retrieves resource usage metrics from sources such as Metrics Server or custom providers.
Decision: Based on collected data and predefined targets (e.g., CPU utilization, memory), HPA decides whether to scale the Pods.
Execution: If usage exceeds the threshold, HPA updates the relevant controller (Deployment or ReplicaSet) to increase Pods; if usage is below a threshold, it reduces the number of Pods.
Key Features
Resource‑based scaling: By default, HPA scales on CPU utilization, but it can be configured for memory, network bandwidth, etc.
Custom metrics: Supports external metrics from systems like Prometheus.
Stabilization window (cooldown): Prevents rapid flapping by considering recent scaling recommendations over a configurable interval before acting on them.
Flexible configuration: Minimum and maximum replica counts and target metrics can be set via YAML or other methods.
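As an illustration of resource-based scaling, the fragment below sketches the more common utilization-style target from the autoscaling/v2 API (the 60% figure is an arbitrary example, not a recommendation):

```yaml
# Fragment of an HPA spec: scale when average CPU utilization
# across all Pods exceeds 60% of their configured requests.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
```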
Practical HPA Configuration
The example below triggers scaling when the average CPU usage per replica reaches 500m.
<code>$ cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple
  namespace: default
spec:
  maxReplicas: 10
  minReplicas: 1
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: 500m  # target average CPU usage per replica
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple
EOF</code>
Check the HPA resource:
<code>$ kubectl get hpa
NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
simple   Deployment/simple   0/500m    1         10        1          8m24s</code>
Tip: If the Deployment's replicas field is set to 2 but the HPA computes a desired count of 1 (its minimum), the HPA will scale the Deployment down to 1 replica; the HPA's recommendation overrides the replica count in the Deployment spec.
Verification of HPA Functionality
Create a debugging container with a load‑testing tool:
<code>$ cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tools
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tools
  template:
    metadata:
      labels:
        app: tools
    spec:
      containers:
      - name: tools
        image: core.jiaxzeng.com/library/tools:v1.2
EOF</code>
Inspect current CPU usage:
<code>$ kubectl get hpa
NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
simple   Deployment/simple   0/500m    1         10        1          73m</code>
Run a load test against the HPA‑managed service:
<code>$ kubectl exec -it deploy/tools -- wrk -c 2 -t 1 -d 90s http://simple.default.svc/who/hostname
Running 2m test @ http://simple.default.svc/who/hostname
  1 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   219.94us  424.96us  19.51ms   98.82%
    Req/Sec    10.18k     1.18k   12.72k    67.78%
  911430 requests in 1.50m, 122.56MB read
Requests/sec:  10126.58
Transfer/sec:      1.36MB</code>
Watch scaling changes in real time:
<code>$ kubectl get hpa -w
... (output showing replica count adjustments as CPU usage rises and falls) ...</code>
Default Scaling Policies
Scale‑down: A stabilization window of 300 seconds applies, and a single default policy allows removing up to 100% of the current replicas every 15 seconds, so the workload can shrink all the way to the configured minimum.
Scale‑up: No stabilization window applies; when metrics indicate scaling up, replicas are added immediately. Two default policies exist: add up to 4 Pods or up to 100% of the current replicas, whichever is higher, every 15 seconds until the HPA stabilizes.
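These defaults can be written out explicitly via the behavior field of the autoscaling/v2 API; the sketch below mirrors the upstream default policies:

```yaml
# Equivalent of the default HPA scaling behavior.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100        # may remove up to 100% of current replicas
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    selectPolicy: Max   # pick whichever policy allows the larger change
    policies:
    - type: Percent
      value: 100        # may double the current replica count
      periodSeconds: 15
    - type: Pods
      value: 4          # or add up to 4 Pods
      periodSeconds: 15
```

Overriding this block (e.g., lowering the scale-down window) is the usual way to tune flapping behavior per workload.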
Algorithm Details
The controller computes a scaling ratio from current and desired metrics. For example, if the current metric is 200m and the target is 100m, the replica count doubles (200/100 = 2). If the current metric is 50m, the count halves (50/100 = 0.5). When the ratio is close to 1.0 within a configurable tolerance (default 0.1), no scaling occurs.
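The rule above can be sketched in a few lines of Python. This is a simplified model of the controller's formula, not the actual implementation; the tolerance parameter mirrors the default of 0.1:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Simplified HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    # If the ratio is within tolerance of 1.0, skip scaling to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(4, 200, 100))  # metric at 2x target -> replicas double to 8
print(desired_replicas(4, 50, 100))   # metric at half target -> replicas halve to 2
print(desired_replicas(4, 105, 100))  # ratio 1.05 is within tolerance -> stays at 4
```

The real controller additionally clamps the result between minReplicas and maxReplicas and applies the scaling policies described above.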
Reference Documentation
Official documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
Conclusion
Kubernetes HPA is a key tool for achieving elastic scaling in cloud‑native environments. Understanding its concepts and practical configuration enables you to build more efficient and reliable applications. Experiment with your own workloads to explore advanced features.
Linux Ops Smart Journey
The operations journey never stops—pursuing excellence endlessly.