How to Handle Traffic Spikes and Optimize Resources with Kubernetes HPA + VPA
This guide walks through the problem of fluctuating traffic in Kubernetes, explains the differences between Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), and provides step‑by‑step commands, YAML examples, best‑practice recommendations, troubleshooting tips, and monitoring alerts for deploying a production‑grade HPA + VPA solution.
Overview
Online services often experience pronounced traffic peaks during the day and long idle periods at night. Manual scaling of Pods at fixed times is impractical, so Kubernetes provides two automatic scaling mechanisms:
Horizontal Pod Autoscaler (HPA): adjusts the number of Pod replicas based on CPU, memory, custom, or external metrics.
Vertical Pod Autoscaler (VPA) : adjusts the requests and limits of each Pod based on historical usage.
HPA solves "how many Pods are needed" while VPA solves "how much resource each Pod needs". In production HPA is used far more often; VPA is typically a supplemental tool for resource planning.
Technical Characteristics
HPA reads resource metrics (CPU, memory) from the Metrics Server and custom or external metrics from an adapter such as the Prometheus Adapter. The scaling formula is
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)).
VPA consists of three components – Recommender, Updater, and Admission Controller – and supports four update modes (Auto, Recreate, Initial, Off).
Stabilization windows prevent thrashing: HPA v2 defaults to scaleDown.stabilizationWindowSeconds = 300 (5 min) and scaleUp.stabilizationWindowSeconds = 0.
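To make the scaling formula above concrete, here is a minimal shell sketch that evaluates it. The function name desired_replicas is hypothetical and exists only for illustration; it deliberately ignores the tolerance band, the minReplicas/maxReplicas bounds, and the stabilization windows that the real controller also applies.

```shell
#!/bin/bash
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# Prints ceil(currentReplicas * currentMetricValue / desiredMetricValue).
# Hypothetical helper for illustration; not part of kubectl or Kubernetes.
desired_replicas() {
  awk -v r="$1" -v cur="$2" -v want="$3" 'BEGIN {
    x = r * cur / want
    n = int(x)
    if (x > n) n++   # round up to the next whole replica
    print n
  }'
}

# 4 replicas averaging 90% CPU against a 50% target: 7.2 rounds up to 8
desired_replicas 4 90 50
# 10 replicas averaging 20% CPU against a 50% target: exactly 4
desired_replicas 10 20 50
```

The first call prints 8 (scale out), the second prints 4 (scale in). Because the result is rounded up, even a small overshoot above the target adds a whole replica once the metric leaves the tolerance band.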
Prerequisites
Kubernetes 1.23+ (HPA v2 is GA). Older clusters need autoscaling/v2beta2 or autoscaling/v2beta1.
Metrics Server 0.6.0+ – provides CPU/Memory metrics; mandatory for any HPA that scales on resource metrics.
VPA 0.14.0+ – installed separately; not part of core Kubernetes.
Prometheus + Adapter – Prometheus 2.40+, Adapter 0.11+ – required for custom metrics (e.g., QPS, queue depth).
At least three worker nodes to ensure scheduling capacity for scaled Pods.
Step‑by‑Step Walkthrough
1. Prepare the Environment
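# Verify Kubernetes version (HPA v2 needs 1.23+)
# Note: the --short flag has been removed from recent kubectl releases; plain "kubectl version" works everywhere
kubectl version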
# Check Metrics Server deployment
kubectl get deployment metrics-server -n kube-system
# Install Metrics Server if missing
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.1/components.yaml
# For clusters with self‑signed certificates, add insecure TLS flag
kubectl edit deployment metrics-server -n kube-system
# Add args (example):
# - --kubelet-insecure-tls
# - --metric-resolution=15s
2. Deploy a Test Application
# test-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  namespace: default
spec:
  selector:
    app: php-apache
  ports:
    - port: 80
      targetPort: 80
# Apply and wait for the Deployment to become ready
kubectl apply -f test-app.yaml
kubectl wait --for=condition=available deployment/php-apache --timeout=60s
3. Create a CPU‑Based HPA
# hpa-cpu.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
      selectPolicy: Min
# Apply HPA
kubectl apply -f hpa-cpu.yaml
# Verify
kubectl get hpa php-apache-hpa
Key parameters:
minReplicas should be ≥ 2 in production to avoid a single point of failure.
scaleUp.stabilizationWindowSeconds = 0 enables immediate scaling on traffic spikes.
scaleDown.stabilizationWindowSeconds = 300 (the default) prevents premature scale-down.
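The scale-up policies in hpa-cpu.yaml (Percent 100 and Pods 4, both per 15 s, with selectPolicy: Max) bound how fast the replica count can grow. The sketch below models that bound only; max_after_period is a hypothetical helper, and the loop assumes the metric stays high enough that the HPA always wants more replicas than the policies allow.

```shell
#!/bin/bash
# max_after_period CURRENT MAX_REPLICAS
# Upper bound on replicas after one 15 s period under the policies above:
# Percent=100 allows doubling, Pods=4 allows +4, selectPolicy Max picks the larger.
# Hypothetical helper; a simplified model of the behavior section only.
max_after_period() {
  local current=$1 cap=$2
  local by_percent=$(( current + current * 100 / 100 ))
  local by_pods=$(( current + 4 ))
  local allowed=$(( by_percent > by_pods ? by_percent : by_pods ))
  echo $(( allowed < cap ? allowed : cap ))
}

# Fastest possible path from minReplicas (2) to maxReplicas (20)
replicas=2
period=0
while [ "$replicas" -lt 20 ]; do
  replicas=$(max_after_period "$replicas" 20)
  period=$(( period + 1 ))
  echo "after period $period: at most $replicas replicas"
done
```

Under these assumptions the ceiling moves 2 → 6 → 12 → 20, so the Deployment can reach maxReplicas in roughly three periods (about 45 s). The Pods policy dominates at small replica counts and the Percent policy takes over once there are more than four replicas.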
4. Add Custom Metrics (CPU + Memory + QPS)
# hpa-multi-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages
          selector:
            matchLabels:
              queue: "task-queue"
        target:
          type: AverageValue
          averageValue: "50"
Custom metrics require the Prometheus Adapter. Install it via Helm:
# Install Prometheus Adapter via Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090
# Example adapter ConfigMap (values are placed in a ConfigMap named prometheus-adapter)
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  namespace: monitoring
data:
  config.yaml: |
    rules:
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total$"
          as: "${1}_per_second"
        metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
    # Metrics consumed by HPA "type: External" are served from externalRules, not rules
    externalRules:
      - seriesQuery: 'rabbitmq_queue_messages{queue!=""}'
        resources:
          template: "<<.Resource>>"
        name:
          matches: "^(.*)"
          as: "$1"
        metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
5. Install and Configure VPA
# Clone the autoscaler repository (contains VPA)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
# Deploy VPA components (Recommender, Updater, Admission Controller)
./hack/vpa-up.sh
# Verify three Pods are running
kubectl get pods -n kube-system | grep vpa
Example VPA in Off mode (recommendations only, nothing is applied):
# vpa-off.yaml (Off mode)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: php-apache
        minAllowed:
          cpu: 100m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsOnly
# Apply VPA
kubectl apply -f vpa-off.yaml
# After a few hours, view recommendations
kubectl get vpa php-apache-vpa -o jsonpath='{.status.recommendation}' | jq .
Typical recommendation output:
{
  "containerRecommendations": [{
    "containerName": "php-apache",
    "target": {"cpu": "250m", "memory": "180Mi"},
    "lowerBound": {"cpu": "100m", "memory": "64Mi"},
    "upperBound": {"cpu": "500m", "memory": "300Mi"}
  }]
}
6. Mixed HPA + VPA Usage
Do NOT run HPA and VPA in Auto mode on the same CPU/Memory metrics because they will conflict. Recommended pattern:
Set VPA updateMode: Off to obtain recommendations.
Let HPA perform the actual scaling based on CPU, memory, or custom metrics.
# vpa-off-mode.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"
---
# HPA using only custom metric (QPS)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "800"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 5
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
        - type: Percent
          value: 5
          periodSeconds: 60
      selectPolicy: Min
Best Practices & Pitfalls
Performance Optimisation
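Keep the Metrics Server collection interval short for fast-changing workloads (e.g., --metric-resolution=15s). The Metrics Server documentation advises against values below 15 s, since that is the resolution at which the kubelet itself computes metrics, and shorter intervals only add load.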
Pre‑pull images with a DaemonSet to eliminate pull latency during scale‑out.
Use small base images (Alpine) and keep readinessProbe initial delay low (Go 3‑5 s, Java 10‑15 s).
Enable PodTopologySpread to avoid concentrating new Pods on a single node.
Safety & Reliability
Set maxReplicas based on actual cluster capacity (e.g., a 3‑node cluster with 32 CPU each has 96 CPU in total; at 1 CPU/2 Gi per Pod, and with headroom reserved for system components and other workloads, max ≈ 80 Pods).
Restrict HPA modifications via RBAC; only the SRE team should have patch / update rights.
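Deploy a PodDisruptionBudget (e.g., minAvailable: 60%) to cap how many Pods voluntary evictions, such as node drains or VPA Updater restarts, can remove at once. A PDB does not constrain HPA scale‑down, which changes the Deployment's replica count directly; limit that with the behavior.scaleDown policies instead.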
Use conservative scale‑down windows (10 min) and limit shrink rate to ≤ 10 % per minute.
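The maxReplicas sizing rule above is simple arithmetic, sketched here in shell. The helper name max_pods_by_cpu and the 15 % headroom figure are assumptions chosen for illustration; the right headroom depends on what else runs on the nodes, and the same calculation should be repeated for memory, taking the smaller result.

```shell
#!/bin/bash
# max_pods_by_cpu NODES CPU_PER_NODE CPU_PER_POD HEADROOM_PERCENT
# Hypothetical helper: how many Pods fit by CPU after reserving headroom
# for system daemons and other workloads. Integer arithmetic, rounds down.
max_pods_by_cpu() {
  local nodes=$1 cpu_per_node=$2 cpu_per_pod=$3 headroom=$4
  local total=$(( nodes * cpu_per_node ))
  local usable=$(( total * (100 - headroom) / 100 ))
  echo $(( usable / cpu_per_pod ))
}

# 3 nodes x 32 CPU, 1 CPU per Pod, 15% headroom (assumed figure)
max_pods_by_cpu 3 32 1 15
```

With these inputs the function prints 81, which is in line with the "≈ 80 Pods" ceiling from the example. Setting maxReplicas above this number only produces Pending Pods unless Cluster Autoscaler can add nodes.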
Common Errors
<unknown>/50% in HPA targets – cause: Metrics Server not installed or Pods lack resources.requests. Fix: install Metrics Server and add resources.requests to the Deployment.
HPA shows normal targets but does not scale – cause: current metric is within tolerance (default ±10 %). Fix: verify metric actually exceeds desiredMetricValue * 1.1.
New Pods stay Pending – cause: insufficient cluster resources. Fix: scale the cluster or enable Cluster Autoscaler.
VPA recommendations are empty – cause: Recommender not running or insufficient data collection time. Fix: check VPA pod logs; wait at least 8 h for data.
VPA Auto mode causes frequent restarts – cause: each resource change forces Pod recreation. Fix: switch to Off mode and apply recommendations manually, or use a PDB to limit concurrent restarts.
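For the "normal targets but does not scale" case above, the tolerance check can be sketched as follows. within_tolerance is a hypothetical helper that mirrors the documented rule (no scaling while the ratio of current to target metric is within ±10 % of 1.0); it is a simplified model and not the controller's actual code.

```shell
#!/bin/bash
# within_tolerance CURRENT_METRIC TARGET_METRIC [TOLERANCE]
# Exit status 0 if |current/target - 1| <= tolerance (default 0.1), meaning the
# HPA would leave the replica count alone; exit status 1 otherwise.
within_tolerance() {
  awk -v cur="$1" -v want="$2" -v tol="${3:-0.1}" 'BEGIN {
    diff = cur / want - 1
    if (diff < 0) diff = -diff
    exit (diff <= tol) ? 0 : 1
  }'
}

for usage in 54 56; do
  if within_tolerance "$usage" 50; then
    echo "usage ${usage}% vs target 50%: within tolerance, no scaling"
  else
    echo "usage ${usage}% vs target 50%: outside tolerance, HPA recalculates replicas"
  fi
done
```

At a 50 % target, 54 % usage (ratio 1.08) is left alone while 56 % (ratio 1.12) triggers a recalculation, which is why an HPA hovering slightly above its target can appear stuck.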
Monitoring & Alerting
Key Metrics
HPA current replicas vs. maxReplicas (alert if ≥ 90 %).
HPA target utilisation (alert if > 150 % for 5 min).
Pod CPU utilisation (warning > 85 % for 3 min).
Pod memory utilisation (warning > 90 %).
Metrics Server latency (<1 s normal, >5 s critical).
VPA recommendation drift (<20 % normal, >50 % warning).
Prometheus Alert Rules (excerpt)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
spec:
  groups:
    - name: hpa.rules
      rules:
        - alert: HPAMaxedOut
          expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} reached max replicas"
            description: "Consider increasing maxReplicas or optimising the workload."
        - alert: HPAScalingInactive
          expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} cannot fetch metrics"
            description: "Check Metrics Server or Prometheus Adapter."
        - alert: HighCPUUtilization
          # sum (not avg) per Pod so multi-container Pods compare total usage with total requests
          expr: |
            sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m]))
              /
            sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
              > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} CPU > 90%"
            description: "Investigate high CPU usage or adjust HPA target."
Backup & Restore
Backup Script
#!/bin/bash
# backup-hpa-vpa.sh – backs up all HPA, VPA and Prometheus‑Adapter configs
BACKUP_DIR="/opt/k8s/backup/autoscaling/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Backup HPA (one subdirectory per namespace)
for ns in $(kubectl get hpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u); do
  mkdir -p "$BACKUP_DIR/hpa/$ns"
  kubectl get hpa -n "$ns" -o yaml > "$BACKUP_DIR/hpa/$ns/all-hpa.yaml"
done
# Backup VPA (one subdirectory per namespace)
for ns in $(kubectl get vpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u); do
  mkdir -p "$BACKUP_DIR/vpa/$ns"
  kubectl get vpa -n "$ns" -o yaml > "$BACKUP_DIR/vpa/$ns/all-vpa.yaml"
done
# Backup Prometheus Adapter ConfigMap (if present)
kubectl get configmap prometheus-adapter -n monitoring -o yaml > "$BACKUP_DIR/prometheus-adapter-config.yaml" 2>/dev/null
echo "Backup completed: $BACKUP_DIR"
ls -lR "$BACKUP_DIR"
Restore Procedure
Check current state: kubectl get hpa,vpa -A
Restore HPA (-R recurses into the per-namespace subdirectories): kubectl apply -R -f /opt/k8s/backup/autoscaling/20260208/hpa/
Restore VPA: kubectl apply -R -f /opt/k8s/backup/autoscaling/20260208/vpa/
Verify: kubectl get hpa,vpa -A
Conclusion
HPA scaling formula:
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue).
Use VPA in Off mode to collect recommendations, then manually adjust requests / limits for stable resource usage.
Configure behavior policies – Max for rapid scale‑up, Min for smooth scale‑down, and appropriate stabilization windows.
Never run HPA and VPA on the same CPU/Memory metrics in Auto mode; let HPA handle replica count and VPA act as a resource advisor.
Further Reading
KEDA – event‑driven autoscaling for queues, cron, etc.
Cluster Autoscaler – automatically adds/removes nodes when HPA creates pending Pods.
Multidimensional Pod Autoscaler (MPA) – research project that combines horizontal and vertical scaling.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.