
How to Handle Traffic Spikes and Optimize Resources with Kubernetes HPA + VPA

This guide walks through the problem of fluctuating traffic in Kubernetes, explains the differences between Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), and provides step‑by‑step commands, YAML examples, best‑practice recommendations, troubleshooting tips, and monitoring alerts for deploying a production‑grade HPA + VPA solution.

Raymond Ops

Overview

Online services often experience pronounced traffic peaks during the day and long idle periods at night. Manual scaling of Pods at fixed times is impractical, so Kubernetes provides two automatic scaling mechanisms:

Horizontal Pod Autoscaler (HPA): adjusts the number of Pod replicas based on CPU, memory, custom, or external metrics.

Vertical Pod Autoscaler (VPA): adjusts the resource requests (and, proportionally, the limits) of each container based on historical usage.

HPA answers "how many Pods are needed", while VPA answers "how much resource each Pod needs". In production, HPA is used far more often; VPA is typically a supplemental tool for resource planning.

Technical Characteristics

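HPA reads metrics through the Kubernetes metrics APIs: CPU and memory come from Metrics Server, while custom and external metrics come from an adapter such as Prometheus Adapter. The scaling formula is:

desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))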
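To make the arithmetic concrete, here is a small Bash sketch of the formula above. It is a simplification under stated assumptions: metric values are passed as integers (for example utilization percentages or millicores), and the controller's ±10 % tolerance and the minReplicas/maxReplicas clamps are deliberately left out.

```bash
#!/usr/bin/env bash
# Simplified HPA replica calculation (sketch; ignores tolerance and min/max clamps).
# Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive integers.
hpa_desired_replicas() {
  local current_replicas=$1 current_value=$2 target_value=$3
  local numerator=$(( current_replicas * current_value ))
  echo $(( (numerator + target_value - 1) / target_value ))
}

# 4 replicas averaging 80% CPU against a 50% target: ceil(4 * 80 / 50) = ceil(6.4) = 7
hpa_desired_replicas 4 80 50
```

With a 50 % target, two Pods running at 90 % would become ceil(2 × 90 / 50) = 4 replicas.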

VPA consists of three components – Recommender, Updater, and Admission Controller – and supports four update modes (Auto, Recreate, Initial, Off).

Stabilization windows prevent thrashing: HPA v2 defaults to scaleDown.stabilizationWindowSeconds = 300 (5 min) and scaleUp.stabilizationWindowSeconds = 0.
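The scale-down window works by remembering recent recommendations and acting on the highest one inside the window, so a brief dip in load cannot shrink the Deployment. The Bash sketch below models only that selection step; the "timestamp:replicas" input format and the function name are assumptions for illustration, not controller code.

```bash
#!/usr/bin/env bash
# Sketch of HPA scale-down stabilization (simplified model).
# Each recommendation is "unixSeconds:replicas"; only those inside the window count,
# and the highest one wins.
stabilized_scale_down() {
  local now=$1 window=$2; shift 2
  local best=0 rec ts replicas
  for rec in "$@"; do
    ts=${rec%%:*}
    replicas=${rec##*:}
    if (( ts >= now - window && replicas > best )); then
      best=$replicas
    fi
  done
  echo "$best"
}

# Recommendations ending at t=320 with the default 300 s window:
# the t=0 entry has expired, so the highest remaining value (6) is used.
stabilized_scale_down 320 300 0:10 100:6 250:4 320:3
```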

Prerequisites

Kubernetes 1.23+ (HPA v2 is GA). Older clusters need autoscaling/v2beta2 or autoscaling/v2beta1.

Metrics Server 0.6.0+ – provides CPU/memory metrics; required for any HPA that scales on CPU or memory.

VPA 0.14.0+ – installed separately; not part of core Kubernetes.

Prometheus 2.40+ and Prometheus Adapter 0.11+ – required for custom and external metrics (e.g., QPS, queue depth).

At least three worker nodes to ensure scheduling capacity for scaled Pods.

Step‑by‑Step Walkthrough

1. Prepare the Environment

# Verify the Kubernetes version (HPA v2 needs 1.23+); --short was removed in kubectl 1.28
kubectl version

# Check Metrics Server deployment
kubectl get deployment metrics-server -n kube-system

# Install Metrics Server if missing
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.1/components.yaml
# Test clusters only: if kubelets serve self-signed certificates, add the insecure TLS flag (not for production)
kubectl edit deployment metrics-server -n kube-system
# Add args (example):
#   - --kubelet-insecure-tls
#   - --metric-resolution=15s

2. Deploy a Test Application

# test-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  namespace: default
spec:
  selector:
    app: php-apache
  ports:
  - port: 80
    targetPort: 80

# Apply and wait for the Deployment to become ready
kubectl apply -f test-app.yaml
kubectl wait --for=condition=available deployment/php-apache --timeout=60s

3. Create a CPU‑Based HPA

# hpa-cpu.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      selectPolicy: Min

# Apply HPA
kubectl apply -f hpa-cpu.yaml
# Verify
kubectl get hpa php-apache-hpa

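Key parameters: in production, keep minReplicas ≥ 2 to avoid a single point of failure. scaleUp.stabilizationWindowSeconds: 0 lets HPA react to spikes immediately, and with selectPolicy: Max each 15 s period may add either 100 % of the current replicas or 4 Pods, whichever is larger. scaleDown.stabilizationWindowSeconds: 300 (the default) prevents premature shrinkage, and the scale-down policy removes at most 10 % of the replicas per minute.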
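The scaleUp block of hpa-cpu.yaml can be sketched as a per-period ceiling. This is a simplified model: rounding the Percent policy up is an assumption of the sketch, and the real controller evaluates changes across the whole periodSeconds window rather than one step at a time.

```bash
#!/usr/bin/env bash
# Sketch: upper bound on replicas after one 15 s scale-up period.
scale_up_limit() {
  local current=$1 percent=$2 pods=$3 max_replicas=$4
  # Percent policy: grow by `percent` % of the current replicas, rounded up.
  local by_percent=$(( (current * (100 + percent) + 99) / 100 ))
  # Pods policy: grow by a fixed number of Pods.
  local by_pods=$(( current + pods ))
  # selectPolicy: Max -> use whichever policy allows the larger change.
  local limit=$(( by_percent > by_pods ? by_percent : by_pods ))
  # Never exceed maxReplicas.
  echo $(( limit < max_replicas ? limit : max_replicas ))
}

scale_up_limit 2 100 4 20    # Pods policy wins
scale_up_limit 10 100 4 20   # Percent policy wins
```

Under this model the Deployment can go from 2 to 6, then 12, then the 20-replica ceiling, i.e. roughly three periods (about 45 s) from minReplicas to maxReplicas.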

4. Add Custom and External Metrics (CPU + Memory + QPS + Queue Depth)

# hpa-multi-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  - type: External
    external:
      metric:
        name: rabbitmq_queue_messages
      selector:
        matchLabels:
          queue: "task-queue"
      target:
        type: AverageValue
        averageValue: "50"
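With several metrics configured, HPA computes a replica count for each metric and scales to the largest one. The Bash sketch below illustrates that rule with hypothetical readings; each metric is passed as "currentValue/targetValue", the queue metric is given as a per-Pod average (queued messages divided by current replicas) to match AverageValue semantics, and the minReplicas/maxReplicas clamps are omitted.

```bash
#!/usr/bin/env bash
# Sketch: HPA evaluates every metric and takes the largest proposal.
multi_metric_replicas() {
  local current=$1; shift
  local best=0 pair value target desired
  for pair in "$@"; do
    value=${pair%/*}
    target=${pair#*/}
    # Same ceil(currentReplicas * current / target) as the single-metric case.
    desired=$(( (current * value + target - 1) / target ))
    if (( desired > best )); then
      best=$desired
    fi
  done
  echo "$best"
}

# Hypothetical readings for 3 replicas of web-app:
#   CPU 45% vs 60%, memory 80% vs 70%, 1500 req/s per Pod vs 1000,
#   and 120 queued messages / 3 Pods = 40 per Pod vs 50.
multi_metric_replicas 3 45/60 80/70 1500/1000 40/50   # QPS proposal (5) wins
```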

Custom and external metrics require a metrics adapter; here, Prometheus Adapter is installed via Helm:

# Install Prometheus Adapter via Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
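  --set prometheus.port=9090

# Example adapter configuration. The chart renders its rules into a ConfigMap named after
# the release (prometheus-adapter); define them through the chart values (rules.custom,
# rules.external) so that a helm upgrade does not overwrite manual edits.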
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
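    externalRules:
    # Metrics used with "type: External" in an HPA must be declared here, not under rules.
    - seriesQuery: 'rabbitmq_queue_messages{queue!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "^(.*)"
        as: "$1"
      metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'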
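The first adapter rule does two things: it renames the counter and fills in a PromQL template. The Bash sketch below mimics both steps so the resulting metric name and query are easy to see. The function names are hypothetical, and the label-matcher string is illustrative only; the adapter generates its own matchers from the HPA's namespace and Pods.

```bash
#!/usr/bin/env bash
# Name rewrite: same regex as name.matches "^(.*)_total$" with name.as "${1}_per_second".
adapter_metric_name() {
  printf '%s\n' "$1" | sed -E 's/^(.*)_total$/\1_per_second/'
}

# Template expansion: replace the placeholders literally (quoted patterns).
render_metrics_query() {
  local template=$1 series=$2 matchers=$3 query
  query=${template//'<<.Series>>'/"$series"}
  query=${query//'<<.LabelMatchers>>'/"$matchers"}
  printf '%s\n' "$query"
}

adapter_metric_name http_requests_total
render_metrics_query 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])' \
  http_requests_total 'namespace="production",pod=~"web-app-.*"'
```

This is why the HPA refers to http_requests_per_second even though Prometheus only stores http_requests_total.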

5. Install and Configure VPA

# Clone the autoscaler repository (contains VPA)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
# Deploy VPA components (Recommender, Updater, Admission Controller)
./hack/vpa-up.sh
# Verify three Pods are running
kubectl get pods -n kube-system | grep vpa

Example VPA in Off mode (only recommends):

# vpa-off.yaml (Off mode)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: php-apache
      minAllowed:
        cpu: 100m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsOnly

# Apply VPA
kubectl apply -f vpa-off.yaml
# After a few hours, view recommendations
kubectl get vpa php-apache-vpa -o jsonpath='{.status.recommendation}' | jq .

Typical recommendation output:

{
  "containerRecommendations": [{
    "containerName": "php-apache",
    "target": {"cpu": "250m", "memory": "180Mi"},
    "lowerBound": {"cpu": "100m", "memory": "64Mi"},
    "upperBound": {"cpu": "500m", "memory": "300Mi"}
  }]
}

6. Mixed HPA + VPA Usage

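Do NOT let HPA and VPA both act on CPU/memory. HPA computes utilization as usage divided by the Pod's requests, and a VPA that applies its recommendations (Auto or Recreate mode) rewrites those requests, so each controller keeps undoing the other's decision. Recommended pattern: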

Set VPA updateMode: Off to obtain recommendations.

Let HPA perform the actual scaling based on CPU, memory, or custom metrics.

# vpa-off-mode.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"
---
# HPA using only custom metric (QPS)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "800"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 5
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Percent
        value: 5
        periodSeconds: 60
      selectPolicy: Min
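The conflict between the two controllers comes down to simple arithmetic. In this Bash sketch, the 300m usage and the request values are hypothetical, and the 60 % target is borrowed from the multi-metric web-app HPA: once VPA raises the request, the same load suddenly looks like low utilization to HPA.

```bash
#!/usr/bin/env bash
# HPA utilization = usage / request (both in millicores here).
utilization_pct() {
  local usage_m=$1 request_m=$2
  echo $(( usage_m * 100 / request_m ))
}

utilization_pct 300 200   # 150: above a 60% target, so HPA adds Pods
utilization_pct 300 600   # 50: same load after VPA raises the request, so HPA removes Pods
```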

Best Practices & Pitfalls

Performance Optimisation

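Shorten the Metrics Server scrape interval for fast-changing workloads (e.g., --metric-resolution=15s), but do not go below 15 s: the metrics-server project advises against it because the kubelet computes resource metrics at about that resolution, so shorter intervals mainly add load.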

Pre‑pull images with a DaemonSet to eliminate pull latency during scale‑out.

Use small base images (e.g., Alpine) and keep the readinessProbe initialDelaySeconds low (around 3-5 s for Go services, 10-15 s for Java).

Use Pod topology spread constraints (topologySpreadConstraints) to avoid concentrating new Pods on a single node.

Safety & Reliability

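Set maxReplicas based on actual cluster capacity. For example, a 3-node cluster with 32 CPUs per node has 96 CPUs in total; after leaving headroom for system daemons and other workloads, it fits roughly 80 Pods requesting 1 CPU / 2 Gi each.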

Restrict HPA modifications via RBAC; only the SRE team should have patch / update rights.

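Deploy a PodDisruptionBudget (e.g., minAvailable: 60%) to limit voluntary evictions such as node drains, Cluster Autoscaler scale-downs, and VPA Updater restarts. A PDB does not gate HPA scale-down: the ReplicaSet deletes Pods directly instead of evicting them, so the scale-down policies are what limit that rate.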

Use conservative scale‑down windows (10 min) and limit shrink rate to ≤ 10 % per minute.
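The maxReplicas estimate above can be reproduced with a short Bash sketch. The 15 % reserve and the 128 GiB of memory per node are hypothetical values chosen for illustration; replace them with your nodes' real allocatable capacity. Capacity is computed per node first because a Pod cannot straddle two nodes.

```bash
#!/usr/bin/env bash
# Rough maxReplicas ceiling from cluster capacity (sketch, hypothetical inputs).
max_replicas_for_cluster() {
  local nodes=$1 node_cpu_m=$2 node_mem_gi=$3 pod_cpu_m=$4 pod_mem_gi=$5 reserve_pct=$6
  local usable_pct=$(( 100 - reserve_pct ))
  # How many Pods fit on one node for each resource (integer division floors).
  local per_node_cpu=$(( node_cpu_m * usable_pct / 100 / pod_cpu_m ))
  local per_node_mem=$(( node_mem_gi * usable_pct / 100 / pod_mem_gi ))
  # The scarcer resource decides.
  local per_node=$(( per_node_cpu < per_node_mem ? per_node_cpu : per_node_mem ))
  echo $(( nodes * per_node ))
}

# 3 nodes x 32 CPU / 128 GiB, Pods requesting 1 CPU / 2 GiB, 15% reserved
max_replicas_for_cluster 3 32000 128 1000 2 15
```

With these inputs the result is 81, which matches the "roughly 80" figure.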

Common Errors

<unknown>/50% in HPA targets – cause: Metrics Server not installed or Pods lack resources.requests. Fix: install Metrics Server and add resources.requests to the Deployment.

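HPA shows normal targets but does not scale – cause: the metric is within the tolerance band (default ±10 %), the HPA is already at minReplicas/maxReplicas, or a stabilization window is holding an earlier recommendation. Fix: confirm the metric is above 1.1 × the target (to scale up) or below 0.9 × the target (to scale down), and check the conditions reported by kubectl describe hpa.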

New Pods stay Pending – cause: insufficient cluster resources. Fix: scale the cluster or enable Cluster Autoscaler.

VPA recommendations are empty – cause: Recommender not running or insufficient data collection time. Fix: check VPA pod logs; wait at least 8 h for data.

VPA Auto mode causes frequent restarts – cause: each resource change forces Pod recreation. Fix: switch to Off mode and apply recommendations manually, or use a PDB to limit concurrent restarts.
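The tolerance rule behind the "does not scale" symptom can be checked with a Bash sketch. It works in integer per-mille to stay in pure Bash, so results right at the band edge are approximate; the function name and the "hold"/"scale" labels are illustrative.

```bash
#!/usr/bin/env bash
# Sketch of the HPA tolerance check: no scaling while |current/target - 1| <= tolerance.
# tolerance_permille defaults to 100, i.e. the default 0.1 (±10 %).
tolerance_decision() {
  local current=$1 target=$2 tolerance_permille=${3:-100}
  local ratio_permille=$(( current * 1000 / target ))
  local deviation=$(( ratio_permille - 1000 ))
  if (( deviation < 0 )); then
    deviation=$(( -deviation ))
  fi
  if (( deviation <= tolerance_permille )); then
    echo "hold"
  else
    echo "scale"
  fi
}

tolerance_decision 54 50   # ratio 1.08 -> hold
tolerance_decision 56 50   # ratio 1.12 -> scale up
tolerance_decision 44 50   # ratio 0.88 -> scale down
```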

Monitoring & Alerting

Key Metrics

HPA current replicas vs. maxReplicas (alert if ≥ 90 %).

HPA current metric relative to its target (alert if above 150 % of the target for 5 min).

Pod CPU utilisation (warning > 85 % for 3 min).

Pod memory utilisation (warning > 90 %).

Metrics Server latency (<1 s normal, >5 s critical).

VPA recommendation drift, i.e. the gap between the VPA target and the current requests as a percentage of the requests (<20 % normal, >50 % warning).
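Two of these thresholds reduce to simple percentages. The Bash sketch below assumes drift is measured as |VPA target − current request| / current request, and uses the php-apache numbers from earlier (200m CPU request, 250m VPA target); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Share of maxReplicas currently in use (alert at >= 90).
replica_usage_pct() {
  local current=$1 max=$2
  echo $(( current * 100 / max ))
}

# Drift between the VPA target and the current request, in percent of the request.
vpa_drift_pct() {
  local target_m=$1 request_m=$2
  local diff=$(( target_m - request_m ))
  if (( diff < 0 )); then
    diff=$(( -diff ))
  fi
  echo $(( diff * 100 / request_m ))
}

replica_usage_pct 18 20   # 90: at the alert threshold
vpa_drift_pct 250 200     # 25: above the 20% normal band, below the 50% warning
```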

Prometheus Alert Rules (excerpt)

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
spec:
  groups:
  - name: hpa.rules
    rules:
    - alert: HPAMaxedOut
      expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} reached max replicas"
        description: "Consider increasing maxReplicas or optimising the workload."
    - alert: HPAScalingInactive
      expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} cannot fetch metrics"
        description: "Check Metrics Server or Prometheus Adapter."
    - alert: HighCPUUtilization
      expr: sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (namespace, pod) /
            sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} CPU > 90%"
        description: "Investigate high CPU usage or adjust HPA target."

Backup & Restore

Backup Script

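#!/bin/bash
# backup-hpa-vpa.sh – backs up all HPA, VPA and Prometheus Adapter configs
# Requires kubectl and jq. Runtime fields (resourceVersion, uid, creationTimestamp,
# managedFields, status) are stripped so the files can be re-applied later.
BACKUP_DIR="/opt/k8s/backup/autoscaling/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
STRIP_LIST='del(.items[].metadata.resourceVersion, .items[].metadata.uid, .items[].metadata.creationTimestamp, .items[].metadata.managedFields, .items[].status)'

# Backup HPA (one file per namespace)
for ns in $(kubectl get hpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u); do
  mkdir -p "$BACKUP_DIR/hpa/$ns"
  kubectl get hpa -n "$ns" -o json | jq "$STRIP_LIST" > "$BACKUP_DIR/hpa/$ns/all-hpa.json"
done

# Backup VPA (requires the VPA CRDs to be installed)
for ns in $(kubectl get vpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u); do
  mkdir -p "$BACKUP_DIR/vpa/$ns"
  kubectl get vpa -n "$ns" -o json | jq "$STRIP_LIST" > "$BACKUP_DIR/vpa/$ns/all-vpa.json"
done

# Backup Prometheus Adapter ConfigMap (only if it exists)
if kubectl get configmap prometheus-adapter -n monitoring >/dev/null 2>&1; then
  kubectl get configmap prometheus-adapter -n monitoring -o json \
    | jq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp, .metadata.managedFields)' \
    > "$BACKUP_DIR/prometheus-adapter-config.json"
fi

echo "Backup completed: $BACKUP_DIR"
ls -lR "$BACKUP_DIR"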

Restore Procedure

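1. Check the current state: kubectl get hpa,vpa -A
2. Restore HPA: kubectl apply -R -f /opt/k8s/backup/autoscaling/20260208/hpa/ (-R is needed because each namespace has its own subdirectory)
3. Restore VPA (install the VPA CRDs first): kubectl apply -R -f /opt/k8s/backup/autoscaling/20260208/vpa/
4. Verify:

kubectl get hpa,vpa -A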

Conclusion

HPA scaling formula:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

Use VPA in Off mode to collect recommendations, then manually adjust requests / limits for stable resource usage.

Configure behavior policies – Max for rapid scale‑up, Min for smooth scale‑down, and appropriate stabilization windows.

Never run HPA and VPA on the same CPU/Memory metrics in Auto mode; let HPA handle replica count and VPA act as a resource advisor.

Further Reading

KEDA – event‑driven autoscaling for queues, cron, etc.

Cluster Autoscaler – automatically adds/removes nodes when HPA creates pending Pods.

Multidimensional Pod Autoscaler (MPA) – research project that combines horizontal and vertical scaling.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
