Master Kubernetes Node Autoscaling with Custom Prometheus Metrics in 30 Minutes
This guide walks you through a complete, 30‑minute implementation of Kubernetes node autoscaling using Horizontal Pod Autoscaler (HPA) with custom Prometheus metrics, covering prerequisites, anti‑pattern warnings, environment matrix, step‑by‑step deployment, core principles, observability, troubleshooting, best practices, and FAQ.
1. K8s Node Autoscaling: Complete HPA Practice Based on Prometheus Custom Metrics
2. Applicable Scenarios & Prerequisites
| Item | Requirement |
|---|---|
| Applicable Scenario | Web applications, API services, or scheduled jobs with obvious traffic fluctuations |
| Kubernetes | 1.23+ (supports HPA v2 API) |
| Metrics Server | 0.6.0+ (provides resource metrics) |
| Prometheus | 2.40+ (collects custom metrics) |
| Prometheus Adapter | 0.11+ (converts Prometheus metrics to K8s Custom Metrics) |
| Cluster Size | 3+ nodes, each 4C8G (minimum) / 8C16G (recommended) |
| Network Requirements | Internal DNS works, Prometheus can reach all Pods |
| Permissions | cluster-admin or rights to create HPA and ServiceMonitor |
| Skill Requirements | Familiar with Kubernetes, Prometheus PromQL, YAML configuration |
3. Anti‑Pattern Warnings
Do NOT use this solution in the following scenarios:
Ultra-low latency requirements: financial trading, real-time bidding (HPA scaling takes 30-60 s and cannot meet millisecond response)
Stateful services: database clusters, message queues (scaling requires manual data migration; HPA only fits stateless apps)
Very stable traffic: daily variation < 10 % (a fixed replica count is simpler)
Resource-constrained clusters: node utilization > 80 % (no room to scale out)
Cost-sensitive scenarios: pay-as-you-go cloud, where frequent scaling may increase cost
Alternative solutions comparison:
| Scenario | Recommended Solution | Reason |
|---|---|---|
| Low latency requirement | Reserve fixed replicas + over-provisioning | Avoid scaling wait time |
| Stateful service | StatefulSet + manual scaling | Guarantee data consistency |
| Very stable traffic | Deployment with fixed replicas | Simplify operations |
| Cost-first | Cluster Autoscaler + node pools | Scale nodes down to reduce cost |
| Batch jobs | CronJob + Job | No need for always-running Pods |
4. Environment & Version Matrix
| Component | Version | Installation Method | Tested |
|---|---|---|---|
| Kubernetes | 1.28.4 / 1.27.8 / 1.26.11 | kubeadm / k3s / EKS | Yes |
| Metrics Server | 0.6.4 / 0.6.3 | Helm / kubectl apply | Yes |
| Prometheus | 2.48.0 / 2.45.0 | Prometheus Operator / Helm | Yes |
| Prometheus Adapter | 0.11.1 / 0.10.0 | Helm | Yes |
| Helm | 3.12+ | Official binary | Yes |
| kubectl | 1.28+ | Official binary | Yes |
Version differences:
Kubernetes < 1.23 uses HPA v2beta2 API (not covered here)
Metrics Server 0.5 vs 0.6: 0.6 improves HA, recommended for production
Prometheus Adapter 0.10 vs 0.11: 0.11 supports more flexible mapping rules
5. Reading Navigation
Quick start (20 min): Chapter 6 → Chapter 7 → Chapter 14
Deep dive (60 min): Chapter 8 → Chapter 7 → Chapter 10 → Chapter 12
Troubleshooting: Chapter 9 → Chapter 8
6. Quick Checklist
Preparation
Check Kubernetes version: kubectl version
Deploy Metrics Server: kubectl apply -f metrics-server.yaml
Deploy Prometheus Operator: helm install prometheus
Deploy Prometheus Adapter: helm install prometheus-adapter
Implementation
Deploy example Nginx app and ServiceMonitor
Configure custom‑metric mapping in Prometheus Adapter
Create CPU‑based HPA (basic verification)
Create custom‑metric (QPS) HPA
Configure scaling policies (cool‑down, behavior)
Verification
Test CPU‑based scaling
Load‑test to trigger QPS‑based scaling
Validate metrics availability
Monitoring
Configure Grafana dashboards
Set Prometheus alert rules
7. Implementation Steps
System Architecture
【Kubernetes HPA Autoscaling Architecture】
Application Layer
├─ Nginx Deployment (initial 2 replicas)
│ ↓
├─ Service (exposes port 80)
│ ↓
└─ Pod (exposes Prometheus metrics: /metrics)
Monitoring Layer
├─ Prometheus
│ ├─ Discovers Pods via ServiceMonitor
│ ├─ Collects http_requests_total, container_cpu_usage
│ └─ Stores time‑series data
│
└─ Prometheus Adapter
├─ Queries Prometheus metrics
├─ Converts to K8s Custom Metrics API
└─ Exposes /apis/custom.metrics.k8s.io/v1beta1
Control Layer
├─ Metrics Server
│ └─ Provides CPU/Memory usage
│
└─ HPA Controller
├─ Queries metrics every 15 s (default)
│ ├─ Resource metrics from Metrics Server
│ └─ Custom metrics from Prometheus Adapter
├─ Calculates desired replica count
│ Formula: desired = ceil(current * (currentMetric / targetMetric))
├─ Applies scaling policies (cool‑down, max‑scale)
└─ Updates Deployment replicas
Data Flow
┌─────────────────────────────────────────────┐
│ 1. Pod exposes metrics → Prometheus collects │
│ 2. Adapter converts to K8s API │
│ 3. HPA queries every 15 s │
│ 4. HPA computes desired replicas │
│ 5. Deployment creates/deletes Pods │
└─────────────────────────────────────────────┘

Step 1: Deploy Metrics Server
Goal: Provide basic resource metrics (CPU/Memory)
Check if already installed:
kubectl get deployment metrics-server -n kube-system
# Expected: "Error from server (NotFound)" if not installed

Install (official YAML):
# Download official manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml
# If the cluster uses self‑signed certs, add insecure flag
kubectl patch deployment metrics-server -n kube-system \
--type='json' \
-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Key parameters:
--kubelet-insecure-tls: skip kubelet certificate verification (test only)
--metric-resolution=15s: metric collection interval (default 60 s, can be 15 s)
--kubelet-preferred-address-types=InternalIP: prefer internal IP for kubelet
Validate after installation:
# Wait for Pods to be ready
kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=60s
# Verify metrics are available
kubectl top nodes
kubectl top pods -A

Common errors:
# Error 1: cannot connect to kubelet
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
# Fix: ensure port 10250 is open, add --kubelet-insecure-tls
# Error 2: certificate validation failed
x509: cannot validate certificate for 192.168.1.10 because it doesn't contain any IP SANs
# Fix: add --kubelet-insecure-tls (test) or configure proper certificates

Step 2: Deploy Prometheus Operator
Goal: Install Prometheus for custom‑metric collection
# Add Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Create namespace
kubectl create namespace monitoring
# Install full stack (Prometheus, Grafana, Alertmanager)
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

Validate:
# Check Prometheus pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus
# Check Service
kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
# Port‑forward to UI
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

Validate metric collection:
# Query node count
up{job="kubernetes-nodes"}
# Query deployment replica count
kube_deployment_status_replicas

Step 3: Deploy Prometheus Adapter
Goal: Convert Prometheus metrics to the Kubernetes Custom Metrics API
# Install via Helm
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--set prometheus.url=http://prometheus-kube-prometheus-prometheus.monitoring.svc \
--set prometheus.port=9090

Custom metric mapping (key steps):
# Create mapping file (prometheus-adapter-values.yaml)
# Note: the heredoc delimiter is quoted ('EOF') so the shell does not
# expand ${1} inside the file.
cat > prometheus-adapter-values.yaml <<'EOF'
prometheus:
  url: http://prometheus-kube-prometheus-prometheus.monitoring.svc
  port: 9090
rules:
  default: true
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
  - seriesQuery: 'nginx_connections_active{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)$"
      as: "${1}"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
  - seriesQuery: 'myapp_processing_duration_seconds_sum{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_sum$"
      as: "${1}_avg"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>) / sum(rate(myapp_processing_duration_seconds_count{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
EOF
# Apply mapping
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring -f prometheus-adapter-values.yaml

Key parameters explained:
seriesQuery: which Prometheus series to select
resources.overrides: map Prometheus labels to K8s resources (namespace/pod)
name.matches: regex used to rename the metric
metricsQuery: the actual PromQL that computes the value (usually rate())
Validate:
# Check Adapter pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter
# Verify Custom Metrics API
kubectl get apiservices | grep custom.metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

Common errors:
# API Service unavailable
Error from server (ServiceUnavailable): the server is currently unable to handle the request
# Fix: check Adapter pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-adapter
# Metric not listed
# Fix: ensure the metric exists in the Prometheus UI, adjust the seriesQuery regex

Step 4: Deploy Example Application & Expose Metrics
Goal: Deploy Nginx with Prometheus exporter
# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: nginx
        image: nginx:1.24-alpine
        ports:
        - containerPort: 80
        volumeMounts:
        # Mount the ConfigMap below so /stub_status exists for the exporter
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: default.conf
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:0.11
        args:
        - '-nginx.scrape-uri=http://localhost:80/stub_status'
        ports:
        - containerPort: 9113
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-app
  namespace: default
  labels:
    app: nginx   # label used by the ServiceMonitor selector
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    name: http
  - port: 9113
    targetPort: 9113
    name: metrics
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  default.conf: |
    server {
      listen 80;
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
      location /stub_status {
        stub_status on;
        access_log off;
      }
    }

kubectl apply -f nginx-deployment.yaml

Create ServiceMonitor for automatic discovery:
# nginx-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-app
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: metrics
    interval: 15s
    path: /metrics

kubectl apply -f nginx-servicemonitor.yaml

Validate:
# Pods running
kubectl get pods -l app=nginx
# Access metrics endpoint
kubectl port-forward svc/nginx-app 9113:9113
curl http://localhost:9113/metrics
# Verify in Prometheus UI
# Query: nginx_connections_active{namespace="default"}

Step 5: Create CPU-Based HPA (basic verification)
Goal: Verify basic HPA functionality
# hpa-cpu.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-cpu
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

kubectl apply -f hpa-cpu.yaml

Key parameters:
averageUtilization: 50 – target CPU usage relative to resources.requests
stabilizationWindowSeconds – decision window to avoid flapping
scaleUp.policies – maximum scaling per interval (percent or pod count); selectPolicy: Max picks the larger allowed change
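The scaleUp block above allows, every 15 s, the larger of doubling the replica count (Percent 100) or adding 4 Pods, because selectPolicy is Max. A small sketch of the resulting growth ceiling (illustrative starting values; this models the allowed envelope, not the controller's internal code):

```shell
# Growth ceiling under the scaleUp policies above, starting from 2 replicas.
replicas=2
for period in 1 2 3; do
  by_percent=$replicas            # Percent policy: +100% of current replicas
  by_pods=4                       # Pods policy: +4 per 15 s period
  if [ "$by_percent" -gt "$by_pods" ]; then step=$by_percent; else step=$by_pods; fi
  replicas=$(( replicas + step ))
done
# Three 15 s periods: 2 -> 6 -> 12 -> 24 (maxReplicas: 10 would cap this)
echo "$replicas"
```

In practice maxReplicas caps the result long before the envelope is reached; the point is that scale-up can be very fast when stabilizationWindowSeconds is 0.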
Validate:
# Check HPA status
kubectl get hpa nginx-hpa-cpu
# Detailed view
kubectl describe hpa nginx-hpa-cpu

Step 6: Create Custom-Metric (QPS) HPA
Goal: Autoscale based on business metric (requests per second)
# hpa-qps.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-qps
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 5
        periodSeconds: 30

kubectl apply -f hpa-qps.yaml

Pre-validation (ensure the custom metric exists):
# Query metric value
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
# If empty, generate traffic:
kubectl run -it --rm test --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx-app.default.svc.cluster.local; done"

Validate after applying the HPA:
# HPA status
kubectl get hpa nginx-hpa-qps
# Load test with wrk
kubectl run -it --rm wrk --image=skandyla/wrk --restart=Never -- -t4 -c100 -d300s http://nginx-app.default.svc.cluster.local
# Watch scaling
kubectl get hpa nginx-hpa-qps --watch

Common issues:
# Unable to get metric
# Fix: verify the Prometheus Adapter config and that the metric exists in Prometheus

8. Minimal Required Principles
HPA Working Mechanism:
【HPA Decision Flow】
Step 1: Query metrics
├─ Resource metrics → Metrics Server API
└─ Custom metrics → Custom Metrics API (Prometheus Adapter)
Step 2: Compute desired replicas
Formula: desired = ceil(current * (currentMetric / targetMetric))
Step 3: Apply tolerance (default 10 %)
├─ current > target × 1.1 → scale up
├─ current < target × 0.9 → scale down
└─ otherwise → no change
Step 4: Apply behavior policies
• Scale‑up: fast, small stabilization window
• Scale‑down: longer window to avoid thrashing
Step 5: Update Deployment replicas
├─ ReplicaSet creates/deletes Pods
└─ Pods become Ready (typically 30-60 s)

Why stabilizationWindowSeconds? Without it, rapid metric swings cause frequent up-down cycles (flapping). The window smooths decisions by considering the max/min value over the period.
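The formula and tolerance steps above can be traced with a small shell calculation (all metric values are illustrative assumptions):

```shell
# Worked example of the HPA decision flow:
# desired = ceil(currentReplicas * (currentMetric / targetMetric)),
# applied only when the current/target ratio leaves the 10% tolerance band.
current_replicas=4
current_metric=180   # observed per-pod average (e.g. QPS)
target_metric=100    # target averageValue from the HPA spec

ratio_pct=$(( current_metric * 100 / target_metric ))   # 180 (%)

if [ "$ratio_pct" -gt 110 ] || [ "$ratio_pct" -lt 90 ]; then
  # integer ceil(a/b) = (a + b - 1) / b
  desired=$(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
else
  desired=$current_replicas   # within tolerance: no change
fi
# ceil(4 * 180/100) = ceil(7.2) = 8
echo "$desired"
```

With currentMetric at, say, 105 (ratio 105%, inside the ±10% band), the HPA would leave the replica count unchanged.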
9. Observability
9.1 Monitoring Metrics
Core HPA metrics:
# List HPA status
kubectl get hpa -A
# Detailed metrics
kubectl get hpa nginx-hpa-qps -o yaml | grep -A 10 currentMetrics

Prometheus queries for Grafana:
# Current replica count
kube_horizontalpodautoscaler_status_current_replicas{hpa="nginx-hpa-qps"}
# Desired replica count
kube_horizontalpodautoscaler_status_desired_replicas{hpa="nginx-hpa-qps"}
# Scaling events rate
rate(kube_horizontalpodautoscaler_status_desired_replicas{hpa="nginx-hpa-qps"}[5m])
# Current QPS per pod
sum(rate(http_requests_total{namespace="default",pod=~"nginx-app-.*"}[2m])) by (pod)
# CPU usage per pod (percentage)
sum(rate(container_cpu_usage_seconds_total{namespace="default",pod=~"nginx-app-.*"}[2m])) by (pod) /
on(pod) group_left() kube_pod_container_resource_requests{resource="cpu"} * 100

9.2 Alert Rules
# prometheus-hpa-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
spec:
  groups:
  - name: hpa_alerts
    interval: 30s
    rules:
    - alert: HPAMaxedOut
      expr: kube_horizontalpodautoscaler_status_current_replicas >= kube_horizontalpodautoscaler_spec_max_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.hpa }} has reached max replicas"
        description: "Current replicas {{ $value }} reached the configured max; consider increasing maxReplicas"
    - alert: HPAFlapping
      expr: changes(kube_horizontalpodautoscaler_status_current_replicas[10m]) > 4
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.hpa }} is flapping"
        description: "Replica count changed {{ $value }} times in the last 10 min; consider increasing stabilizationWindowSeconds"
    - alert: HPAMetricsUnavailable
      expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "HPA {{ $labels.hpa }} cannot fetch metrics"
        description: "Scaling is inactive; check Metrics Server or Prometheus Adapter"
    - alert: PodCPUThrottling
      # Ratio of throttled CFS periods to total CFS periods
      expr: |
        sum(rate(container_cpu_cfs_throttled_periods_total[5m])) by (pod, namespace)
          / sum(rate(container_cpu_cfs_periods_total[5m])) by (pod, namespace) > 0.5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} CPU throttled"
        description: "Pod is throttled in {{ $value | humanizePercentage }} of CFS periods; consider raising CPU limits"

10. Common Faults & Troubleshooting
| Symptom | Diagnostic Command | Possible Root Cause | Quick Fix | Permanent Fix |
|---|---|---|---|---|
| HPA shows <unknown> | kubectl get hpa | 1. Metrics Server not deployed 2. Pods lack resources.requests | Deploy Metrics Server | Configure resources.requests for all Pods |
| Custom metric ineffective | kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | 1. Prometheus Adapter not running 2. Metric mapping mis-configured | Check Adapter pod logs | Fix rules configuration |
| HPA does not scale up | kubectl describe hpa | 1. Current value below target × 1.1 2. Already at maxReplicas | Adjust target value or increase maxReplicas | Optimize scaling policy |
| HPA does not scale down | Wait 5 min and observe | 1. Within stabilization window 2. Current value above target × 0.9 | Manually test scaling | Adjust stabilization window or target |
| Pods restart frequently | kubectl get events | 1. Too low resources.limits 2. Aggressive scaling | Increase limits | Refine behavior policies |
| Scaling latency too long | Time the scaling process | 1. Slow image pull 2. Node resource shortage | Pre-pull images or add nodes | Use Cluster Autoscaler |
Systematic diagnosis flow:
【HPA Diagnosis Flow】
1️⃣ Check HPA status (kubectl get hpa -A)
└─ If <unknown> → verify Metrics Server
2️⃣ Verify Metrics Server (kubectl top nodes)
└─ If fails → fix Metrics Server (logs, certs, network)
3️⃣ Inspect HPA details (kubectl describe hpa <name>)
└─ Look for "failed to get metric" → custom metric issue
4️⃣ Custom metric path:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
└─ If metric missing → check Prometheus Adapter config & Prometheus data
5️⃣ Behavior policy check:
Review spec.behavior, tolerance, stabilization windows
Compute expected replica count and compare with actual
6️⃣ Verify cooldown timers for up/down scaling
└─ Ensure windows align with business requirements

11. Change & Rollback Playbook
Canary Strategy
Scenario: Updating HPA target values or behavior
# Backup current HPA
kubectl get hpa nginx-hpa-qps -o yaml > hpa-backup-$(date +%Y%m%d-%H%M%S).yaml
# Dry‑run new config in test env
kubectl apply -f hpa-new-config.yaml --dry-run=server
# Apply new config
kubectl apply -f hpa-new-config.yaml
# Watch for 5 min
kubectl get hpa nginx-hpa-qps --watch
# If problems, rollback
kubectl apply -f hpa-backup-*.yaml

Health-Check Checklist
# 1. All HPA show concrete metrics
kubectl get hpa -A | grep -v "<unknown>"
# 2. Metrics Server works
kubectl top nodes
# 3. Custom Metrics API reachable
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources | length'
# 4. Pods have resource requests
kubectl get pods -A -o json | jq -r '.items[] | select(.spec.containers[].resources.requests == null) | .metadata.name'
# 5. No HPA errors in events
kubectl get events -A --field-selector involvedObject.kind=HorizontalPodAutoscaler | grep -i error

Rollback Conditions & Commands
Trigger rollback when:
HPA hits maxReplicas but load still exceeds target
More than 5 scaling actions within 10 min (flapping)
Custom metric unavailable for > 10 min
Rollback steps:
# Freeze the HPA at a fixed size (set min = max to the count you want)
kubectl patch hpa nginx-hpa-qps -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'
# Or delete the HPA entirely and set the replica count manually
kubectl delete hpa nginx-hpa-qps
kubectl scale deployment nginx-app --replicas=5
# Restore original HPA config
kubectl apply -f hpa-backup.yaml
# Verify service health
kubectl get pods -l app=nginx
curl http://nginx-app.default.svc.cluster.local

12. Best Practices
Always configure resources.requests – required for HPA CPU calculations.
Combine multiple metrics (e.g., CPU + custom QPS) for more accurate scaling.
Adjust tolerance (default 10 %) if scaling is too sensitive. Example: --horizontal-pod-autoscaler-tolerance=0.2 for 20 %.
Set sensible minReplicas and maxReplicas – min should cover baseline load + 1, max should respect node capacity (≈ 80 % of total pod capacity).
Fast up, slow down – configure behavior.scaleUp.stabilizationWindowSeconds low (0‑30 s) and scaleDown high (300 s) to avoid thrashing.
Monitor HPA events regularly:
kubectl get events --sort-by=.lastTimestamp | grep HorizontalPodAutoscaler

Integrate with Cluster Autoscaler for node-level elasticity; set --scale-down-delay-after-add=10m to avoid immediate node removal after scaling.
Use VPA alongside HPA to recommend proper resources.requests; keep VPA in recommendation mode (updateMode "Off") when the HPA also scales on CPU, since both acting on the same resource conflict.
Account for metric latency – Prometheus scrape interval (15‑30 s) + HPA sync period (15 s) → total ~30‑45 s. Set stabilizationWindowSeconds larger than this.
Quarterly load‑testing to verify scaling speed (min → max) and stability (no flapping).
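The min/max sizing bullet above can be made concrete with a back-of-the-envelope calculation (all numbers are illustrative assumptions; the per-pod request matches the example app's combined requests):

```shell
# Rough maxReplicas ceiling from cluster CPU capacity (illustrative values).
nodes=3
cpu_per_node_m=8000      # 8 cores per node, in millicores
pod_request_m=150        # example app: nginx (100m) + exporter (50m)
headroom_pct=80          # leave ~20% for system pods and bursts

usable_m=$(( nodes * cpu_per_node_m * headroom_pct / 100 ))
max_replicas=$(( usable_m / pod_request_m ))
# 3 * 8000 * 0.8 / 150 = 128 pods worth of CPU requests
echo "$max_replicas"
```

A maxReplicas well below this ceiling keeps room for other workloads; memory requests should be checked the same way.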
13. FAQ
Q1: Difference between HPA, VPA, and Cluster Autoscaler?
HPA – horizontal scaling (adjusts pod replica count) based on metrics.
VPA – vertical scaling (adjusts pod CPU/memory requests), requires pod restart.
Cluster Autoscaler – node‑level scaling (adds/removes nodes) and works together with HPA.
Q2: Why does HPA show <unknown>? Common reasons:
Metrics Server not deployed or unhealthy.
Pods lack resources.requests.
Custom metric missing or Adapter mis‑configured.
Q3: How to speed up HPA scaling?
Reduce stabilizationWindowSeconds for scale‑up (0‑30 s).
Increase scaleUp.policies.value (more pods per step).
Pre‑pull container images on nodes.
Use kubectl set image for rolling updates instead of full recreation.
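One common way to implement the image pre-pull suggestion is a small DaemonSet that keeps the workload image cached on every node; a minimal sketch (the name and resource sizing are illustrative assumptions):

```yaml
# Hypothetical pre-pull DaemonSet: runs a tiny sleeper from the same
# image on every node so new replicas skip the image pull.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-nginx
  namespace: default
spec:
  selector:
    matchLabels:
      app: prepull-nginx
  template:
    metadata:
      labels:
        app: prepull-nginx
    spec:
      containers:
      - name: sleeper
        image: nginx:1.24-alpine   # same image as the workload
        command: ["sleep", "86400"]   # restarts daily; keeps the image cached
        resources:
          requests:
            cpu: 10m
            memory: 16Mi
```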
Q4: Can HPA use external metrics (e.g., SQS queue length)? Yes – define type: External metric and use an External Metrics Adapter such as KEDA.
metrics:
- type: External
  external:
    metric:
      name: sqs_queue_length
      selector:
        matchLabels:
          queue: my-queue
    target:
      type: AverageValue
      averageValue: "30"

Q5: Can HPA coexist with a fixed replica count? No. HPA overwrites spec.replicas. Delete the HPA to pause autoscaling and set replicas manually.
Q6: How to avoid frequent HPA flapping?
Increase stabilizationWindowSeconds for scale‑down (5‑10 min).
Raise tolerance via --horizontal-pod-autoscaler-tolerance.
Combine multiple metrics to smooth spikes.
Q7: Can HPA be disabled during specific time windows? Not natively. Use a CronJob to patch minReplicas and maxReplicas at desired times.
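Such scheduled patches can run from inside the cluster; a minimal CronJob sketch (the `hpa-patcher` ServiceAccount and its RBAC binding to patch HorizontalPodAutoscalers are assumed to exist):

```yaml
# Hypothetical CronJob: lower minReplicas every night at 22:00.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hpa-night-window
  namespace: default
spec:
  schedule: "0 22 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-patcher   # assumed to have patch rights on HPAs
          restartPolicy: OnFailure
          containers:
          - name: patch
            image: bitnami/kubectl:1.28
            args:
            - patch
            - hpa
            - nginx-hpa-qps
            - -p
            - '{"spec":{"minReplicas":2}}'
```

A mirror CronJob scheduled for the morning would raise minReplicas again; the patch payloads below show both directions.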
# Daytime (high load)
kubectl patch hpa nginx-hpa-qps -p '{"spec":{"minReplicas":10}}'
# Nighttime (low load)
kubectl patch hpa nginx-hpa-qps -p '{"spec":{"minReplicas":2}}'

Q8: Does HPA support GPU metrics? Yes – expose GPU usage via exporters (e.g., NVIDIA DCGM) and map it as a custom metric through the Prometheus Adapter.
Q9: How to monitor HPA scaling history? Prometheus query:
changes(kube_horizontalpodautoscaler_status_current_replicas[24h])

Or export Kubernetes events to a log system for longer retention.
Q10: Can HPA span multiple namespaces? No. An HPA controls a single Deployment/StatefulSet within its own namespace.
14. Appendix: One‑Click Deployment Script & Full Config
14.1 One‑Click Deployment Script
#!/bin/bash
# File: deploy-hpa-stack.sh
# Purpose: Automatically deploy Metrics Server, Prometheus, Adapter, example app, and HPA
set -e
echo "[1/6] Checking Kubernetes cluster..."
kubectl cluster-info || { echo "Error: Cannot connect to Kubernetes cluster"; exit 1; }
echo "[2/6] Deploying Metrics Server..."
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=120s
echo "[3/6] Deploying Prometheus Operator..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.retention=7d \
--wait --timeout=5m
echo "[4/6] Deploying Prometheus Adapter..."
# Quote the heredoc delimiter so the shell does not expand ${1}
cat > /tmp/prometheus-adapter-values.yaml <<'EOF'
prometheus:
  url: http://prometheus-kube-prometheus-prometheus.monitoring.svc
  port: 9090
rules:
  default: true
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
EOF
helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring -f /tmp/prometheus-adapter-values.yaml \
--wait --timeout=3m
echo "[5/6] Deploying example application..."
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  default.conf: |
    server {
      listen 80;
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
      location /stub_status {
        stub_status on;
        access_log off;
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: nginx
        image: nginx:1.24-alpine
        ports:
        - containerPort: 80
        volumeMounts:
        # Mount the ConfigMap so /stub_status exists for the exporter
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: default.conf
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:0.11
        args:
        - '-nginx.scrape-uri=http://localhost:80/stub_status'
        ports:
        - containerPort: 9113
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-app
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    name: http
  - port: 9113
    targetPort: 9113
    name: metrics
EOF
echo "[6/6] Creating HPA..."
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
echo "
==== Deployment Complete ===="
echo "Validate with:"
echo " kubectl get hpa"
echo " kubectl top pods"
echo " kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1"

Usage:
chmod +x deploy-hpa-stack.sh
./deploy-hpa-stack.sh

15. Further Reading
Official documentation:
Kubernetes HPA docs: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
Metrics Server GitHub: https://github.com/kubernetes-sigs/metrics-server
Prometheus Adapter: https://github.com/kubernetes-sigs/prometheus-adapter
In‑depth blogs:
HPA algorithm details: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
Custom Metrics best practices: https://github.com/kubernetes-sigs/custom-metrics-apiserver
Community resources:
KEDA (Event‑Driven Autoscaling): https://keda.sh/
Kubernetes Autoscaling SIG: https://github.com/kubernetes/autoscaler