Master Kubernetes Horizontal Pod Autoscaling with Metrics Server and Custom Metrics
This guide walks through setting up Kubernetes Horizontal Pod Autoscaling (HPA) using the Metrics Server for CPU and memory metrics, deploying a sample Go web app, performing load tests, and extending autoscaling with Prometheus‑based custom metrics for fine‑grained scaling control.
Overview of Autoscaling in Kubernetes
Automatic scaling adjusts workloads based on resource usage. In Kubernetes, scaling operates on two levels: the Cluster Autoscaler adds or removes nodes, while the Horizontal Pod Autoscaler (HPA) adjusts the number of pods in a Deployment or ReplicaSet. HPA works independently of the underlying cloud provider.
Evolution of HPA
HPA was introduced in Kubernetes v1.1 and originally scaled pods based on observed CPU utilization. Later versions added memory‑based scaling and, with the Custom Metrics API introduced in v1.6, allowed scaling on arbitrary metrics. The aggregation layer added in v1.7 enables third‑party components like Prometheus to expose application‑specific metrics to HPA.
Deploying the Metrics Server
The Metrics Server aggregates resource usage data from the kubelet / cAdvisor summary API and replaces the older Heapster component.
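The metrics.k8s.io endpoints queried with kubectl get --raw return JSON lists. Each item carries a usage map of Kubernetes quantities, for example 250m or 123456789n for CPU and 1048576Ki for memory. The Go sketch below decodes a NodeMetricsList from stdin and prints per-node usage. It is an illustration, not part of the k8s-prom-hpa repository: the file name nodemetrics.go is hypothetical, only the fields it reads are modeled, and parseQuantity handles just the n, u, m, Ki, Mi and Gi suffixes.

```go
// nodemetrics.go: hypothetical helper that summarizes the JSON returned by
//   kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
// Only a subset of Kubernetes quantity suffixes is supported (assumption).
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// nodeMetricsList mirrors only the NodeMetricsList fields used here.
type nodeMetricsList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Usage map[string]string `json:"usage"`
	} `json:"items"`
}

// parseQuantity converts "250m", "123456789n", "2" or "1048576Ki" into a
// base value (cores for CPU, bytes for memory).
func parseQuantity(q string) (float64, error) {
	suffixes := []struct {
		s string
		m float64
	}{
		{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30},
		{"n", 1e-9}, {"u", 1e-6}, {"m", 1e-3},
	}
	for _, sf := range suffixes {
		if strings.HasSuffix(q, sf.s) {
			v, err := strconv.ParseFloat(strings.TrimSuffix(q, sf.s), 64)
			if err != nil {
				return 0, err
			}
			return v * sf.m, nil
		}
	}
	// No supported suffix: plain number, or an error for other suffixes.
	return strconv.ParseFloat(q, 64)
}

func main() {
	var list nodeMetricsList
	if err := json.NewDecoder(os.Stdin).Decode(&list); err != nil {
		fmt.Fprintln(os.Stderr, "decode:", err)
		os.Exit(1)
	}
	for _, item := range list.Items {
		cpu, err1 := parseQuantity(item.Usage["cpu"])
		mem, err2 := parseQuantity(item.Usage["memory"])
		if err1 != nil || err2 != nil {
			fmt.Fprintf(os.Stderr, "%s: cannot parse usage %v\n", item.Metadata.Name, item.Usage)
			continue
		}
		// Print CPU in millicores and memory in MiB.
		fmt.Printf("%s\tcpu=%.0fm\tmemory=%.0fMi\n", item.Metadata.Name, cpu*1000, mem/(1<<20))
	}
}
```

Usage: kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | go run nodemetrics.go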
Clone the repository that contains the manifests used throughout this guide:
cd $GOPATH
git clone https://github.com/stefanprodan/k8s-prom-hpa
cd k8s-prom-hpa
Deploy the Metrics Server in the kube-system namespace:
kubectl create -f ./metrics-server
After a minute the server begins reporting CPU and memory usage for nodes and pods. Verify node metrics:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
Verify pod metrics:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .
CPU/Memory‑Based Autoscaling Example
Deploy a small Go web application (podinfo) in the default namespace:
kubectl create -f ./podinfo/podinfo-svc.yaml,./podinfo/podinfo-dep.yaml
The service exposes podinfo on a NodePort, so the app is reachable at http://<K8S_PUBLIC_IP>:31198.
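The HPA controller turns each metric into a replica count using the formula documented for Kubernetes, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It leaves the count alone while the ratio stays within a tolerance (0.1 by default), takes the largest proposal when several metrics are configured, and clamps the result to minReplicas/maxReplicas. The Go sketch below reproduces that arithmetic for CPU and memory targets like the ones in the HPA that follows. It deliberately ignores pod readiness, missing metrics and stabilization windows, and the readings in main are hypothetical.

```go
// hpacalc.go: simplified sketch of the HPA replica calculation.
// Ignores readiness, missing metrics and scale-down stabilization.
package main

import (
	"fmt"
	"math"
)

// metric pairs an observed per-pod average with its target.
type metric struct {
	name    string
	current float64 // e.g. average CPU utilization in % of requests
	target  float64 // e.g. 80 for targetAverageUtilization: 80
}

// tolerance mirrors the default HPA tolerance of 0.1.
const tolerance = 0.1

// desiredFor applies the formula to one metric and keeps the current
// count when the ratio is within the tolerance band around 1.0.
func desiredFor(current int, m metric) int {
	ratio := m.current / m.target
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	return int(math.Ceil(float64(current) * ratio))
}

// desiredReplicas evaluates every metric, keeps the largest proposal,
// and clamps it to [minReplicas, maxReplicas].
func desiredReplicas(current, minReplicas, maxReplicas int, metrics []metric) int {
	desired := 0
	for _, m := range metrics {
		if d := desiredFor(current, m); d > desired {
			desired = d
		}
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// Hypothetical readings: 2 pods at 160% of requested CPU and 120Mi
	// of memory, against targets of 80% and 200Mi.
	metrics := []metric{
		{name: "cpu", current: 160, target: 80},
		{name: "memory", current: 120, target: 200},
	}
	fmt.Println("desired replicas:", desiredReplicas(2, 2, 10, metrics))
}
```

With these readings the CPU ratio is 2.0 and proposes 4 replicas, while the memory ratio is 0.6 and proposes 2. The sketch therefore reports 4, because the largest proposal wins.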
Create an HPA that keeps at least two replicas and scales up to ten when average CPU utilization exceeds 80% of the requested CPU or average memory usage exceeds 200Mi:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
Apply the HPA:
kubectl create -f ./podinfo/podinfo-hpa.yaml
Generate load with hey to trigger scaling:
# install hey
go install github.com/rakyll/hey@latest
# 10 000 requests, 10 QPS per worker (about 50 QPS total), 5 concurrent workers
hey -n 10000 -q 10 -c 5 http://<K8S_PUBLIC_IP>:31198/
Monitor HPA status and events:
kubectl get hpa
kubectl describe hpa
Deploying Prometheus and the Custom Metrics Adapter
To scale on application‑specific metrics, deploy Prometheus and the k8s-prometheus-adapter in a dedicated monitoring namespace.
# Create the namespace
kubectl create -f ./namespaces.yaml
# Deploy Prometheus v2
kubectl create -f ./prometheus
# Generate TLS certificates for the adapter
make certs
# Deploy the custom‑metrics API adapter
kubectl create -f ./custom-metrics-api
List available custom metrics:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
Example: retrieve filesystem usage for all pods in the monitoring namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/*/fs_usage_bytes" | jq .
Custom‑Metric‑Based Autoscaling
Deploy podinfo again, this time without the CPU/memory HPA. The app exposes a custom counter metric, http_requests_total. The Prometheus adapter strips the _total suffix, treats the metric as a counter, and serves its per-second rate as http_requests.
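Because http_requests_total is a counter, the value the HPA compares with its target is a per-second rate derived from successive Prometheus samples. The assumption here is that this setup's k8s-prometheus-adapter rules compute it with a PromQL rate() query. The Go sketch below shows the core arithmetic on hypothetical scrape samples, including counter-reset handling. Prometheus's own rate() also extrapolates to the edges of the query window, which is omitted here.

```go
// counterrate.go: simplified sketch of turning counter samples into a
// per-second rate. Window-edge extrapolation done by PromQL rate() is omitted.
package main

import "fmt"

// sample is one scrape of a counter: value v at time t (seconds).
type sample struct {
	t float64
	v float64
}

// perSecondRate returns the total increase divided by the elapsed time,
// treating any decrease as a counter reset (the process restarted at 0).
func perSecondRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			// Counter reset: everything counted since the restart is new.
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Hypothetical scrapes 15 s apart for one podinfo pod.
	s := []sample{{0, 1000}, {15, 1150}, {30, 1300}, {45, 1450}}
	fmt.Printf("%.1f req/s\n", perSecondRate(s))
}
```

In this example the counter grows by 150 every 15 seconds, so the sketch prints 10.0 req/s, which equals the per-pod target used in the custom-metric HPA.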
Verify the metric values:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq .
Create an HPA that scales out when the average per‑pod request rate exceeds 10 requests per second:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests
      targetAverageValue: 10
Apply the custom‑metric HPA:
kubectl create -f ./podinfo/podinfo-hpa-custom.yaml
Generate load at about 25 requests per second (5 QPS per worker across 5 workers):
# install hey if needed
go install github.com/rakyll/hey@latest
hey -n 10000 -q 5 -c 5 http://<K8S_PUBLIC_IP>:31198/healthz
Observe scaling events (e.g., scaling from 2 to 3 replicas) with kubectl describe hpa. After the load subsides and the scale-down delay (5 minutes by default) has passed, the HPA scales the deployment back to the minimum replica count.
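The step from 2 to 3 replicas follows from the HPA formula applied to a Pods metric with an average-value target. The Go sketch below assumes that the roughly 25 req/s from hey is split evenly across the two initial pods. That gives 12.5 req/s per pod, written as 12500m because the custom metrics API uses milli notation for fractional values. The even split and the helper names are assumptions for illustration. The result is ceil(2 * 12.5 / 10) = ceil(2.5) = 3.

```go
// podsmetric.go: simplified sketch of how an HPA Pods metric with
// targetAverageValue: 10 turns per-pod http_requests values into a replica
// count. Helper names are hypothetical; readiness and stabilization are ignored.
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// tolerance mirrors the default HPA tolerance of 0.1.
const tolerance = 0.1

// parseMilli parses plain numbers ("3") and milli-quantities ("12500m").
// Other Kubernetes quantity suffixes are out of scope for this sketch.
func parseMilli(q string) (float64, error) {
	if strings.HasSuffix(q, "m") {
		v, err := strconv.ParseFloat(strings.TrimSuffix(q, "m"), 64)
		return v / 1000, err
	}
	return strconv.ParseFloat(q, 64)
}

// desiredForPodsMetric averages the per-pod values (one per current pod),
// applies ceil(current * average / target) unless the ratio is within the
// tolerance, and clamps the result to [minReplicas, maxReplicas].
func desiredForPodsMetric(values []string, target float64, minReplicas, maxReplicas int) (int, error) {
	if len(values) == 0 {
		return 0, errors.New("no metric values")
	}
	sum := 0.0
	for _, v := range values {
		f, err := parseMilli(v)
		if err != nil {
			return 0, err
		}
		sum += f
	}
	current := len(values)
	ratio := sum / float64(current) / target
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired, nil
}

func main() {
	// Hypothetical: ~25 req/s from hey split evenly across the 2 initial pods.
	desired, err := desiredForPodsMetric([]string{"12500m", "12500m"}, 10, 2, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("desired replicas:", desired)
}
```

Once the third pod is running, the same load averages about 8.3 req/s per pod. That ratio of about 0.83 falls outside the tolerance band, but ceil(3 * 0.83) is still 3, so the deployment stays at three replicas until the load drops.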
Conclusion
While CPU and memory metrics suffice for many workloads, web and mobile back‑ends often require request‑rate‑based scaling to handle traffic spikes. Batch jobs may scale on queue length or other custom signals. By exposing appropriate metrics through Prometheus and the custom‑metrics API, you can fine‑tune autoscaling behavior to meet SLA requirements and ensure high availability.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Full-Stack DevOps & Kubernetes
Focused on sharing DevOps, Kubernetes, Linux, Docker, Istio, microservices, Spring Cloud, Python, Go, databases, Nginx, Tomcat, cloud computing, and related technologies.
