Master Kubernetes Horizontal Pod Autoscaling with Metrics Server and Custom Metrics

This guide walks through setting up Kubernetes Horizontal Pod Autoscaling (HPA) using the Metrics Server for CPU and memory metrics, deploying a sample Go web app, performing load tests, and extending autoscaling with Prometheus‑based custom metrics for fine‑grained scaling control.

Full-Stack DevOps & Kubernetes

Oct 23, 2020

Overview of Autoscaling in Kubernetes

Automatic scaling adjusts workloads based on resource usage. In Kubernetes, scaling operates on two levels: the Cluster Autoscaler expands or shrinks nodes, while the Horizontal Pod Autoscaler (HPA) adjusts the number of pods in a Deployment or ReplicaSet. HPA works independently of the underlying cloud provider.

Evolution of HPA

HPA was introduced in Kubernetes v1.1 and originally scaled pods based on observed CPU utilization. Later versions added memory‑based scaling, and the Custom Metrics API introduced in v1.6 allowed scaling on arbitrary metrics. The API aggregation layer added in v1.7 lets third‑party components register as API extensions, which is how an adapter can expose application‑specific metrics from Prometheus to HPA.

Deploying the Metrics Server

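The Metrics Server aggregates resource usage data from the kubelet / cAdvisor summary API and replaces the older Heapster component.

Start by cloning the k8s-prom-hpa repository, which contains the manifests used in the rest of this guide. The later commands use paths relative to the repository root:

cd $GOPATH
git clone https://github.com/stefanprodan/k8s-prom-hpa
cd k8s-prom-hpa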

Deploy the Metrics Server in the kube-system namespace:

kubectl create -f ./metrics-server

After about a minute, the server begins reporting CPU and memory usage for nodes and pods. Verify node metrics:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

Verify pod metrics:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .
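Because the Metrics Server needs about a minute before it returns data, the verification calls above can fail at first. A minimal sketch of a retry helper is shown here; the function name `retry` and its argument order are illustrative choices, not part of the k8s-prom-hpa repository.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have failed.
# Hypothetical helper: the name and signature are assumptions for illustration.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 12 5 kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"` would poll the node metrics endpoint every 5 seconds for up to a minute.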

CPU/Memory‑Based Autoscaling Example

Deploy a small Go web application (podinfo) in the default namespace:

kubectl create -f ./podinfo/podinfo-svc.yaml,./podinfo/podinfo-dep.yaml

The podinfo service is of type NodePort, so the app is reachable at http://<K8S_PUBLIC_IP>:31198.

Create an HPA that keeps at least two replicas and scales up to ten when average CPU utilization exceeds 80% or average memory usage per pod exceeds 200Mi:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
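The manifest above uses the autoscaling/v2beta1 API, which newer clusters no longer serve. As a sketch, assuming a cluster that serves autoscaling/v2 and apps/v1, the same policy could be written as follows. The file name podinfo-hpa-v2.yaml is hypothetical and does not exist in the repository.

```shell
#!/bin/sh
# Write an autoscaling/v2 equivalent of the HPA shown above.
# Assumption: podinfo-hpa-v2.yaml is an illustrative file name, not a repository file.
cat > podinfo-hpa-v2.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi
EOF
```

In autoscaling/v2 each metric carries a `target` block with an explicit `type`, which replaces the flat `targetAverageUtilization` and `targetAverageValue` fields. The generated file would be applied with `kubectl apply -f podinfo-hpa-v2.yaml` in place of the repository manifest.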

Apply the HPA:

kubectl create -f ./podinfo/podinfo-hpa.yaml

Generate load with hey to trigger scaling:

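# install hey (Go 1.16+; older toolchains used: go get -u github.com/rakyll/hey)
go install github.com/rakyll/hey@latest
# 10,000 requests, 5 concurrent workers, 10 QPS per worker (about 50 requests/s in total)
hey -n 10000 -q 10 -c 5 http://<K8S_PUBLIC_IP>:31198/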
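Since hey applies the `-q` rate limit per worker, the aggregate rate is the product of `-q` and `-c`. The arithmetic can be sketched as below; `hey_plan` is a hypothetical helper, and the run time is only a lower bound that assumes the server keeps up with the requested rate.

```shell
#!/bin/sh
# hey_plan REQUESTS QPS_PER_WORKER WORKERS
# Prints the aggregate request rate and the approximate run time.
# Hypothetical helper for illustration; it does not call hey.
hey_plan() {
  total_rps=$(( $2 * $3 ))
  # Round up so a partial final second still counts.
  seconds=$(( ($1 + total_rps - 1) / total_rps ))
  echo "${total_rps} req/s for about ${seconds}s"
}

hey_plan 10000 10 5   # prints: 50 req/s for about 200s
```

A run of a few minutes gives the HPA several evaluation cycles to observe the load before the requests run out.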

Monitor HPA status and events:

kubectl get hpa
kubectl describe hpa

Deploying Prometheus and the Custom Metrics Adapter

To scale on application‑specific metrics, deploy Prometheus and the k8s-prometheus-adapter in a dedicated monitoring namespace.

# Create the namespace
kubectl create -f ./namespaces.yaml
# Deploy Prometheus v2
kubectl create -f ./prometheus
# Generate TLS certificates for the adapter
make certs
# Deploy the custom‑metrics API adapter
kubectl create -f ./custom-metrics-api

List available custom metrics:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

Example: retrieve filesystem usage for all pods in the monitoring namespace:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/*/fs_usage_bytes" | jq .

Custom‑Metric‑Based Autoscaling

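Reuse the podinfo deployment, but remove the CPU/memory HPA first (for example with kubectl delete -f ./podinfo/podinfo-hpa.yaml), because the custom‑metric HPA has the same name and targets the same Deployment. The podinfo app exposes a counter named http_requests_total. The Prometheus adapter strips the _total suffix and serves the metric as http_requests, expressed as a per‑second rate.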

Verify the metric values:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq .
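Values returned by the custom metrics API are serialized as Kubernetes quantities, so a rate below one request per second typically appears in milli-units, for example 901m. A small sketch of a converter follows; `from_milli` is a hypothetical helper name, and it only handles the `m` suffix.

```shell
#!/bin/sh
# from_milli QUANTITY
# Converts a milli-unit quantity such as 901m to a decimal value.
# Quantities without the "m" suffix are printed unchanged.
# Hypothetical helper for illustration.
from_milli() {
  case "$1" in
    *m)
      n=${1%m}
      printf '%d.%03d\n' $((n / 1000)) $((n % 1000))
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

from_milli 901m     # prints: 0.901
from_milli 12500m   # prints: 12.500
from_milli 3        # prints: 3
```

Read this way, 901m means 0.901 requests per second for that pod.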

Create an HPA that scales when the per‑pod request rate exceeds 10 requests per second:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests
      targetAverageValue: 10

Apply the custom‑metric HPA:

kubectl create -f ./podinfo/podinfo-hpa-custom.yaml

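Generate load at roughly 25 requests per second (5 workers at 5 QPS each):

# install hey if needed (Go 1.16+; older toolchains used: go get -u github.com/rakyll/hey)
go install github.com/rakyll/hey@latest
hey -n 10000 -q 5 -c 5 http://<K8S_PUBLIC_IP>:31198/healthz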

Observe scaling events (e.g., scaling from 2 to 3 replicas) via kubectl describe hpa. After the load subsides, the HPA scales the deployment back to the minimum replica count.
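The step from 2 to 3 replicas follows from the documented HPA rule: desired replicas = ceil(current replicas × current metric value / target value), clamped to the configured minimum and maximum. The sketch below reproduces that arithmetic with integer milli-units. `desired_replicas` is a hypothetical helper, and it deliberately ignores the controller's tolerance band and stabilization window.

```shell
#!/bin/sh
# desired_replicas CURRENT AVG_MILLI TARGET_MILLI MIN MAX
# Computes ceil(CURRENT * AVG / TARGET) and clamps it to [MIN, MAX].
# Hypothetical helper; tolerance and stabilization are not modeled.
desired_replicas() {
  want=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$want" -lt "$4" ]; then want=$4; fi
  if [ "$want" -gt "$5" ]; then want=$5; fi
  echo "$want"
}

# 25 req/s spread over 2 pods is 12.5 req/s per pod against a target of 10.
desired_replicas 2 12500 10000 2 10   # prints: 3
# Once load drops to 1 req/s per pod, the minimum of 2 applies.
desired_replicas 2 1000 10000 2 10    # prints: 2
```

With three pods sharing 25 requests per second, the per-pod average falls to about 8.3, which is below the target, so no further scale-up is expected at that load.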

Conclusion

While CPU and memory metrics suffice for many workloads, web and mobile back‑ends often require request‑rate‑based scaling to handle traffic spikes. Batch jobs may scale on queue length or other custom signals. By exposing appropriate metrics through Prometheus and the custom‑metrics API, you can fine‑tune autoscaling behavior to meet SLA requirements and ensure high availability.
