How Knative Autoscaler Powers Serverless Scaling: KPA vs HPA Explained
This article explains the principles behind Knative Autoscaler, compares Knative Pod Autoscaler (KPA) with Kubernetes Horizontal Pod Autoscaler (HPA), and provides step‑by‑step configuration and demo instructions for achieving true serverless scaling on Kubernetes.
Background
Major cloud providers now offer Serverless Kubernetes services that simplify cluster management and reduce operational burden. To support Serverless applications effectively, a system needs capabilities such as upgrade, rollback, canary release, traffic management, and elastic scaling.
Knative, built on Kubernetes, provides a Serverless orchestration framework. Its Autoscaler component monitors traffic and scales replicas up or down, including scaling to zero when no traffic is received.
Autoscaler Principle
The Autoscaler continuously collects metrics reported by each revision's pods, such as concurrency, RPS, or CPU, and computes the desired replica count by comparing the observed metric against the configured per-pod target. Replicas are then scaled up or down accordingly, including down to zero (for KPA) when no traffic arrives.
KPA vs HPA
Knative Serving supports both Knative Pod Autoscaler (KPA) and Kubernetes Horizontal Pod Autoscaler (HPA). Their characteristics:
KPA
Part of Knative Serving and enabled by default.
Can scale from 0.
Does not support CPU‑based scaling.
HPA
Not part of Knative Serving core; requires installing the optional Knative HPA extension on top of Kubernetes' built-in Horizontal Pod Autoscaler.
Cannot scale from 0.
Supports CPU‑based scaling.
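To make the contrast concrete, here is a minimal sketch of a Knative Service pinned to HPA-based CPU scaling. It assumes the Knative HPA extension is installed; the service name `helloworld-go-hpa` and the 70% CPU target are illustrative values, not from the original article:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-hpa    # illustrative name
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Select the HPA autoscaler class instead of the default KPA
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        # HPA supports CPU-based scaling; target is an illustrative 70%
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "70"
        # HPA cannot scale from zero, so keep at least one replica
        autoscaling.knative.dev/min-scale: "1"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```

Note the `min-scale: "1"` annotation: because HPA cannot scale from zero, pinning a minimum of one replica makes that limitation explicit in the manifest.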
Scaling Configuration
Scaling configuration defines how pods are scaled up or down, including stable windows, scaling policies, and limits. The autoscaler implementation is selected per revision with the autoscaling.knative.dev/class annotation, whose valid values are "kpa.autoscaling.knative.dev" and "hpa.autoscaling.knative.dev"; the cluster-wide default is set with the pod-autoscaler-class key in the config-autoscaler ConfigMap.
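As a hedged sketch, the cluster-wide default class could be set like this (the config-autoscaler ConfigMap lives in the knative-serving namespace in a standard install; the value shown is the KPA default):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # Default autoscaler class for revisions that do not set the
  # autoscaling.knative.dev/class annotation themselves
  pod-autoscaler-class: "kpa.autoscaling.knative.dev"
```

Per-revision annotations always override this cluster-wide default.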
Configuration Example
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```

The metric can be set per revision using the autoscaling.knative.dev/metric annotation. KPA supports concurrency and RPS; HPA supports CPU.
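For example, a revision can be switched from the default concurrency metric to RPS-based KPA scaling by combining the class, metric, and target annotations. This is a sketch; the target of 100 requests per second per pod is an illustrative value:

```yaml
metadata:
  annotations:
    autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
    # Scale on requests-per-second instead of the default concurrency
    autoscaling.knative.dev/metric: "rps"
    # Illustrative soft target: ~100 requests/second per pod
    autoscaling.knative.dev/target: "100"
```

These annotations go under spec.template.metadata of the Service, exactly as in the full example above.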
Demo Steps
1. Create a demo service using a YAML manifest (shown above).
2. Apply the manifest with kubectl apply -f <manifest-file>.
3. Check service status with kubectl get ksvc and access the service via the URL it reports.
4. Observe pods scaling to zero when idle and scaling back up when traffic arrives.
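The steps above can be run as the following command sequence (assuming kubectl access to a cluster with Knative Serving installed; the manifest filename service.yaml and the service URL are illustrative):

```shell
# 1-2. Apply the Service manifest (filename is illustrative)
kubectl apply -f service.yaml

# 3. Check readiness and the external URL of the service
kubectl get ksvc helloworld-go

# 4. Watch pods scale to zero after the stable window with no traffic,
#    then scale back up when a request arrives
kubectl get pods -w

# Send a request via the URL reported by `kubectl get ksvc`
# (example.com is Knative's default domain suffix; yours may differ)
curl http://helloworld-go.default.example.com
```

Running `kubectl get pods -w` in a second terminal while sending the curl request makes the cold-start scale-up from zero easy to observe.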
Conclusion
The article explains Knative Autoscaler mechanisms, compares KPA and HPA, and provides practical configuration and demo steps, demonstrating how KPA enables true serverless scaling on Kubernetes.
Qingyun Technology Community
Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!