
How Knative Autoscaler Powers Serverless Scaling: KPA vs HPA Explained

This article explains the principles behind Knative Autoscaler, compares Knative Pod Autoscaler (KPA) with Kubernetes Horizontal Pod Autoscaler (HPA), and provides step‑by‑step configuration and demo instructions for achieving true serverless scaling on Kubernetes.

Qingyun Technology Community

Background

Major cloud providers now offer Serverless Kubernetes services that simplify cluster management and reduce operational burden. To support Serverless applications effectively, a system needs capabilities such as upgrade, rollback, canary release, traffic management, and elastic scaling.

Knative, built on Kubernetes, provides a Serverless orchestration framework. Its Autoscaler component monitors traffic and scales replicas up or down, including scaling to zero when no traffic is received.

Autoscaler Principle

The Autoscaler adjusts replica count based on monitored metrics such as concurrency, RPS, or CPU, according to configured thresholds.
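The core relationship can be sketched as a toy model: desired replicas is the observed load divided by the per-pod target, clamped to configured bounds. This is illustrative only; the real KPA additionally averages metrics over stable and panic windows, and the function and parameter names below are assumptions, not Knative APIs.

```python
import math

def desired_replicas(observed_load: float, target_per_pod: float,
                     min_scale: int = 0, max_scale: int = 10) -> int:
    """Toy autoscaling model: enough replicas so each pod stays at or
    below `target_per_pod` (e.g. concurrent requests or RPS), clamped
    to [min_scale, max_scale]. Illustrative values, not Knative defaults."""
    if observed_load <= 0:
        # With no traffic and min_scale 0, this models scale-to-zero.
        return min_scale
    desired = math.ceil(observed_load / target_per_pod)
    return max(min_scale, min(desired, max_scale))
```

For example, 100 concurrent requests with a target of 10 per pod yields 10 replicas, and zero traffic with `min_scale=0` yields zero replicas.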

KPA vs HPA

Knative Serving supports both Knative Pod Autoscaler (KPA) and Kubernetes Horizontal Pod Autoscaler (HPA). Their characteristics:

KPA

Part of Knative Serving and enabled by default.

Can scale from 0.

Does not support CPU‑based scaling.

HPA

Not part of Knative Serving core; must be enabled after installing Knative Serving.

Cannot scale from 0.

Supports CPU‑based scaling.

Scaling Configuration

Scaling configuration defines how pods are scaled up or down, including stable windows, scaling policies, and limits. Which autoscaler is used is selected per revision with the autoscaling.knative.dev/class annotation, set to either "kpa.autoscaling.knative.dev" or "hpa.autoscaling.knative.dev"; a cluster-wide default can be set via the pod-autoscaler-class key in the autoscaler ConfigMap.
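As a sketch of the cluster-wide default, the class can be set in the config-autoscaler ConfigMap. The key name below matches recent Knative releases; verify it against your installed version:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # Cluster-wide default autoscaler class; per-revision
  # annotations override this value.
  pod-autoscaler-class: "kpa.autoscaling.knative.dev"
```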

Configuration Example

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go

Metrics can be set per revision using autoscaling.knative.dev/metric. KPA supports concurrency and RPS; HPA supports CPU.
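For example, a revision could be switched to HPA with CPU-based scaling by combining the class, metric, and target annotations. The target of "75" below is an illustrative CPU-utilization percentage, not a recommended value:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Use the Kubernetes HPA instead of the default KPA.
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
        # Target average CPU utilization (percent); illustrative value.
        autoscaling.knative.dev/target: "75"
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go
```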

Demo Steps

Create a demo service using a YAML manifest (shown above).

Apply the manifest with kubectl apply -f followed by the manifest file.

Observe pods scaling to zero when idle and scaling back on traffic.

Check service status with kubectl get ksvc and access it via the provided URL.
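Scale bounds for the demo can be pinned with per-revision annotations; a min-scale of "0" is what permits the scale-to-zero behavior observed above. The max-scale value is illustrative:

```yaml
spec:
  template:
    metadata:
      annotations:
        # "0" allows the revision to scale to zero when idle (KPA only).
        autoscaling.knative.dev/min-scale: "0"
        # Upper bound on replicas; illustrative value.
        autoscaling.knative.dev/max-scale: "5"
```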

Conclusion

This article explained the mechanisms behind the Knative Autoscaler, compared KPA with HPA, and walked through practical configuration and demo steps, showing how KPA enables true serverless scaling, including scale to zero, on Kubernetes.
