
How Knative Enables Traffic‑Based Autoscaling and Gray Deployments

This article explains Knative’s traffic‑driven autoscaling and gray‑release capabilities: it details the request flow architecture and the roles of Service, Configuration, Route, and Revision, then walks through the built‑in scaling strategies (KPA, HPA, scheduled scaling fused with HPA, the event gateway, and custom plugins) with practical examples.

Alibaba Cloud Native

Traffic Request Mechanism

In Knative, a Service defines the desired version set and the traffic split. The Service creates a Configuration and a Route. The Configuration produces a Revision for each change; each Revision owns a Deployment that runs the Pods. The Route contains a traffic block that maps Revision identifiers (or tags) to a percentage of incoming requests. An Ingress controller translates the Route rules into cloud‑provider load‑balancer rules (e.g., Alibaba Cloud SLB), which forward traffic to the appropriate Pods.
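The relationship between these resources can be illustrated with a minimal Knative Service manifest; the service name, registry, and image tag below are placeholders:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                 # hypothetical service name
spec:
  template:                   # managed as the Configuration;
    spec:                     # each change here yields a new Revision
      containers:
        - image: registry.example.com/hello:v1
  traffic:                    # managed as the Route
    - latestRevision: true
      percent: 100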

Service Lifecycle

A Knative Service is a developer‑facing resource composed of two sub‑resources:

Configuration: declares the container image, environment variables, resource requests, etc.

Route: defines how traffic is distributed among the Revisions produced by the Configuration.

Configuration

The Configuration represents the desired state of a container. Every update (e.g., new image tag, env change) creates a new immutable Revision snapshot, enabling version control, rollback, and gray‑release workflows.

Route

The Route controls traffic routing. Its traffic block lists one or more entries, each specifying a Revision name (or tag) and a traffic percentage. The percentages must sum to 100.
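In a Service manifest the traffic block looks like the following sketch (the revision names are placeholders generated by a hypothetical `hello` Service); note the two percentages add up to 100:

```yaml
traffic:
  - revisionName: hello-00001   # current stable Revision
    percent: 90
  - revisionName: hello-00002   # newly created Revision
    percent: 10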

Revision

A Revision is an immutable snapshot of a Configuration. It contains the exact container image reference and the configuration that was in effect at the time of creation. Revisions can be addressed directly via a generated URL when a tag is assigned.

Traffic‑Based Gray Release

1. Create a new Revision by updating the Service’s Configuration (e.g., change the image tag to produce v2).

2. Modify the Route’s traffic block to split traffic between the existing Revision (v1) and the new Revision (v2), for example 70% to v1 and 30% to v2.

3. Monitor the new version. If it is stable, increase its traffic share gradually until it reaches 100%.

4. If a problem is detected, adjust the percentages to route all traffic back to the previous Revision, achieving an instant rollback.

Optionally, assign a tag to a Revision in the Route. Knative then creates a stable URL that points directly to that Revision, which is useful for debugging or testing a specific version.
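The steps above can be sketched as a traffic block for the 70/30 stage; the revision names are placeholders, and the tagged entry carries 0% of normal traffic while still getting its own addressable URL:

```yaml
traffic:
  - revisionName: hello-00001   # v1: existing stable Revision
    percent: 70
  - revisionName: hello-00002   # v2: candidate Revision
    percent: 30
  - revisionName: hello-00002
    tag: candidate              # exposes a dedicated URL for direct testing
    percent: 0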

Automatic Scaling Strategies

KPA – Knative Pod Autoscaler

HPA – Horizontal Pod Autoscaler (native Kubernetes)

Scheduled + HPA Fusion

Event Gateway – request‑count‑driven scaling with one‑to‑one task dispatch

Custom scaling plugins – user‑defined metrics and replica adjustments

KPA (Knative Pod Autoscaler)

When a Service receives no traffic, its Revision is scaled to zero and incoming requests are routed to the Activator. The first request hits the Activator, which triggers the Autoscaler. The Autoscaler reads concurrency metrics collected by each Pod’s queue‑proxy container, aggregates them, and decides the target replica count. Once Pods are ready, traffic bypasses the Activator and goes directly to the Pods.
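The core sizing rule can be sketched in Go: total observed concurrency (summed across queue‑proxy reports) divided by the per‑Pod concurrency target, rounded up. This is a simplification; the real KPA also applies stable/panic windows and min/max bounds, which are omitted here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas sketches the KPA sizing decision: how many Pods are
// needed so that each one stays at or below its concurrency target.
func desiredReplicas(totalConcurrency, targetPerPod float64) int {
	if totalConcurrency <= 0 {
		return 0 // no traffic: scale to zero; requests go to the Activator
	}
	return int(math.Ceil(totalConcurrency / targetPerPod))
}

func main() {
	fmt.Println(desiredReplicas(0, 10))   // idle service scales to zero
	fmt.Println(desiredReplicas(25, 10))  // 25 in-flight requests, target 10 per Pod
	fmt.Println(desiredReplicas(100, 10)) // exact multiple of the target
}
```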

HPA (Horizontal Pod Autoscaler)

Knative wraps the native Kubernetes HPA. By specifying metric targets (CPU, memory, etc.) in the Revision’s autoscaling section, the underlying HPA automatically adjusts the Deployment replica count.
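A sketch of such a Revision template, assuming the standard Knative autoscaling annotations for selecting the HPA class and a CPU target:

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"  # use HPA instead of the default KPA
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "70"   # target 70% CPU utilization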

Scheduled + HPA Fusion

A scheduled scaling rule sets a baseline replica count at predefined times (pre‑warming). The HPA can increase the count further based on real‑time CPU or memory usage. The effective replica count is the maximum of the scheduled baseline and the HPA‑computed value, ensuring capacity for both predictable load spikes and unexpected bursts.
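The fusion rule described above reduces to a maximum: the scheduled baseline pre‑warms capacity, and the HPA can only raise the count beyond it, never below. A minimal sketch:

```go
package main

import "fmt"

// effectiveReplicas combines a scheduled baseline with the HPA's
// real-time computation by taking whichever is larger.
func effectiveReplicas(scheduledBaseline, hpaComputed int) int {
	if hpaComputed > scheduledBaseline {
		return hpaComputed
	}
	return scheduledBaseline
}

func main() {
	// Pre-warmed to 10 for a predictable peak; HPA sees low CPU and wants 3.
	fmt.Println(effectiveReplicas(10, 3)) // baseline wins: 10
	// An unexpected burst drives the HPA target to 15.
	fmt.Println(effectiveReplicas(10, 15)) // HPA wins: 15
}
```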

Event Gateway

The Event Gateway watches the incoming request rate. It scales the Deployment proportionally to the request count and then performs a one‑to‑one dispatch, guaranteeing that each Pod processes a single request when required (useful for latency‑sensitive or stateful workloads).

Custom Scaling Plugin

A custom plugin must implement two functions:

Collect metrics – e.g., pull concurrency or custom business metrics from each Pod’s queue‑proxy or an external source.

Adjust the replica count – update the spec.replicas field of the underlying Deployment based on the processed metrics.

By providing these hooks, users can define arbitrary scaling behavior (e.g., scaling on queue depth, external API latency, or business KPIs).
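The two hooks can be sketched as a small Go interface. Everything here is illustrative, not a Knative API: the interface, the `queueDepthScaler` type, and the one‑Pod‑per‑50‑items rule are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Scaler captures the two hooks a custom plugin must provide:
// collect a metric, then map it to a replica count.
type Scaler interface {
	Collect() (float64, error)          // e.g. scrape queue-proxy or an external source
	DesiredReplicas(metric float64) int // value to write into the Deployment's spec.replicas
}

// queueDepthScaler is a hypothetical plugin that scales on the depth
// of an external work queue: one Pod per perReplica queued items.
type queueDepthScaler struct {
	depth      float64 // stands in for a real metrics query
	perReplica float64
}

func (s queueDepthScaler) Collect() (float64, error) { return s.depth, nil }

func (s queueDepthScaler) DesiredReplicas(metric float64) int {
	return int(math.Ceil(metric / s.perReplica))
}

func main() {
	s := queueDepthScaler{depth: 120, perReplica: 50}
	m, _ := s.Collect()
	fmt.Println(s.DesiredReplicas(m)) // 120 queued items / 50 per Pod, rounded up
}
```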

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: cloud native, Serverless, autoscaling, Knative, HPA, Gray Deployment, KPA
Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
