
Mastering Kubernetes API Server Flow Control: APF Explained

This article explains how Kubernetes' API Priority and Fairness (APF) mechanism enhances kube‑apiserver traffic control by introducing FlowSchema and PriorityLevelConfiguration objects, allowing fine‑grained request prioritization, concurrency limits, and queue management beyond the basic inflight throttling flags.

System Architect Go

In a Kubernetes cluster, the kube-apiserver is a critical component: it serves external HTTPS requests and is the hub through which other cluster components such as the controller-manager, scheduler and kubelet communicate.

To protect the apiserver's stability, two basic throttling flags are provided:

--max-requests-inflight: maximum number of concurrent read-only requests (default 400).

--max-mutating-requests-inflight: maximum number of concurrent mutating requests (default 200).
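As a rough illustration of where these flags live, the excerpt below assumes a kubeadm-style cluster in which the kube-apiserver runs as a static Pod. The image tag is only an example, and all other flags a real manifest needs are omitted, so treat this as a sketch rather than a complete manifest.

```yaml
# Excerpt of a kube-apiserver static Pod manifest (assumed kubeadm-style layout).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.29.0  # example version; use your cluster's version
    command:
    - kube-apiserver
    - --max-requests-inflight=400            # concurrent read-only requests (default value)
    - --max-mutating-requests-inflight=200   # concurrent mutating requests (default value)
    # ...all other required flags omitted from this excerpt
```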

These limits prevent overall overload but do not distinguish between requests by priority, so a single runaway client can saturate the apiserver and crowd out critical control-plane traffic, as seen in the OpenAI outage on 2024-12-11.

The API Priority and Fairness (APF) mechanism introduces fine‑grained request classification and queuing to avoid such problems.

APF Objects

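APF adds two new API resource kinds, both in the flowcontrol.apiserver.k8s.io API group (they are built-in resources, not CustomResourceDefinitions):

FlowSchema: classifies incoming requests and links them to a PriorityLevelConfiguration.

PriorityLevelConfiguration: defines the actual priority, concurrency limits and queue behavior.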

Requests are first matched to a FlowSchema, then routed to the associated priority level’s queue for processing.

APF processing flow diagram

FlowSchema .spec fields

distinguisherMethod: optional method (ByUser or ByNamespace) to further split requests.

matchingPrecedence: numeric order for matching; lower values are evaluated first. Each FlowSchema should have a unique precedence.

priorityLevelConfiguration: name of the linked priority level (one per FlowSchema).

rules: list of matching rules; if a request matches any rule, it is assigned to the FlowSchema's priority level.
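To make these fields concrete, here is a minimal FlowSchema sketch. All names in it (example-batch-list, example-low-priority, the batch-runner service account and the batch namespace) are hypothetical, and the manifest assumes a PriorityLevelConfiguration named example-low-priority already exists. It uses the flowcontrol.apiserver.k8s.io/v1 API, which is available from Kubernetes v1.29.

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: example-batch-list            # hypothetical name
spec:
  matchingPrecedence: 1000            # lower values are matched first
  priorityLevelConfiguration:
    name: example-low-priority        # hypothetical; must refer to an existing priority level
  distinguisherMethod:
    type: ByUser                      # split matched requests into flows per user
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: batch-runner            # hypothetical service account
        namespace: batch              # hypothetical namespace
    resourceRules:
    - verbs: ["list", "watch"]
      apiGroups: [""]                 # core API group
      resources: ["pods"]
      namespaces: ["*"]               # all namespaces
```

With a schema like this, list and watch calls on pods from that one service account would be classified separately from other traffic, so a misbehaving batch job competes only within its own priority level.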

PriorityLevelConfiguration .spec fields

type: Exempt (bypasses queuing) or Limited (subject to queuing).

exempt: additional settings when type is Exempt.

limited: configuration for limited priority levels, including:

limited.nominalConcurrencyShares: this level's nominal share of the total concurrency, relative to the shares of the other levels.

limited.lendablePercent: percentage of this level's nominal concurrency that other levels may borrow.

limited.borrowingLimitPercent: upper bound on how much this level may borrow from other levels, as a percentage of its own nominal concurrency.

limited.limitResponse: behavior when the limit is reached; Reject returns 429 immediately, Queue enqueues the request according to further queue settings.
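The following PriorityLevelConfiguration is a minimal sketch of a Limited level that queues requests instead of rejecting them. The name and every numeric value are illustrative assumptions, not recommendations; suitable values depend on the cluster.

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: example-low-priority      # hypothetical name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 10  # share relative to the other priority levels
    lendablePercent: 50           # up to half of this level's concurrency may be lent out
    borrowingLimitPercent: 20     # may borrow at most 20% of its nominal concurrency
    limitResponse:
      type: Queue                 # use Reject to return 429 immediately instead
      queuing:
        queues: 16                # number of queues for this level
        handSize: 4               # queues considered per flow; must not exceed queues
        queueLengthLimit: 50      # maximum waiting requests per queue
```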

APF is controlled by the kube-apiserver flag --enable-priority-and-fairness, which is already true by default in newer versions. When APF is active, the two original inflight flags are summed to define the overall concurrency budget, and each priority level's share of that budget determines its capacity.
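A short worked example may help here. The formulas follow the definitions in the Kubernetes API reference for nominalConcurrencyShares (NCS), lendablePercent and borrowingLimitPercent. The numbers are assumptions for illustration only: the default flag values, and three hypothetical priority levels with shares 30, 20 and 10 that are the only levels with non-zero shares.

```latex
\begin{align*}
\text{ServerCL} &= 400 + 200 = 600 \\
\text{NominalCL}(i) &= \left\lceil \text{ServerCL} \cdot
    \frac{\text{NCS}(i)}{\sum_k \text{NCS}(k)} \right\rceil \\
\text{NominalCL}(\text{NCS}=30) &= \lceil 600 \cdot 30/60 \rceil = 300 \\
\text{NominalCL}(\text{NCS}=20) &= \lceil 600 \cdot 20/60 \rceil = 200 \\
\text{NominalCL}(\text{NCS}=10) &= \lceil 600 \cdot 10/60 \rceil = 100 \\[4pt]
\text{LendableCL}(i) &= \operatorname{round}\!\left(\text{NominalCL}(i) \cdot
    \tfrac{\text{lendablePercent}(i)}{100}\right) \\
\text{BorrowingCL}(i) &= \operatorname{round}\!\left(\text{NominalCL}(i) \cdot
    \tfrac{\text{borrowingLimitPercent}(i)}{100}\right)
\end{align*}
```

If the level with 100 nominal seats sets lendablePercent to 50 and borrowingLimitPercent to 20, it can lend up to 50 seats and borrow up to 20, so its effective concurrency ranges from 50 to 120 depending on load elsewhere.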

FlowSchema and PriorityLevelConfiguration spec example

Built‑in FlowSchemas and PriorityLevelConfigurations

Kubernetes ships with a set of predefined FlowSchemas and PriorityLevelConfigurations that cover common internal traffic (node monitoring, kubelet, controller leader election, etc.) as well as catch‑all and global‑default objects to ensure every request is classified.

Built‑in FlowSchema and PriorityLevelConfiguration list

Users can also create custom FlowSchemas and PriorityLevelConfigurations to tailor request handling to their workloads.

Summary

APF became a stable feature in Kubernetes v1.29, providing more granular traffic control for the kube‑apiserver.

References:

https://kubernetes.io/docs/concepts/cluster-administration/flow-control/

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness
