Mastering Kubernetes API Server Flow Control: APF Explained
This article explains how Kubernetes' API Priority and Fairness (APF) mechanism enhances kube‑apiserver traffic control by introducing FlowSchema and PriorityLevelConfiguration objects, allowing fine‑grained request prioritization, concurrency limits, and queue management beyond the basic inflight throttling flags.
In a Kubernetes cluster, the kube-apiserver is a critical component that handles external HTTPS requests and interacts with other control‑plane components such as controller-manager, scheduler and kubelet.
To protect the apiserver’s stability, two basic throttling flags are provided:
--max-requests-inflight: maximum number of concurrent non-mutating (read-only) requests (default 400).
--max-mutating-requests-inflight: maximum number of concurrent mutating requests (default 200).
These limits guard against overall overload but treat every request alike. A single runaway client can therefore take every available slot and starve critical control-plane traffic, as seen in the OpenAI outage on 2024-12-11.
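The weakness of the two flags can be illustrated with a toy model. The sketch below is not the kube-apiserver implementation; the class name InflightLimiter, the tiny limits, and the verb set are assumptions made for the demonstration, and long-running requests such as watch are left out.

```python
import threading

# Assumption for this toy model: only short-lived reads are counted here.
READ_ONLY_VERBS = {"get", "list"}


class InflightLimiter:
    """Hypothetical model of the two inflight flags: one slot pool per class."""

    def __init__(self, max_readonly=400, max_mutating=200):
        self._slots = {
            "readonly": threading.BoundedSemaphore(max_readonly),
            "mutating": threading.BoundedSemaphore(max_mutating),
        }

    def try_acquire(self, verb):
        """Return the pool name if a slot was free, else None (think HTTP 429)."""
        pool = "readonly" if verb in READ_ONLY_VERBS else "mutating"
        return pool if self._slots[pool].acquire(blocking=False) else None

    def release(self, pool):
        self._slots[pool].release()


def demo():
    limiter = InflightLimiter(max_readonly=3, max_mutating=2)  # tiny limits for the demo
    # A noisy client grabs every read-only slot and holds on to them.
    noisy = [limiter.try_acquire("list") for _ in range(3)]
    # A control-plane read arrives next; the limiter cannot tell that it matters more.
    critical_read = limiter.try_acquire("get")
    # Mutating requests draw from a separate pool, so this one still gets in.
    critical_write = limiter.try_acquire("update")
    return noisy, critical_read, critical_write


print(demo())
```

The rejected read is the point of the example: the limiter only counts requests and has no notion of who sent them or how important they are.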
The API Priority and Fairness (APF) mechanism introduces fine‑grained request classification and queuing to avoid such problems.
APF Objects
APF adds two built-in API resources in the flowcontrol.apiserver.k8s.io API group (they are not CRD-based custom resources):
FlowSchema: classifies incoming requests and links them to a PriorityLevelConfiguration.
PriorityLevelConfiguration: defines the actual priority level, its concurrency limits and its queue behavior.
Requests are first matched to a FlowSchema, then routed to the associated priority level’s queue for processing.
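The matching step can be pictured with a small model. This is a simplified sketch and not the apiserver's code: ToyFlowSchema, the rule callables, and the schema named telemetry-agent with its low-telemetry priority level are hypothetical. Only the ordering idea (lower matchingPrecedence is tried first, the first match wins, a catch-all comes last) is taken from the APF design.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    user: str
    verb: str
    resource: str


@dataclass
class ToyFlowSchema:
    name: str
    matching_precedence: int
    priority_level: str
    rules: List[Callable[[Request], bool]]

    def matches(self, req: Request) -> bool:
        # A request belongs to the schema if any one rule matches.
        return any(rule(req) for rule in self.rules)


def classify(req: Request, schemas: List[ToyFlowSchema]) -> Tuple[str, str]:
    """Return (flow schema name, priority level name) for the first match."""
    # Lower matchingPrecedence is tried first; the name only breaks ties.
    for fs in sorted(schemas, key=lambda s: (s.matching_precedence, s.name)):
        if fs.matches(req):
            return fs.name, fs.priority_level
    raise LookupError("no FlowSchema matched; a catch-all schema should prevent this")


SCHEMAS = [
    ToyFlowSchema("catch-all", 10000, "catch-all", [lambda r: True]),
    # Hypothetical custom schema for a monitoring agent.
    ToyFlowSchema(
        "telemetry-agent",
        500,
        "low-telemetry",
        [lambda r: r.user == "system:serviceaccount:monitoring:agent"],
    ),
]

agent = Request("system:serviceaccount:monitoring:agent", "list", "pods")
other = Request("alice", "get", "configmaps")
print(classify(agent, SCHEMAS))
print(classify(other, SCHEMAS))
```

The agent's request lands in its dedicated schema, while everything else falls through to the catch-all entry.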
FlowSchema .spec fields
distinguisherMethod: optional method (ByUser or ByNamespace) used to split the matched requests into separate flows.
matchingPrecedence: numeric order for matching; lower values are evaluated first. Each FlowSchema should have a unique precedence.
priorityLevelConfiguration: name of the linked priority level (exactly one per FlowSchema).
rules: list of matching rules; a request that matches any one rule is assigned to the FlowSchema’s priority level.
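To show how these fields fit together, here is a minimal sketch that builds a FlowSchema manifest as a Python dict and prints it as JSON. The field names follow the flowcontrol.apiserver.k8s.io/v1 API; the helper make_flow_schema and every value passed to it (object names, service account, namespace, precedence) are hypothetical.

```python
import json


def make_flow_schema(name, priority_level, precedence, service_account, namespace):
    """Build a FlowSchema manifest as a plain dict (all values are illustrative)."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
        "kind": "FlowSchema",
        "metadata": {"name": name},
        "spec": {
            # Exactly one priority level per FlowSchema.
            "priorityLevelConfiguration": {"name": priority_level},
            # Lower values are evaluated first.
            "matchingPrecedence": precedence,
            # Split the matched traffic into one flow per user.
            "distinguisherMethod": {"type": "ByUser"},
            "rules": [
                {
                    "subjects": [
                        {
                            "kind": "ServiceAccount",
                            "serviceAccount": {
                                "name": service_account,
                                "namespace": namespace,
                            },
                        }
                    ],
                    "resourceRules": [
                        {
                            "verbs": ["list", "watch"],
                            "apiGroups": [""],
                            "resources": ["pods"],
                            "namespaces": ["*"],
                            # Also match requests that name no namespace.
                            "clusterScope": True,
                        }
                    ],
                }
            ],
        },
    }


flow_schema = make_flow_schema(
    name="telemetry-agent",          # hypothetical
    priority_level="low-telemetry",  # hypothetical, must exist in the cluster
    precedence=500,
    service_account="agent",
    namespace="monitoring",
)
print(json.dumps(flow_schema, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, which accepts JSON as well as YAML.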
PriorityLevelConfiguration .spec fields
type: Exempt (bypasses queuing) or Limited (subject to queuing).
exempt: additional settings when type is Exempt.
limited: configuration for limited priority levels, including:
  nominalConcurrencyShares: the level’s relative share of the server’s total concurrency; its nominal limit is proportional to this value.
  lendablePercent: percentage of the level’s nominal concurrency that other levels may borrow.
  borrowingLimitPercent: maximum concurrency the level may borrow from others, as a percentage of its own nominal concurrency.
  limitResponse: behavior when the limit is reached; Reject returns 429 immediately, Queue enqueues the request according to further queue settings.
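A matching PriorityLevelConfiguration can be sketched the same way. Field names follow the flowcontrol.apiserver.k8s.io/v1 API, including the queuing block (queues, handSize, queueLengthLimit) that holds the queue settings; the helper make_priority_level, the object name, and all numbers are hypothetical and would need tuning for a real cluster.

```python
import json


def make_priority_level(name, shares, lendable_percent, borrowing_limit_percent):
    """Build a Limited PriorityLevelConfiguration as a dict (values are illustrative)."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
        "kind": "PriorityLevelConfiguration",
        "metadata": {"name": name},
        "spec": {
            "type": "Limited",
            "limited": {
                "nominalConcurrencyShares": shares,
                "lendablePercent": lendable_percent,
                "borrowingLimitPercent": borrowing_limit_percent,
                "limitResponse": {
                    # Queue instead of rejecting with 429 right away.
                    "type": "Queue",
                    "queuing": {
                        "queues": 16,
                        "handSize": 4,
                        "queueLengthLimit": 50,
                    },
                },
            },
        },
    }


priority_level = make_priority_level(
    name="low-telemetry",  # hypothetical
    shares=20,
    lendable_percent=50,
    borrowing_limit_percent=25,
)
print(json.dumps(priority_level, indent=2))
```

Keeping the shares small and the lendable percentage high is one way to make sure a noisy workload yields capacity to other levels when they need it.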
To enable APF, set --enable-priority-and-fairness=true on the kube-apiserver (it has been enabled by default since v1.20). When APF is active, the two original inflight flags are summed to define the server’s overall concurrency budget, and that budget is divided among the priority levels in proportion to their nominalConcurrencyShares.
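The arithmetic behind this budget can be worked through in a few lines. The sketch below follows the formulas given in the Kubernetes flow control documentation as I understand them (nominal limit is the ceiling of the proportional share, lendable and borrowable seats are rounded percentages of it). The three priority levels and their shares are invented for the example and are not the built-in defaults.

```python
import math


def nominal_limits(server_cl, shares):
    """Split the server concurrency limit in proportion to nominalConcurrencyShares."""
    total_shares = sum(shares.values())
    # Integer ceiling division avoids floating-point surprises.
    return {
        name: (server_cl * s + total_shares - 1) // total_shares
        for name, s in shares.items()
    }


def half_up(x):
    """Round half away from zero for non-negative x (Python's round() is half-to-even)."""
    return math.floor(x + 0.5)


def concurrency_bounds(nominal, lendable_percent, borrowing_limit_percent):
    """Return (min, max) seats for a level once lending and borrowing are considered."""
    lendable = half_up(nominal * lendable_percent / 100)
    borrowable = half_up(nominal * borrowing_limit_percent / 100)
    return nominal - lendable, nominal + borrowable


# Default flag values from the text: 400 read-only + 200 mutating.
SERVER_CL = 400 + 200

# Hypothetical priority levels and shares.
SHARES = {"system": 30, "workload-low": 100, "low-telemetry": 20}

limits = nominal_limits(SERVER_CL, SHARES)
print(limits)
print(concurrency_bounds(limits["low-telemetry"], lendable_percent=50, borrowing_limit_percent=25))
```

With these made-up shares, the hypothetical low-telemetry level gets 80 of the 600 seats, can shrink to 40 when others borrow from it, and can grow to at most 100 by borrowing.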
Built‑in FlowSchemas and PriorityLevelConfigurations
Kubernetes ships with a set of predefined FlowSchemas and PriorityLevelConfigurations that cover common internal traffic (node monitoring, kubelet, controller leader election, etc.) as well as catch‑all and global‑default objects to ensure every request is classified.
Users can also create custom FlowSchemas and PriorityLevelConfigurations to tailor request handling to their workloads.
Summary
APF became a stable feature in Kubernetes v1.29, providing more granular traffic control for the kube‑apiserver.
References:
https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness
System Architect Go