Current State of Kubernetes DRA and the New Architecture with ResourceClaimParameters and ResourceSlice
The article examines the scheduling performance and tight coupling issues of Kubernetes DRA before version 1.30, explains the original workflow involving PodSchedulingContext and DRA driver, and then details the latest design that introduces ResourceClaimParameters and ResourceSlice to let the scheduler handle complex device constraints internally.
Before Kubernetes 1.30, the Dynamic Resource Allocation (DRA) feature suffered from scheduling performance problems and a scheduling flow tightly coupled to external components, which kept it in alpha status.
The previous component call flow can be summarized as:

1. A user creates a Pod that declares a ResourceClaim. The scheduler detects the newly created Pod referencing the claim.
2. The scheduler evaluates the Pod's requests for CPU, memory, etc., produces a list of candidate nodes, and stores this information in a PodSchedulingContext object.
3. A DRA driver watches for changes to the PodSchedulingContext, filters the candidate node list based on details in the ResourceClaim, and updates the PodSchedulingContext with the refined list.
4. The scheduler reads the updated PodSchedulingContext and makes the final node selection.
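The back-and-forth above is mediated entirely by the PodSchedulingContext object. A sketch of what one looks like is shown below; the field names follow the `resource.k8s.io/v1alpha2` API, but the node names and values here are illustrative only:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  name: pod2            # shares the Pod's name and namespace
  namespace: gpu-test1
spec:
  potentialNodes:       # candidate list written by the scheduler
  - node-a
  - node-b
status:
  resourceClaims:
  - name: gpu
    unsuitableNodes:    # nodes the DRA driver has ruled out
    - node-b
```

The scheduler and the DRA driver each watch this object and write to their own half, which is exactly the coupling and extra round trips the new design removes.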
For example, a Pod requests a GPU through a ResourceClaim like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test1
  name: pod2
  labels:
    app: pod
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["nvidia-smi -L; sleep 9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu.nvidia.com
```

The original DRA design aimed to address the limitations of the traditional Device Plugin, which could not express complex constraints. Because different hardware vendors have distinct resource management logic, DRA allowed a ResourceClaim to reference vendor-defined parameter CRDs. However, this approach left the scheduler without the raw data it needed to make allocation decisions itself.
In the latest DRA design, the scheduler itself handles device allocation alongside resources such as CPU and memory, eliminating the round trip through external services. Two built-in objects are added:

- ResourceClaimParameters: stores the conditions a user specifies when requesting a device. The DRA driver translates vendor-specific parameters into a ResourceClaimParameters object that the scheduler can evaluate directly.
- ResourceSlice: the DRA driver's kubelet plugin reports the devices available on a node by creating a ResourceSlice object, giving the scheduler an inventory to match claims against.
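The shapes of these two objects can be sketched as follows. The field names are based on the `resource.k8s.io/v1alpha2` API introduced for structured parameters; the driver name, selector, and attribute values are illustrative assumptions, not taken from the article:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimParameters
metadata:
  name: claim-params
  namespace: gpu-test1
generatedFrom:              # the vendor object this was translated from
  apiGroup: gpu.nvidia.com
  kind: GpuClaimParameters
  name: gpu-params
driverRequests:
- driverName: gpu.nvidia.com
  requests:
  - namedResources:
      selector: "true"      # CEL expression the scheduler evaluates per device
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceSlice
metadata:
  name: node-a-gpu.nvidia.com
nodeName: node-a            # node whose devices this slice describes
driverName: gpu.nvidia.com
namedResources:
  instances:                # one entry per discrete device on the node
  - name: gpu-0
    attributes:
    - name: memory
      quantity: 80Gi
```

Because both the request (ResourceClaimParameters) and the inventory (ResourceSlice) now live in the API server in a vendor-neutral form, the scheduler can match them without consulting the driver during scheduling.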
For example, NVIDIA defines a GpuClaimParameters object containing custom GPU parameters. The NVIDIA DRA driver processes this object and produces a corresponding ResourceClaimParameters object that the scheduler consumes.
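A minimal sketch of this vendor side, assuming the `gpu.nvidia.com/v1alpha1` API group used by NVIDIA's example DRA driver (the object names and the `count` field here are illustrative):

```yaml
apiVersion: gpu.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  name: gpu-params
  namespace: gpu-test1
spec:
  count: 1                  # vendor-specific request: number of GPUs
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu
  namespace: gpu-test1
spec:
  resourceClassName: gpu.nvidia.com
  parametersRef:            # points at the vendor object above
    apiGroup: gpu.nvidia.com
    kind: GpuClaimParameters
    name: gpu-params
```

The driver watches GpuClaimParameters objects and writes out the translated ResourceClaimParameters, so the vendor-specific CRD never has to be understood by the scheduler itself.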
Understanding this background and the new architecture prepares readers for source‑code exploration and issue resolution.
