Current State of Kubernetes DRA and the New Architecture with ResourceClaimParameters and ResourceSlice
The article examines the scheduling performance and tight coupling issues of Kubernetes DRA before version 1.30, explains the original workflow involving PodSchedulingContext and DRA driver, and then details the latest design that introduces ResourceClaimParameters and ResourceSlice to let the scheduler handle complex device constraints internally.
Before Kubernetes 1.30, the Dynamic Resource Allocation (DRA) feature suffered from scheduling performance problems and a scheduling flow tightly coupled to external components, which kept it in alpha status.
The previous component call flow can be summarized as:

1. A user creates a Pod that declares a ResourceClaim. The scheduler detects the newly created Pod referencing the claim.
2. The scheduler evaluates the Pod's requests for CPU, memory, etc., produces a list of candidate nodes, and stores this information in a PodSchedulingContext object.
3. A DRA driver watches for changes to the PodSchedulingContext, filters the candidate node list based on details in the ResourceClaim, and updates the PodSchedulingContext with the refined list.
4. The scheduler reads the updated PodSchedulingContext and makes the final node selection.
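The back-and-forth above is mediated entirely by the PodSchedulingContext object. A sketch of what one looks like is shown below; the field names follow the `resource.k8s.io/v1alpha2` API, but the node names and values here are illustrative only:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  name: pod2            # shares the Pod's name and namespace
  namespace: gpu-test1
spec:
  potentialNodes:       # candidate list written by the scheduler
  - node-a
  - node-b
status:
  resourceClaims:
  - name: gpu
    unsuitableNodes:    # nodes the DRA driver has ruled out
    - node-b
```

The scheduler and the DRA driver each watch this object and write to their own half, which is exactly the coupling and extra round trips the new design removes.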
For example, a Pod requests a GPU through a ResourceClaim like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test1
  name: pod2
  labels:
    app: pod
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["nvidia-smi -L; sleep 9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu.nvidia.com
```

The original DRA design aimed to address the limitations of the traditional Device Plugin, which could not express complex constraints. Because different hardware vendors have distinct resource management logic, DRA allowed a ResourceClaim to reference vendor-defined parameter CRDs. However, this approach left the scheduler without the raw data it needed to make allocation decisions itself.
In the latest DRA design, the scheduler itself handles device allocation alongside resources such as CPU and memory, eliminating the round trip through external services. Two built-in objects are added:

- ResourceClaimParameters: stores the conditions a user specifies when requesting a device. The DRA driver translates vendor-specific parameters into a ResourceClaimParameters object that the scheduler can evaluate directly.
- ResourceSlice: the DRA driver's kubelet plugin reports the devices available on a node by creating a ResourceSlice object, giving the scheduler an inventory to match claims against.
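The shapes of these two objects can be sketched as follows. The field names are based on the `resource.k8s.io/v1alpha2` API introduced for structured parameters; the driver name, selector, and attribute values are illustrative assumptions, not taken from the article:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimParameters
metadata:
  name: claim-params
  namespace: gpu-test1
generatedFrom:              # the vendor object this was translated from
  apiGroup: gpu.nvidia.com
  kind: GpuClaimParameters
  name: gpu-params
driverRequests:
- driverName: gpu.nvidia.com
  requests:
  - namedResources:
      selector: "true"      # CEL expression the scheduler evaluates per device
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceSlice
metadata:
  name: node-a-gpu.nvidia.com
nodeName: node-a            # node whose devices this slice describes
driverName: gpu.nvidia.com
namedResources:
  instances:                # one entry per discrete device on the node
  - name: gpu-0
    attributes:
    - name: memory
      quantity: 80Gi
```

Because both the request (ResourceClaimParameters) and the inventory (ResourceSlice) now live in the API server in a vendor-neutral form, the scheduler can match them without consulting the driver during scheduling.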
For example, NVIDIA defines a GpuClaimParameters object containing custom GPU parameters. The NVIDIA DRA driver processes this object and produces a corresponding ResourceClaimParameters object that the scheduler consumes.
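A minimal sketch of this vendor side, assuming the `gpu.nvidia.com/v1alpha1` API group used by NVIDIA's example DRA driver (the object names and the `count` field here are illustrative):

```yaml
apiVersion: gpu.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  name: gpu-params
  namespace: gpu-test1
spec:
  count: 1                  # vendor-specific request: number of GPUs
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu
  namespace: gpu-test1
spec:
  resourceClassName: gpu.nvidia.com
  parametersRef:            # points at the vendor object above
    apiGroup: gpu.nvidia.com
    kind: GpuClaimParameters
    name: gpu-params
```

The driver watches GpuClaimParameters objects and writes out the translated ResourceClaimParameters, so the vendor-specific CRD never has to be understood by the scheduler itself.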
Understanding this background and the new architecture prepares readers for source‑code exploration and issue resolution.
