
Boosting Autonomous Driving Data Pipelines with Koordinator’s ElasticQuota and GPU Sharing

This article details how a leading autonomous-driving company tackled multi-tenant resource contention, low GPU utilization, and distributed-task deadlocks on a heterogeneous Kubernetes cluster by adopting Koordinator's ElasticQuota, Reservation, Gang scheduling, and Device-Share features, achieving higher allocation rates, better fairness, and significantly improved GPU efficiency.

Alibaba Cloud Infrastructure

Background

In autonomous driving, data‑driven pipelines require a high‑throughput, long‑lived cloud platform that ingests point‑cloud, image and multi‑sensor streams, transforms them into standardized training, evaluation and replay inputs, and supports massive preprocessing, automated labeling, simulation verification and algorithm development.

Infrastructure Challenges

The platform runs on a heterogeneous cluster with many CPU models and several generations of GPU cards, leading to complex resource profiles and scheduling difficulty.

Core Scheduling Pain Points

Unordered multi-tenant resource contention, a "bad money drives out good" dynamic in which low-value workloads crowd out important ones, leading to low quota allocation rates.

Fragmented allocation due to mismatched CPU/GPU resource ratios, resulting in stranded resources.

Low GPU utilization (<15%) in Codespace and small‑model inference workloads.

Distributed-task deadlocks when using Ray, where partially scheduled jobs hold resources while waiting for all member pods to become ready.

Koordinator as the Scheduling Engine

The team selected Koordinator, an extension of the native Kubernetes scheduler, to provide ElasticQuota, Reservation, Gang scheduling, and Device-Share capabilities. In production it supports over 100 ElasticQuota objects, schedules 300,000-800,000 pods per day, and achieves >95% GPU allocation and >55% GPU utilization.

ElasticQuota Design

Quota hierarchy consists of a top‑level Group (no borrowing) and business‑level Quotas that allow borrowing. GPU quotas are split into quota.gpu and quota.cpu sub‑quotas to prevent CPU tasks from starving GPU tasks.
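The hierarchy described above can be sketched as an ElasticQuota object. This is a minimal illustration, assuming Koordinator's label-based quota tree; the quota names and resource figures are invented for the example, not taken from the article:

```yaml
# Hedged sketch: a business-level quota attached to a top-level group.
# "ad-root" and "team-a-gpu" are illustrative names.
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: team-a-gpu
  namespace: team-a
  labels:
    quota.scheduling.koordinator.sh/parent: "ad-root"   # top-level group (no borrowing)
    quota.scheduling.koordinator.sh/is-parent: "false"
spec:
  max:                    # hard ceiling, including borrowable headroom
    cpu: "500"
    memory: 2000Gi
    nvidia.com/gpu: "64"
  min:                    # guaranteed share, reclaimable by siblings when idle
    cpu: "200"
    memory: 800Gi
    nvidia.com/gpu: "32"
```

Splitting a business quota into quota.gpu and quota.cpu children would follow the same pattern, with the GPU child carrying only GPU-typed resources in its min/max.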

Pod Queue Prioritization

A custom queueSort plugin orders pods first by quota priority, then by random order when priorities match, ensuring high‑priority workloads obtain scheduling slots while preventing priority abuse.

Resource Reclamation Improvements

Pod eviction logic now respects PodDisruptionBudget (PDB) constraints, prefers evicting lower-priority pods, and, when priorities tie, selects the pod with the smallest consumed resource cost (weighted GPU > memory > CPU).
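A PDB that the reclamation path would honor is standard Kubernetes configuration. The sketch below is illustrative, assuming a small-model inference deployment (the label is invented) that must keep most replicas alive during quota reclamation:

```yaml
# Hedged sketch: eviction during reclamation may not take this
# workload below 80% of its replicas.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inference-pdb
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: small-model-inference   # illustrative label
```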

GPU Sharing and Isolation

Koordinator’s Device‑Share plugin enables GPU card slicing. Integration with HAMi‑Core (since Koordinator 1.6.0) provides per‑pod compute and memory limits, achieving up to 60% GPU utilization and a 60% reduction in full‑card usage for Codespace workloads.
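A pod can request a slice of a card through Koordinator's extended GPU resources. This is a minimal sketch assuming the koordinator.sh/gpu-core and koordinator.sh/gpu-memory-ratio resource names; the pod name and image are placeholders:

```yaml
# Hedged sketch: a Codespace-style pod taking half of one GPU's
# compute and memory, scheduled by koord-scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: codespace-shared-gpu
spec:
  schedulerName: koord-scheduler
  containers:
  - name: ide
    image: registry.example.com/codespace:latest   # placeholder image
    resources:
      limits:
        koordinator.sh/gpu-core: "50"          # 50% of one card's compute
        koordinator.sh/gpu-memory-ratio: "50"  # 50% of one card's memory
```

With HAMi-Core integration, these limits are enforced in-container rather than being scheduling hints only.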

Reservation for CPU‑GPU Co‑allocation

The Reservation plugin reserves CPU and memory on GPU nodes for upcoming GPU pods, preventing CPU‑only pods from occupying GPU node resources and raising overall node utilization.
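A reservation of this kind can be expressed with Koordinator's Reservation CRD. The sketch below is illustrative, assuming the scheduling.koordinator.sh/v1alpha1 API; the node label and owner label are invented for the example:

```yaml
# Hedged sketch: hold CPU and memory on GPU nodes so that only
# GPU workloads (matched via owners) can consume them.
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: Reservation
metadata:
  name: gpu-node-cpu-hold
spec:
  template:                  # resources to hold, phrased as a pod template
    spec:
      schedulerName: koord-scheduler
      nodeSelector:
        node.kubernetes.io/gpu: "true"    # illustrative node label
      containers:
      - name: placeholder
        resources:
          requests:
            cpu: "32"
            memory: 128Gi
  owners:                    # only matching GPU pods may use the reserved capacity
  - labelSelector:
      matchLabels:
        workload-type: gpu-training       # illustrative label
```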

Ray Job & Gang Scheduling

Short‑lived data‑production tasks are moved to Ray. KubeRay manages the RayJob, RayCluster and RayService CRDs. By annotating pods with gang.scheduling.koordinator.sh/min-available and gang.scheduling.koordinator.sh/name, Koordinator performs gang scheduling, guaranteeing that all required pods start together and avoiding deadlocks. For example:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: ray-dox3h3hm
spec:
  activeDeadlineSeconds: 86400
  backoffLimit: 0
  entrypoint: bash iacp/pipeline/pse/prod_2.sh
  rayClusterSpec:
    headGroupSpec:
      template:
        metadata:
          annotations:
            gang.scheduling.koordinator.sh/min-available: "3"
            gang.scheduling.koordinator.sh/name: ray-dox3h3hm
        spec:
          schedulerName: koord-scheduler
          containers:
          - name: main

Future Roadmap

Plans include extending ElasticQuota for distributed‑task cleanup, adding workload queuing to curb pending‑pod spikes, implementing scheduler sharding for higher throughput, and building multi‑cloud, multi‑cluster coordination to meet exponential compute growth.

Tags: Kubernetes, resource scheduling, autonomous driving, Koordinator, GPU Sharing, ElasticQuota