
How GlobalElasticQuotaTree Enables Elastic Multi‑Cluster Quota Management in Kubernetes

This article explains how GlobalElasticQuotaTree extends Kubernetes native elastic quota to multi‑cluster environments, providing hierarchical quota structures, Min/Max borrowing, cluster‑level control, and workload‑type support to improve resource utilization for AI platforms.

Alibaba Cloud Infrastructure

Background

As AI technologies become widespread in enterprises, many companies build unified AI platforms to offer model training and inference services. These platforms face resource‑management challenges such as heterogeneous multi‑cluster environments, multi‑tenant quota requirements, and low resource utilization under static quota models.

GlobalElasticQuotaTree Overview

ACK One Fleet provides unified multi‑cluster management and, on top of it, introduces GlobalElasticQuotaTree, a Kubernetes‑native solution that extends elastic quota across clusters to enable pooled, elastic resource sharing.

Architecture and Core Capabilities

GlobalElasticQuotaTree offers a global quota view at the fleet level, featuring a hierarchical quota structure, Min/Max elastic borrowing, and cluster‑level quota control.

Hierarchical Quota Structure

The quota tree mirrors an organization hierarchy (department → team → project). Parent nodes define total resources for a department, child nodes allocate resources to teams or projects, and leaf nodes attach to specific namespaces that run workloads.

Min/Max Elastic Borrowing Mechanism

Each team configures Min (guaranteed resources) and Max (upper limit). Teams can borrow idle resources from siblings under the same parent, breaking static quota silos and improving overall utilization.
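The borrowing arithmetic can be made concrete with a small sketch. This is an illustrative model of the Min/Max semantics described above, not the ACK scheduler's actual code; the team names and figures mirror the configuration example later in this article.

```python
# Illustrative sketch (not the ACK implementation): how much a quota node
# could borrow from siblings under the same parent.

def borrowable(siblings: dict, team: str) -> int:
    """GPUs `team` may use beyond its own Min, capped by its Max.

    siblings maps team -> {"min": guaranteed, "max": cap, "used": in use}.
    Idle capacity is the sum of each sibling's unused guarantee.
    """
    idle = sum(max(0, q["min"] - q["used"])
               for name, q in siblings.items() if name != team)
    headroom = siblings[team]["max"] - siblings[team]["min"]
    return min(idle, headroom)

teams = {
    "team-a": {"min": 10, "max": 14, "used": 10},  # fully using its guarantee
    "team-b": {"min": 6,  "max": 10, "used": 2},   # 4 guaranteed GPUs idle
}
print(borrowable(teams, "team-a"))  # 4: team-a may grow from 10 up to 14
```

Note that borrowing is bounded twice: by the siblings' idle guarantees and by the borrower's own Max, so a burst can never squeeze another team below its Min.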

Cluster‑Level Quota Control

The global quota tree is split into multiple ElasticQuotaTree objects and distributed to member clusters. Each cluster receives a differentiated quota share based on its capacity, with automatic synchronization of changes.
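The decomposition idea can be sketched as follows. Field names (`staticAssignments`, `clusterName`) follow the configuration example in this article; the grouping logic here is a simplified stand-in for the fleet controller, which additionally synchronizes updates.

```python
# Sketch: project each global node's staticAssignments into per-cluster
# quota fragments, one map per member cluster.

def split_by_cluster(tree: list) -> dict:
    """Group each node's staticAssignments by target cluster."""
    per_cluster = {}
    for node in tree:
        for asg in node.get("staticAssignments", []):
            per_cluster.setdefault(asg["clusterName"], {})[node["name"]] = {
                "min": asg["min"], "max": asg["max"],
            }
    return per_cluster

tree = [
    {"name": "team-a", "staticAssignments": [
        {"clusterName": "cluster-beijing",  "min": 6, "max": 8},
        {"clusterName": "cluster-hangzhou", "min": 4, "max": 6},
    ]},
    {"name": "team-b", "staticAssignments": [
        {"clusterName": "cluster-beijing",  "min": 4, "max": 6},
        {"clusterName": "cluster-hangzhou", "min": 2, "max": 4},
    ]},
]
shares = split_by_cluster(tree)
print(shares["cluster-beijing"]["team-a"])  # {'min': 6, 'max': 8}
```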

Support for Multiple Workload Types

GlobalElasticQuotaTree supports various AI/ML and batch workloads, ensuring that both single‑node tasks and distributed training jobs are scheduled with quota checks, preserving Gang Scheduling semantics.
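The gang-scheduling interaction amounts to an all-or-nothing quota check. The sketch below illustrates that semantics at an assumption level; it is not the ACK admission code.

```python
# Sketch: a distributed job is admitted only if quota remains for ALL of
# its pods, so a gang is never partially placed and left deadlocked.

def admit_gang(pod_gpu_requests: list, quota_remaining: int) -> bool:
    """All-or-nothing admission against the remaining quota."""
    return sum(pod_gpu_requests) <= quota_remaining

# A 4-worker training job needing 2 GPUs per worker:
print(admit_gang([2, 2, 2, 2], quota_remaining=10))  # True: whole gang fits
print(admit_gang([2, 2, 2, 2], quota_remaining=6))   # False: no partial admit
```

A single-pod inference task is just the degenerate case of a gang of one, so both workload shapes pass through the same check.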

Quota‑Based Scheduling Strategies

In multi‑cluster scenarios, the fleet supports quota‑based scheduling policies:

Binpack: Prioritizes packing workloads into clusters with higher utilization, filling one cluster before using another.

Spread: Distributes workloads evenly across clusters to avoid overload and improve availability.
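The two policies above can be contrasted with a minimal sketch. Cluster names echo the scenario below, and the utilization figures are made up for illustration; real placement also weighs quota and capacity.

```python
# Sketch: Binpack consolidates onto the busiest cluster with spare room,
# while Spread balances load onto the least-busy cluster.

def pick_binpack(utilization: dict) -> str:
    """Prefer the most-utilized cluster that still has capacity."""
    viable = {c: u for c, u in utilization.items() if u < 1.0}
    return max(viable, key=viable.get)

def pick_spread(utilization: dict) -> str:
    """Prefer the least-utilized cluster to even out load."""
    return min(utilization, key=utilization.get)

utilization = {"cluster-beijing": 0.8, "cluster-hangzhou": 0.3}
print(pick_binpack(utilization))  # cluster-beijing: fill it up first
print(pick_spread(utilization))   # cluster-hangzhou: balance the load
```

Binpack keeps whole clusters free for large distributed jobs (and for scale-in), while Spread reduces blast radius and hotspot risk.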

Typical Scenario

For an enterprise AI platform with algorithm and platform teams sharing Beijing and Hangzhou GPU clusters, GlobalElasticQuotaTree provides:

Unified global view: a single quota tree automatically propagated to each cluster.

Resource pooling: all GPUs are centrally managed and elastically shared.

Guaranteed isolation: Min settings protect core business teams from being starved.

Elastic borrowing: idle GPUs are borrowed by other teams during off‑peak periods, boosting utilization.

Conclusion

By defining a global quota tree at the fleet level and decomposing it into per‑cluster ElasticQuotaTree objects, GlobalElasticQuotaTree delivers unified quota management, hierarchical structures, elastic borrowing, cluster‑level control, multi‑workload support, and flexible scheduling strategies, thereby balancing resource guarantees with high overall utilization.

Configuration Example

apiVersion: quota.one.alibabacloud.com/v1alpha1
kind: GlobalElasticQuotaTree
metadata:
  name: ai-platform-quota
  namespace: kube-system
spec:
  propagation: true
  tree:
  # root node: total global resources
  - name: root
    min:
      nvidia.com/gpu: "16"
    max:
      nvidia.com/gpu: "16"
    children:
    - team-a
    - team-b
  # Team A (algorithm team): guarantee 10 GPU, max 14 GPU
  - name: team-a
    min:
      nvidia.com/gpu: "10"
    max:
      nvidia.com/gpu: "14"
    namespaces:
    - algorithm
    staticAssignments:
    - clusterName: cluster-beijing
      min:
        nvidia.com/gpu: "6"
      max:
        nvidia.com/gpu: "8"
    - clusterName: cluster-hangzhou
      min:
        nvidia.com/gpu: "4"
      max:
        nvidia.com/gpu: "6"
  # Team B (platform team): guarantee 6 GPU, max 10 GPU
  - name: team-b
    min:
      nvidia.com/gpu: "6"
    max:
      nvidia.com/gpu: "10"
    namespaces:
    - platform
    staticAssignments:
    - clusterName: cluster-beijing
      min:
        nvidia.com/gpu: "4"
      max:
        nvidia.com/gpu: "6"
    - clusterName: cluster-hangzhou
      min:
        nvidia.com/gpu: "2"
      max:
        nvidia.com/gpu: "4"
