Tagged articles
3 articles
Alibaba Cloud Infrastructure
Nov 10, 2025 · Cloud Native

Koordinator v1.7.0 Brings Network‑Aware Scheduling and Job‑Level Preemption for AI Workloads

Koordinator v1.7.0, the open‑source Kubernetes scheduler, adds network‑topology‑aware scheduling, job‑level preemption, and support for Ascend NPU and Cambricon MLU. The release also delivers unified heterogeneous device management, enhanced GPU sharing, comprehensive API documentation, and best‑practice guides, all aimed at improving large‑scale AI training efficiency and cluster operations.

AI training · Heterogeneous Devices · Job Preemption
17 min read
Alibaba Cloud Native
Mar 7, 2025 · Cloud Native

Koordinator v1.6: Enhancing Heterogeneous GPU Scheduling for Cloud‑Native Clusters

Koordinator v1.6 introduces GPU topology‑aware scheduling, end‑to‑end GPU & RDMA joint allocation, fine‑grained GPU sharing, differentiated scoring for GPU vs CPU resources, advanced reservation and mixed‑workload support, plus numerous scheduler and rescheduler optimizations to improve resource utilization and performance in Kubernetes clusters.

Heterogeneous Devices · Koordinator · Resource Management
27 min read
Efficient Ops
Jan 14, 2018 · Operations

How We Built a Unified Network Automation Framework for Heterogeneous Devices

This article shares how a telecom operations team tackled the complexity of managing dozens of device vendors and hundreds of models by designing a Python‑based automation module called Forward, which standardizes low‑level actions, provides reusable libraries, and enables rapid script composition for diverse network scenarios.

Heterogeneous Devices · Infrastructure as Code · Operations
10 min read