
How Transwarp Scheduler Tackles Mixed Workloads in Unified Cloud‑Native Infrastructure

This article reviews the challenges of scheduling heterogeneous workloads—micro‑services, big‑data, AI, and HPC—on a unified cloud‑native platform, compares existing schedulers like Mesos and YARN, examines Kubernetes ecosystem extensions such as Volcano and YuniKorn, and details the design and components of the Transwarp Scheduler built on Kubernetes Scheduling Framework v2.

StarRing Big Data Open Lab

On October 25, the first China Cloud Computing Infrastructure Developer Conference was held in Changsha, where Transwarp presented a talk titled "Thoughts and Practices of a Complex Workload Hybrid Scheduler Based on Kubernetes". This article summarizes that talk.

Background

Cloud‑native has become the dominant paradigm, with Kubernetes driving enterprises to migrate their infrastructure and applications to cloud‑native architectures. As the ecosystem matures, traditional big‑data analytics and compute workloads are also being moved onto it, creating compatibility challenges such as orchestrating big‑data jobs and achieving data locality.

Unified Cloud‑Native Infrastructure

Transwarp has built a unified data cloud platform, TDC, that integrates an analysis cloud, a data cloud, and an application cloud, supporting data warehouses, streaming engines, analytics tools, and DevOps. TDC faces the problem of scheduling heterogeneous workloads (microservices, big data, AI, and HPC) on a single platform.

Existing Schedulers

Two classic schedulers were reviewed:

Mesos: a two‑level architecture with DRF‑based resource allocation; flexible, but lacking ecosystem support.

YARN: a single‑level architecture with hierarchical queues and a strong Hadoop ecosystem, but less flexible than Mesos.
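For concreteness, the DRF policy Mesos uses can be sketched in a few lines of plain Go (illustrative values and function names only, not Mesos code): each tenant's dominant share is its largest fractional use of any cluster resource, and DRF offers resources next to the tenant with the smallest dominant share.

```go
package main

import "fmt"

// dominantShare returns the largest fraction of any cluster resource
// that a tenant currently uses -- the core quantity in DRF.
func dominantShare(used, total map[string]float64) float64 {
	max := 0.0
	for r, u := range used {
		if t := total[r]; t > 0 && u/t > max {
			max = u / t
		}
	}
	return max
}

// nextTenant picks the tenant with the smallest dominant share,
// i.e. the tenant DRF would allocate to next.
func nextTenant(tenants map[string]map[string]float64, total map[string]float64) string {
	best, bestShare := "", 2.0
	for name, used := range tenants {
		if s := dominantShare(used, total); s < bestShare {
			best, bestShare = name, s
		}
	}
	return best
}

func main() {
	total := map[string]float64{"cpu": 9, "mem": 18}
	tenants := map[string]map[string]float64{
		"A": {"cpu": 3, "mem": 3}, // dominant share 3/9 = 0.33 (cpu)
		"B": {"cpu": 1, "mem": 8}, // dominant share 8/18 = 0.44 (mem)
	}
	fmt.Println(nextTenant(tenants, total)) // A has the smaller dominant share
}
```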

The native Kubernetes scheduler excels at microservice workloads but lacks features needed for big‑data and AI jobs, such as multi‑tenant resource queues, resource sharing, and fine‑grained scheduling control.

Kubernetes Ecosystem Extensions

Projects such as Volcano, YuniKorn, and the Scheduling Framework v2 provide batch processing, multi‑tenant queues, and plugin‑based extensibility. Volcano adds support for batch, MPI, and AI jobs; YuniKorn offers hierarchical queues, GPU scheduling, and fair sharing.

Transwarp Scheduler Design

Building on these community extensions, Transwarp designed the Transwarp Scheduler on top of the Scheduling Framework v2. It introduces two CRDs: Queue (hierarchical resource queues) and QueueBinding (binding queues to namespaces or pods). Core plugins include:

QueueSort: sorts pending pods according to their queue's algorithm (HDRF, hierarchical dominant resource fairness, by default).

QueueCapacityCheck: a PreFilter‑stage check that validates queue resource usage.

QueueCapacityReserve: reserves and unreserves queue resources during the scheduling cycle.

QueuePreemption: a PostFilter that enables resource reclamation.
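The capacity plugins can be pictured with a minimal sketch in plain Go (hypothetical types and names, not the actual Transwarp plugin code): the check fails fast when a request would push the queue past capacity in any resource dimension, and reserve/unreserve charge and roll back in‑flight usage.

```go
package main

import (
	"errors"
	"fmt"
)

// Resources is a simplified resource vector (e.g. cpu millicores, memory MiB).
type Resources map[string]int64

// Queue tracks capacity and current usage for one hierarchical queue.
type Queue struct {
	Capacity Resources
	Used     Resources
}

// Check mirrors a PreFilter-style capacity check: fail fast if the
// request would exceed the queue's capacity in any dimension.
func (q *Queue) Check(req Resources) error {
	for r, v := range req {
		if q.Used[r]+v > q.Capacity[r] {
			return errors.New("queue over capacity on " + r)
		}
	}
	return nil
}

// Reserve charges the request to the queue while the pod is in flight.
func (q *Queue) Reserve(req Resources) {
	for r, v := range req {
		q.Used[r] += v
	}
}

// Unreserve rolls the charge back if scheduling fails downstream.
func (q *Queue) Unreserve(req Resources) {
	for r, v := range req {
		q.Used[r] -= v
	}
}

func main() {
	q := &Queue{
		Capacity: Resources{"cpu": 4000, "mem": 8192},
		Used:     Resources{"cpu": 3500, "mem": 1024},
	}
	small := Resources{"cpu": 400, "mem": 512}
	big := Resources{"cpu": 1000, "mem": 512}

	fmt.Println(q.Check(small)) // fits under capacity: <nil>
	q.Reserve(small)
	fmt.Println(q.Check(big)) // now over cpu capacity: error
	q.Unreserve(small)
}
```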

These plugins enable gang‑scheduling for TensorFlow and Spark jobs, ensuring all required pods are scheduled together or not at all, and allow configurable minimum executor counts for Spark.
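The all‑or‑none behavior described above can be sketched in plain Go (hypothetical function names, not the plugin's real API): a gang is released only once at least a configured minimum number of its pods can be placed; otherwise every pod in the group is held back.

```go
package main

import "fmt"

// admitGang models all-or-none gang admission: return the schedulable
// pods only when at least minMember of them fit, otherwise hold the
// entire group back and bind nothing.
func admitGang(pods []string, fits func(string) bool, minMember int) []string {
	var ok []string
	for _, p := range pods {
		if fits(p) {
			ok = append(ok, p)
		}
	}
	if len(ok) < minMember {
		return nil // gang not satisfiable yet: schedule none
	}
	return ok
}

func main() {
	// A hypothetical Spark job: a driver plus three executors, with a
	// configured minimum of 3 pods before anything is bound.
	fits := func(pod string) bool { return pod != "executor-3" } // executor-3 does not fit
	pods := []string{"driver", "executor-1", "executor-2", "executor-3"}
	fmt.Println(admitGang(pods, fits, 3)) // 3 pods fit, gang is released
	fmt.Println(admitGang(pods, fits, 4)) // only 3 of 4 fit, nothing is bound
}
```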

Architecture

The Transwarp Scheduler consists of three components:

Scheduler: built on the Scheduling Framework, with all plugins compiled in.

Controller Manager: runs the controllers for the Queue and QueueBinding CRDs.

Webhook: admission webhooks that validate Queue and QueueBinding objects.

The original talk includes diagrams of the scheduler architecture, the queue‑binding relationship, and the overall system.

Future Outlook

Transwarp Scheduler already meets most TDC requirements, addressing limitations of the native Kubernetes scheduler. Future work includes high‑level strategies such as application‑aware and load‑aware scheduling, and continued collaboration with the open‑source community.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: cloud native, Big Data, AI, Kubernetes, scheduler
Written by

StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]
