How Transwarp Scheduler Tackles Mixed Workloads in Unified Cloud‑Native Infrastructure
This article reviews the challenges of scheduling heterogeneous workloads—micro‑services, big‑data, AI, and HPC—on a unified cloud‑native platform, compares existing schedulers like Mesos and YARN, examines Kubernetes ecosystem extensions such as Volcano and YuniKorn, and details the design and components of the Transwarp Scheduler built on Kubernetes Scheduling Framework v2.
On October 25, the first China Cloud Computing Infrastructure Developer Conference was held in Changsha, where Transwarp presented the talk "Thoughts and Practices of a Complex Workload Hybrid Scheduler Based on Kubernetes". This article summarizes that talk.
Background
Cloud‑native has become the dominant paradigm, with Kubernetes driving enterprises to migrate their infrastructure and applications to cloud‑native architectures. As cloud‑native matures, traditional big‑data analytics and compute workloads are also being moved onto it, creating compatibility challenges such as orchestrating big‑data jobs and achieving data‑locality.
Unified Cloud‑Native Infrastructure
Transwarp has built a unified data-cloud platform (TDC) that integrates analysis cloud, data cloud, and application cloud, supporting data warehouses, streaming engines, analytics tools, and DevOps. TDC faces the problem of scheduling heterogeneous workloads (microservices, big data, AI, and HPC) on a single platform.
Existing Schedulers
Two classic schedulers were reviewed:
Mesos: two-level architecture with DRF-based resource allocation; flexible, but lacking ecosystem support.
YARN: single-level architecture with hierarchical queues and a strong Hadoop ecosystem, but less flexible than Mesos.
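The DRF policy mentioned above can be made concrete with a small sketch: each framework's dominant share is the largest fraction it uses of any single resource, and the allocator serves the framework with the smallest dominant share next. A minimal illustration in Python (the data and function names are hypothetical, for exposition only):

```python
# Sketch of Dominant Resource Fairness (DRF): resources go next to the
# framework whose dominant share (its largest per-resource usage
# fraction) is currently the smallest.

def dominant_share(usage, capacity):
    """Largest per-resource usage fraction for one framework."""
    return max(usage[r] / capacity[r] for r in capacity)

def next_framework(usages, capacity):
    """Pick the framework DRF would serve next (lowest dominant share)."""
    return min(usages, key=lambda f: dominant_share(usages[f], capacity))

cluster = {"cpu": 9, "mem": 18}
usages = {
    "A": {"cpu": 2, "mem": 8},   # dominant share: mem, 8/18 ≈ 0.44
    "B": {"cpu": 3, "mem": 2},   # dominant share: cpu, 3/9 ≈ 0.33
}
# B has the lower dominant share, so DRF offers resources to B next.
```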
Kubernetes native scheduler excels at micro‑service workloads but lacks features needed for big‑data/AI tasks, such as multi‑tenant resource queues, resource sharing, and fine‑grained control.
Kubernetes Ecosystem Extensions
Projects such as Volcano, YuniKorn, and the Scheduling Framework v2 provide batch processing, multi-tenant queues, and plugin-based extensibility. Volcano adds support for batch, MPI, and AI jobs; YuniKorn offers hierarchical queues, GPU scheduling, and fair sharing.
Transwarp Scheduler Design
Building on these community extensions, Transwarp designed the Transwarp Scheduler on top of the Scheduling Framework v2. It introduces two CRDs: Queue (hierarchical resource queues) and QueueBinding (binding queues to namespaces or pods). Core plugins include:
QueueSort: sorts pending pods according to the queue's configured algorithm (HDRF by default, for fairness).
QueueCapacityCheck: a pre-filter that validates queue resource usage.
QueueCapacityReserve: reserves and releases queue resources during the scheduling cycle.
QueuePreemption: a post-filter that enables resource reclamation.
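The interplay between the capacity-check and reserve plugins can be sketched in a few lines. The real plugins are Go implementations of the scheduling framework's PreFilter and Reserve/Unreserve extension points; the Python below is only a conceptual model, and all names in it are hypothetical:

```python
# Sketch of queue-capacity accounting: a pre-filter rejects pods whose
# queue would exceed its capacity, while reserve/unreserve track
# in-flight usage during a scheduling cycle.

class Queue:
    def __init__(self, capacity):
        self.capacity = capacity                  # e.g. {"cpu": 10, "mem": 32}
        self.reserved = {r: 0 for r in capacity}  # in-flight usage

    def fits(self, request):
        # QueueCapacityCheck: pre-filter validation of queue usage.
        return all(self.reserved[r] + request[r] <= self.capacity[r]
                   for r in request)

    def reserve(self, request):
        # QueueCapacityReserve: hold resources while the pod is bound.
        for r, v in request.items():
            self.reserved[r] += v

    def unreserve(self, request):
        # Roll back the reservation if binding fails.
        for r, v in request.items():
            self.reserved[r] -= v
```

Tracking reservations inside the scheduling cycle avoids over-admitting pods into a queue before earlier pods from the same queue have actually been bound.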
These plugins enable gang‑scheduling for TensorFlow and Spark jobs, ensuring all required pods are scheduled together or not at all, and allow configurable minimum executor counts for Spark.
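The all-or-nothing admission behind gang scheduling can be sketched as follows (a toy model, not the scheduler's actual code; the minimum-member threshold mirrors the configurable minimum executor count described for Spark):

```python
# Sketch of gang scheduling ("all or nothing"): a job's pods are bound
# only when at least min_available of them can be placed; otherwise
# none are, so a Spark or TensorFlow job never starts half-formed.

def gang_schedulable(pods, min_available, can_place):
    """Return the pods to bind if the gang is admissible, else []."""
    placeable = [p for p in pods if can_place(p)]
    return placeable if len(placeable) >= min_available else []

# A Spark job with a driver and three executors, but only two free nodes:
pods = ["driver", "exec-1", "exec-2", "exec-3"]
free_nodes = 2
can_place = lambda p: pods.index(p) < free_nodes  # toy placement check
# Fewer than min_available=3 pods fit, so nothing is scheduled.
```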
Architecture
The Transwarp Scheduler consists of three components:
Scheduler: built on the Scheduling Framework, with all plugins compiled in.
Controller Manager: runs the controllers for the Queue and QueueBinding CRDs.
Webhook: admission webhooks that validate Queue and QueueBinding objects.
Images illustrate the scheduler architecture, queue‑binding relationship, and overall system diagram.
Future Outlook
Transwarp Scheduler already meets most TDC requirements, addressing limitations of the native Kubernetes scheduler. Future work includes high‑level strategies such as application‑aware and load‑aware scheduling, and continued collaboration with the open‑source community.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era | [email protected]
