
Understanding Scheduler Architectures: From Batch to Shared‑State Designs

This article surveys the evolution of schedulers—from early batch systems and OS process schedulers to modern centralized, two‑level, and shared‑state designs—explaining their core concepts, trade‑offs, and real‑world examples such as YARN, Mesos, Spark, Borg, and Kubernetes.

360 Zhihui Cloud Developer

1. Definition of Scheduler

A scheduler is a core component of both single‑machine and distributed systems that decides when and where tasks run. The term covers batch schedulers, preemptive process schedulers, cron‑like tools, language runtime schedulers (e.g., the Go goroutine scheduler), cluster resource managers such as Hadoop YARN, and workflow schedulers such as Airflow.

2. Scheduler Design Overview

System design often repeats similar abstractions at different layers: caches in a CPU, memory hierarchies in a machine, and storage tiers in a cluster. As scale grows, problems that were trivial become challenging, especially state synchronization, fault tolerance, and scalability.

3. Types of Distributed Schedulers

3.1 Centralized Scheduler

A single (monolithic) instance manages all resources and tasks. The design is simple and keeps state consistent because only one component writes it, but it is a single point of failure and scales poorly as the cluster grows.
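As a rough illustration of the centralized model, the sketch below keeps one task queue and one view of every node inside a single object and places tasks first‑fit. The class names (`CentralScheduler`, `Node`, `Task`) and the CPU‑only resource model are assumptions made for this example, not taken from any particular system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpus: int


@dataclass
class Task:
    name: str
    cpus: int


@dataclass
class CentralScheduler:
    """Hypothetical monolithic scheduler: one queue, one view of all nodes."""
    nodes: list
    queue: deque = field(default_factory=deque)

    def submit(self, task):
        self.queue.append(task)

    def schedule(self):
        """Place queued tasks first-fit; return {task name: node name}."""
        placements = {}
        still_waiting = deque()
        while self.queue:
            task = self.queue.popleft()
            # First node with enough free CPUs wins.
            node = next((n for n in self.nodes if n.free_cpus >= task.cpus), None)
            if node is None:
                still_waiting.append(task)  # no capacity now; keep for later
                continue
            node.free_cpus -= task.cpus
            placements[task.name] = node.name
        self.queue = still_waiting
        return placements


scheduler = CentralScheduler(nodes=[Node("n1", 4), Node("n2", 2)])
for t in (Task("a", 3), Task("b", 2), Task("c", 4)):
    scheduler.submit(t)
placements = scheduler.schedule()
print(placements)
```

Because every decision passes through one object, there is nothing to synchronize, but the same object is also the only place where scheduling can happen, which is the scalability and availability limit described above.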

Centralized Scheduler diagram

3.2 Two‑Level Scheduler

A two‑level scheduler combines a central scheduler with partitioned sub‑schedulers. The central scheduler handles coarse‑grained resource allocation, while each partition schedules its own tasks at fine granularity. This improves flexibility and lets one cluster serve both high‑throughput and low‑latency workloads, at the cost of more complex state synchronization.
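The split between the two levels can be sketched as an offer loop: a first level that owns node capacity and a second level that decides which of its own tasks to run inside each offer. This is a minimal sketch under simplifying assumptions (CPU‑only resources, a fixed offer order, no fairness policy); `CentralAllocator`, `FrameworkScheduler`, and `Offer` are hypothetical names for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    node: str
    cpus: int


@dataclass
class FrameworkScheduler:
    """Second level: decides which of its own tasks fit into an offer."""
    name: str
    pending: list  # list of (task name, cpus) tuples

    def consider(self, offer):
        launched, remaining = [], offer.cpus
        for task in list(self.pending):
            task_name, cpus = task
            if cpus <= remaining:
                launched.append(task_name)
                remaining -= cpus
                self.pending.remove(task)
        return launched, remaining


@dataclass
class CentralAllocator:
    """First level: owns node capacity and hands out coarse-grained offers."""
    free: dict  # node name -> free cpus
    frameworks: list = field(default_factory=list)

    def run_round(self):
        launched_by = {}
        for node in list(self.free):
            for fw in self.frameworks:
                if self.free[node] == 0:
                    break
                launched, unused = fw.consider(Offer(node, self.free[node]))
                self.free[node] = unused  # the unused part returns to the pool
                if launched:
                    launched_by.setdefault(fw.name, []).extend(
                        (t, node) for t in launched)
        return launched_by


batch = FrameworkScheduler("batch", [("b1", 2), ("b2", 2)])
stream = FrameworkScheduler("stream", [("s1", 1)])
allocator = CentralAllocator(free={"n1": 3, "n2": 2}, frameworks=[batch, stream])
result = allocator.run_round()
print(result)
```

The allocator never inspects individual tasks; it only tracks how much capacity each offer consumed. That separation is what lets each partition apply its own policy, and it is also why the two levels can disagree about the cluster's state between rounds.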

Two‑level Scheduler diagram

3.3 Shared‑State Scheduler

All schedulers share a common cluster state service. Each scheduler is an independent service that reads and writes this state, so schedulers can be added or replaced without changing the core. This micro‑kernel style improves extensibility, fault tolerance, and scalability. Kubernetes, Borg, and the newer Ray system follow this model.
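One common way to let independent schedulers write the same state safely is a versioned compare‑and‑set: a write succeeds only if the node has not changed since the scheduler read it. The sketch below illustrates that idea with hypothetical `ClusterState` and `Scheduler` classes; it is single‑threaded and in‑memory, whereas a real state service would serialize writes and persist them.

```python
class StaleVersion(Exception):
    """Raised when a write is based on an outdated view of a node."""


class ClusterState:
    """Hypothetical shared state service: node -> (free CPUs, version)."""

    def __init__(self, free_cpus):
        self._nodes = {name: (cpus, 0) for name, cpus in free_cpus.items()}

    def snapshot(self):
        return dict(self._nodes)

    def claim(self, node, cpus, seen_version):
        free, version = self._nodes[node]
        if version != seen_version:
            raise StaleVersion(node)  # someone else wrote first
        if cpus > free:
            raise ValueError("not enough capacity")
        self._nodes[node] = (free - cpus, version + 1)


class Scheduler:
    """Independent scheduler: reads a snapshot, decides, then writes back."""

    def __init__(self, name, state):
        self.name, self.state = name, state

    def place(self, cpus, view=None):
        if view is None:
            view = self.state.snapshot()
        for node, (free, version) in view.items():
            if free >= cpus:
                self.state.claim(node, cpus, version)
                return node
        return None


state = ClusterState({"n1": 4})
batch, service = Scheduler("batch", state), Scheduler("service", state)

stale_view = state.snapshot()        # both schedulers decide from the same view
first = batch.place(3, stale_view)   # succeeds and bumps n1's version
try:
    service.place(1, stale_view)     # based on the old version: rejected
    conflict = False
except StaleVersion:
    conflict = True
retry = service.place(1)             # fresh snapshot, succeeds
print(first, conflict, retry, state.snapshot())
```

Neither scheduler knows about the other; the only coordination point is the state service, which is why a scheduler can crash or be swapped out without affecting the rest.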

Shared‑State Scheduler diagram

4. Representative Cases

OS Process Scheduler: Centralized management of CPU, memory, and I/O for processes and threads.

Hadoop YARN: Central ResourceManager with per‑node NodeManagers; supports high availability through a standby ResourceManager.

Mesos: Two‑level design in which a Master offers resources to independent Frameworks that run their own schedulers.

Spark: A central Driver schedules tasks onto Executors; Drizzle, an extension to Spark, adds a local scheduler per node to reduce streaming latency.

Borg / Kubernetes: Evolved from a centralized BorgMaster to a shared‑state architecture where schedulers are separate services; uses containers/cgroups for isolation.

Omega: Treats resource allocation and task scheduling as database‑style transactions against shared cluster state, providing optimistic concurrency control, deadlock detection, and procedural checks.
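The transactional view described for Omega can be illustrated with a small all‑or‑nothing commit: a scheduler collects several claims against a snapshot, and the commit applies either every claim or none. This is a sketch of the general optimistic‑concurrency idea only, not Omega's actual implementation; `CellState` and `Transaction` are hypothetical names, and deadlock detection and procedural checks are not modeled.

```python
class CellState:
    """Hypothetical shared cell state: node -> [free CPUs, version]."""

    def __init__(self, free_cpus):
        self.nodes = {n: [c, 0] for n, c in free_cpus.items()}


class Transaction:
    """Collects claims against a snapshot, then commits all or nothing."""

    def __init__(self, cell):
        self.cell = cell
        # Remember the version of every node at the time we started.
        self.seen = {n: v for n, (_, v) in cell.nodes.items()}
        self.claims = []  # list of (node, cpus)

    def claim(self, node, cpus):
        self.claims.append((node, cpus))

    def commit(self):
        needed = {}
        for node, cpus in self.claims:
            needed[node] = needed.get(node, 0) + cpus
        # Validate first: any stale version or shortfall aborts everything.
        for node, cpus in needed.items():
            free, version = self.cell.nodes[node]
            if version != self.seen[node] or cpus > free:
                return False
        # Apply only after every claim has been validated.
        for node, cpus in needed.items():
            self.cell.nodes[node][0] -= cpus
            self.cell.nodes[node][1] += 1
        return True


cell = CellState({"n1": 2, "n2": 2})
tx_a, tx_b = Transaction(cell), Transaction(cell)
tx_a.claim("n1", 2)
tx_a.claim("n2", 1)
tx_b.claim("n2", 2)

ok_a = tx_a.commit()   # first to commit wins
ok_b = tx_b.commit()   # n2 changed since tx_b started: aborted, nothing applied

retry = Transaction(cell)
retry.claim("n2", 1)
ok_retry = retry.commit()
print(ok_a, ok_b, ok_retry, cell.nodes)
```

The aborted transaction leaves the cell untouched, so the losing scheduler simply re‑reads the state and tries again with a smaller or different request.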

5. Summary

For small‑scale systems, a centralized scheduler is simple and effective. As clusters grow or custom scheduling policies are needed, two‑level designs become attractive, though they add complexity. Shared‑state schedulers are now mainstream, offering simple APIs and high scalability, exemplified by Kubernetes.

6. Outlook

Future work includes precise task‑demand prediction (potentially leveraging AI) and efficient large‑scale artifact distribution (e.g., container images) using peer‑to‑peer techniques.
