
How ByteDance’s Gödel Scheduler Unifies Online and Offline Workloads at Massive Scale

This article describes Gödel Scheduler, ByteDance's cloud-native, distributed Kubernetes scheduler that unifies online and offline workloads. It covers the architecture (multiple Scheduler instances coordinated through optimistic concurrency), the enhancements and performance gains, the roadmap, including a rescheduler for improving scheduling quality, and the open-source plans.

Volcano Engine Developer Services

Background

Since its open‑source debut in 2014, Kubernetes has become the de‑facto standard for container orchestration. ByteDance’s infrastructure team adopted Kubernetes early to build a private cloud platform.

Rapid growth across ByteDance’s business lines—micro‑services, recommendation, advertising, search, machine learning, big data, and storage—has driven a massive increase in required compute resources.

Initially, online and offline services used separate resource pools with manual borrowing during peak events, leading to complex, inefficient processes and limited resource utilization.

Enhancing the Kubernetes Default Scheduler

From 2018 onward, ByteDance optimized Kubernetes components. By 2019, the native scheduler could no longer meet the needs of the promotion search workload, which required finer-grained resource scheduling, flexible preemption, and higher throughput (the default scheduler handled only ~10 Pods/s on a 5 000-node cluster). The team made the following enhancements to the default scheduler:

Extended non‑native resource scheduling to support memory bandwidth, network bandwidth, etc.

Added micro‑topology scheduling.

Refactored preemption with a plugin‑friendly framework.

Optimized cache‑to‑snapshot synchronization and introduced incremental updates.

Implemented scheduling result caching to reduce redundant calculations.

Improved preemption by reorganizing data structures and pruning unnecessary work.

Together, these changes increased scheduling throughput by a factor of tens, from about 10 Pods/s to a stable 300 Pods/s on a 10 000-node production cluster.
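The incremental cache-to-snapshot update listed above can be pictured roughly as follows. This is a minimal Python sketch, not ByteDance's code: `NodeCache`, `Snapshot`, and the generation counter are illustrative names, and the sketch assumes each node record carries a monotonically increasing generation so that a refresh copies only the nodes that changed since the previous one.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    name: str
    free_cpu: int
    generation: int = 0  # set from the cache counter on every change


@dataclass
class NodeCache:
    """Live cache updated by cluster events (hypothetical structure)."""
    nodes: dict = field(default_factory=dict)
    next_generation: int = 0

    def upsert(self, name: str, free_cpu: int) -> None:
        self.next_generation += 1
        self.nodes[name] = NodeInfo(name, free_cpu, self.next_generation)


@dataclass
class Snapshot:
    """Stable view that one scheduling cycle reads from."""
    nodes: dict = field(default_factory=dict)
    generation: int = 0

    def refresh(self, cache: NodeCache) -> int:
        """Copy only nodes changed since the last refresh; return how many."""
        copied = 0
        for name, info in cache.nodes.items():
            if info.generation > self.generation:
                self.nodes[name] = NodeInfo(info.name, info.free_cpu, info.generation)
                copied += 1
        self.generation = cache.next_generation
        return copied


cache = NodeCache()
cache.upsert("node-a", 8)
cache.upsert("node-b", 16)
snapshot = Snapshot()
first_refresh = snapshot.refresh(cache)   # both nodes are new to the snapshot
cache.upsert("node-b", 12)
second_refresh = snapshot.refresh(cache)  # only node-b changed
```

A production implementation would also handle node deletion and keep nodes ordered by generation so the scan can stop early; both are left out here for brevity.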

Gödel Scheduler

In 2020, ByteDance launched the “online-offline fusion” project to improve resource utilization and operational efficiency. The native Kubernetes scheduler, which works at the Pod level, lacked support for higher-level “Job” semantics and could not reach the throughput that batch workloads require.

Gödel Scheduler is a distributed scheduler built on top of the Kubernetes scheduler, employing optimistic concurrency to parallelize the most time‑consuming filtering and scoring phases, thereby boosting large‑scale cluster throughput.

Key characteristics:

Two‑level scheduling semantics (Unit and Pod) with a secondary scheduling framework for flexible batch scheduling.

Rich functionality and high performance to support online, offline (batch, streaming), and training workloads.

Compatibility with the Kubernetes ecosystem, allowing it to replace the default scheduler.
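To make the two-level semantics more concrete, here is a small Python sketch in which a unit groups several Pods and the placement is accepted or rejected for the unit as a whole. The names `SchedulingUnit`, `min_member`, and `schedule_unit` are hypothetical, and the all-or-nothing threshold is an assumption used for illustration; the article does not specify how Gödel decides at the unit level.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    cpu: int


@dataclass
class SchedulingUnit:
    name: str
    pods: List[Pod]
    min_member: int  # assumption: place the unit only if this many pods fit


def schedule_unit(unit: SchedulingUnit, free_cpu: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Try to place a unit; return {pod: node}, or None if too few pods fit.

    Works on a copy of free_cpu so a rejected unit leaves no partial state.
    """
    trial = dict(free_cpu)
    placement: Dict[str, str] = {}
    for pod in unit.pods:
        # Pod-level step: first node with enough CPU (first-fit, for brevity).
        node = next((n for n, free in trial.items() if free >= pod.cpu), None)
        if node is not None:
            trial[node] -= pod.cpu
            placement[pod.name] = node
    # Unit-level step: accept or reject the group as a whole.
    if len(placement) < unit.min_member:
        return None
    free_cpu.update(trial)
    return placement
```

With `free_cpu = {"n1": 4}` and a unit of three 2-CPU Pods that requires all three, the function returns `None` and leaves `free_cpu` untouched, which is the behavior a Pod-by-Pod scheduler cannot express on its own.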

Architecture

The scheduler consists of three components: Dispatcher, Scheduler, and Binder. The Scheduler runs as multiple instances with optimistic concurrency; Dispatcher and Binder are single‑instance services.

Dispatcher

Handles application queuing, dispatching, and node partitioning. Sub‑components include:

Sort Policy Manager – supports FIFO, DRF/FairShare, and future priority‑based policies.

Dispatching Policy Manager – currently uses a LoadBalance strategy, with plans for plugin‑based configuration.

Node Shuffler – partitions nodes among Scheduler instances for load distribution.

Scheduler Maintainer – monitors health and load of Scheduler instances.

Reconciler – periodically checks and corrects the state of Pods, Nodes, Schedulers, and SchedulingUnits.
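Two of the Dispatcher's duties, node partitioning and load-balanced dispatching, might look like the sketch below. It is an illustration under stated assumptions rather than Gödel's implementation: the round-robin split, the "fewest pending units" rule, and the function names `partition_nodes` and `dispatch` are all invented for this example.

```python
from typing import Dict, List


def partition_nodes(nodes: List[str], schedulers: List[str]) -> Dict[str, List[str]]:
    """Split nodes round-robin so each Scheduler instance gets a similar share."""
    if not schedulers:
        raise ValueError("at least one scheduler instance is required")
    partitions: Dict[str, List[str]] = {s: [] for s in schedulers}
    for i, node in enumerate(sorted(nodes)):
        partitions[schedulers[i % len(schedulers)]].append(node)
    return partitions


def dispatch(units: List[str], pending: Dict[str, int]) -> Dict[str, str]:
    """Send each queued unit to the instance with the fewest pending units."""
    if not pending:
        raise ValueError("at least one scheduler instance is required")
    load = dict(pending)  # work on a copy; the caller's counters stay intact
    assignment: Dict[str, str] = {}
    for unit in units:
        # Ties break on the instance name so the result is deterministic.
        target = min(load, key=lambda s: (load[s], s))
        assignment[unit] = target
        load[target] += 1
    return assignment
```

For instance, dispatching three units when `s0` already has two pending and `s1` has none sends the first two to `s1` and the third to `s0`.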

Scheduler

Responsible for making scheduling and preemption decisions (execution is performed by the Binder). It comprises a Unit scheduling framework and a Pod scheduling framework, processing three main phases: Node Organizing, Unit Scheduling, and Unit Preempting.

Node Organizing

Filters and groups nodes using plugins:

Locating plugins – filter out unsuitable nodes based on application requirements (e.g., Local PV, DaemonSet, resource reservations).

Node Grouping plugins – group remaining nodes by residual resources or job‑level affinity.
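The Node Organizing phase can be sketched as a chain of locating predicates followed by one grouping function. The plugin signatures, the `local-pv` label, and the 8-CPU threshold below are assumptions chosen for illustration, not Gödel's actual plugin API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    name: str
    free_cpu: int
    labels: Set[str] = field(default_factory=set)


# A locating plugin answers: can this node host the application at all?
LocatingPlugin = Callable[[Node], bool]
# A grouping plugin maps a node to the key of the group it belongs to.
GroupingPlugin = Callable[[Node], str]


def has_local_pv(node: Node) -> bool:
    """Hypothetical locating plugin: the application needs a Local PV."""
    return "local-pv" in node.labels


def by_residual_cpu(node: Node) -> str:
    """Hypothetical grouping plugin: bucket nodes by remaining CPU."""
    return "large" if node.free_cpu >= 8 else "small"


def organize(nodes: List[Node], locating: List[LocatingPlugin],
             grouping: GroupingPlugin) -> Dict[str, List[str]]:
    """Drop nodes rejected by any locating plugin, then group the rest."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        if all(plugin(node) for plugin in locating):
            groups[grouping(node)].append(node.name)
    return dict(groups)
```

Running the organizing step once per unit, before filtering and scoring, narrows the node set that the more expensive per-Pod phases have to examine.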

Unit Scheduling

Matches and scores nodes for each scheduling unit using:

Filtering plugins – discard nodes that do not meet unit requirements.

Scoring plugins – rank the remaining nodes.
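The filter-then-score pattern described above is the same one the Kubernetes scheduling framework uses, and a stripped-down version fits in a few lines. The plugin signatures and the two example plugins (`fits_resources`, `most_headroom`) are hypothetical stand-ins, not Gödel plugins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    free_cpu: int
    free_mem: int


@dataclass
class PodSpec:
    cpu: int
    mem: int


FilterPlugin = Callable[[PodSpec, Node], bool]
ScorePlugin = Callable[[PodSpec, Node], float]


def fits_resources(pod: PodSpec, node: Node) -> bool:
    """Filtering: the node must have enough free CPU and memory."""
    return node.free_cpu >= pod.cpu and node.free_mem >= pod.mem


def most_headroom(pod: PodSpec, node: Node) -> float:
    """Scoring: prefer the node with the most CPU left after placement."""
    return float(node.free_cpu - pod.cpu)


def pick_node(pod: PodSpec, nodes: List[Node], filters: List[FilterPlugin],
              scorers: List[ScorePlugin]) -> Optional[str]:
    """Return the best feasible node name, or None if no node passes the filters."""
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None
    # Sum all plugin scores; on a tie, max() keeps the first node in list order.
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    return best.name
```

A `None` result is the point at which the preemption phase described next would take over.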

Unit Preempting

If no suitable node is found, the preemptor searches for victim Pods to evict and re‑evaluates placement using:

Victim Searching – locate pods that can be preempted.

Candidates Sorting – rank victims to choose the best ones for eviction.
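The two preemption steps can be sketched as follows. The ranking rule used here (prefer the candidate whose most important victim has the lowest priority, then the fewest victims) is an assumption for illustration; the article does not state which criteria Gödel applies, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RunningPod:
    name: str
    node: str
    cpu: int
    priority: int


def find_victims(node_free: int, pods_on_node: List[RunningPod],
                 need_cpu: int, priority: int) -> Optional[List[RunningPod]]:
    """Victim Searching: evict lowest-priority pods first until the request fits."""
    victims: List[RunningPod] = []
    free = node_free
    for pod in sorted(pods_on_node, key=lambda p: (p.priority, p.name)):
        if free >= need_cpu:
            break
        if pod.priority >= priority:
            break  # never evict pods of equal or higher priority
        victims.append(pod)
        free += pod.cpu
    return victims if free >= need_cpu else None


def choose_candidate(free_cpu: Dict[str, int], running: List[RunningPod],
                     need_cpu: int, priority: int) -> Optional[Tuple[str, List[RunningPod]]]:
    """Candidates Sorting: pick the node whose eviction set is least disruptive."""
    candidates = []
    for node, free in free_cpu.items():
        on_node = [p for p in running if p.node == node]
        victims = find_victims(free, on_node, need_cpu, priority)
        if victims is not None:
            candidates.append((node, victims))
    if not candidates:
        return None

    def cost(candidate):
        node, victims = candidate
        top = max((v.priority for v in victims), default=-1)
        return (top, len(victims), node)

    return min(candidates, key=cost)
```

In this sketch the Scheduler only returns the decision; as the article notes, the eviction itself is carried out later by the Binder.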

Binder

Executes optimistic conflict checks, performs preemption, prepares resources (e.g., dynamic volume creation), and carries out the final binding. It includes:

ConflictResolver – checks cross‑node and single‑node conflicts.

PreemptionOperator – deletes victim pods when needed.

UnitBinder – handles pre‑binding tasks and the actual bind operation.

A PodGroup controller is also integrated for managing PodGroup lifecycle.
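The optimistic conflict check at the heart of the Binder can be illustrated with a small sketch. Scheduler instances decide in parallel on possibly stale views, and the single Binder validates each decision against its own authoritative state before committing. The class below covers only a single-node CPU check; the names `Decision` and `Binder.commit` are hypothetical, and cross-node conflicts, preemption, and volume preparation are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Decision:
    pod: str
    node: str
    cpu: int
    scheduler: str  # the Scheduler instance that produced this decision


class Binder:
    def __init__(self, free_cpu: Dict[str, int]) -> None:
        self.free_cpu = dict(free_cpu)  # authoritative view of free capacity
        self.bound: Dict[str, str] = {}

    def commit(self, decision: Decision) -> bool:
        """Bind if the node still has room; otherwise reject the decision.

        A rejected decision would be sent back for rescheduling.
        """
        if self.free_cpu.get(decision.node, 0) < decision.cpu:
            return False
        self.free_cpu[decision.node] -= decision.cpu
        self.bound[decision.pod] = decision.node
        return True


# Two instances both chose n1, each believing 4 CPUs were free.
binder = Binder({"n1": 4})
first_ok = binder.commit(Decision("p1", "n1", 3, "s0"))
second_ok = binder.commit(Decision("p2", "n1", 3, "s1"))
```

The first commit succeeds and the second is rejected, so no lock is needed while the Scheduler instances run their filtering and scoring in parallel; the cost of a conflict is one extra scheduling attempt.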

Results and Roadmap

Over the past two years, Gödel Scheduler has been deployed at massive scale within ByteDance, supporting complex workloads for products like Douyin and Toutiao. Performance optimizations enable a single shard throughput of over 2 000 Pods/s and more than 5 000 Pods/s across multiple shards, with the largest internal cluster exceeding 20 000 nodes and 1 million Pods.

The design and its production results are presented in the SoCC 2023 paper “Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance.”

ByteDance plans to open‑source Gödel Scheduler, offering a new scheduling solution for cloud‑native offline workloads and further integrating with major big‑data and machine‑learning frameworks.

Future work includes continuous feature expansion, enhanced extensibility, and addressing the trade‑off between scheduling performance and quality through a rescheduler (Gödel Rescheduler) that performs post‑placement optimization.

Future Plans

The open‑source team will keep iterating on Gödel Scheduler, adding functionalities such as resource reservation and inter‑queue resource management, while improving support for high‑deployment‑rate and high‑preemption scenarios. Ecosystem building and compatibility with mainstream systems remain priorities.

Project repository: github.com/kubewharf/godel-scheduler
