Comparison of Common Big Data Scheduling Systems: Oozie, Azkaban, Airflow, XXL‑Job, and DolphinScheduler
This article provides a comparative overview of several popular big‑data workflow schedulers—including Oozie, Azkaban, Airflow, XXL‑Job, and DolphinScheduler—detailing their supported task types, visual workflow definition, monitoring capabilities, pause/resume features, high‑availability options, and other notable characteristics.
Big data scheduling systems drive offline batch and near‑real‑time computation tasks. This article classifies and compares several common schedulers, relating them to the scheduling system in Alibaba Cloud MaxCompute.
Oozie
Oozie is an Apache workflow coordination system, primarily used to manage Hadoop jobs. It was originally developed at Yahoo and is included in Cloudera's Hadoop distribution.
Supported Types
Unified scheduling of common Hadoop tasks such as MapReduce, Java MR, Streaming MR, Pig, Hive, Sqoop, Spark, and Shell.
Visual Workflow Definition
Configuration is complex; dependencies, time triggers, and event triggers are expressed using XML.
Task Monitoring
Provides task status, type, execution host, creation time, start time, and completion time.
Pause/Resume/Backfill
Supports start, stop, pause, resume, and re‑run operations.
Other
High availability can be achieved through a shared database. Scheduling may deadlock when cluster component versions are incompatible.
Azkaban
Azkaban, released by LinkedIn, is a batch workflow scheduler that runs a set of jobs in a defined order. Dependencies are defined via key:value pairs and must be acyclic; a web UI assists in maintenance and tracking.
Supported Types
Supports command, HadoopShell, Java, HadoopJava, Pig, Hive, and extensible plugins.
Typical Use Case
Consider a DAG in which tasks A and B run independently, C depends on both A and B, and D depends on C. Running such a flow by hand means tracking which upstream tasks have finished before starting the next one, which is exactly what a scheduler automates.
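The ordering problem in this use case can be sketched with Python's standard library. This is a minimal illustration, not Azkaban code: the `dependencies` mapping and the `execution_waves` helper are hypothetical names, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so downstream tasks unlock
    return waves


print(execution_waves(dependencies))  # → [['A', 'B'], ['C'], ['D']]
```

A and B land in the first wave because neither has upstream tasks; C is released only after both are marked done, and D only after C.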
Visual Workflow Definition
Jobs are defined in configuration files. The DAG is described with a custom DSL, then packaged and uploaded.
Task Monitoring
Only basic task status is visible.
Pause/Resume/Backfill
Pausing is not supported; the workflow must be killed and then restarted.
Other
Supports HA via a database, but may become unresponsive with many tasks.
Airflow
Airflow is an open-source scheduler written in Python. It originated at Airbnb, was open-sourced in 2015, and was later incubated by the Apache Software Foundation.
Supported Types
Supports Python, Bash, HTTP, MySQL, and custom Operators.
Visual Workflow Definition
Workflows are defined programmatically using Python code.
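To show what defining a workflow in code looks like, here is a toy sketch that uses only the standard library. Airflow declares dependencies between operators with the `>>` operator; the `Task` class and `run` function below are hypothetical stand-ins that imitate that style and are not Airflow's API. The sketch assumes the graph is acyclic.

```python
class Task:
    """Hypothetical stand-in for a scheduler task; not Airflow's Operator class."""

    def __init__(self, name, action):
        self.name = name
        self.action = action  # callable executed when the task runs
        self.upstream = []    # tasks that must finish before this one

    def __rshift__(self, other):
        # `a >> b` records that b depends on a, and returns b so chains work.
        other.upstream.append(self)
        return other


def run(task, done=None, log=None):
    """Run a task after its upstream tasks, each at most once (depth-first)."""
    done = set() if done is None else done
    log = [] if log is None else log
    if task.name in done:
        return log
    for parent in task.upstream:
        run(parent, done, log)
    task.action()
    done.add(task.name)
    log.append(task.name)
    return log


extract = Task("extract", lambda: None)
transform = Task("transform", lambda: None)
load = Task("load", lambda: None)

extract >> transform >> load  # dependencies are ordinary Python expressions

order = run(load)
print(order)  # → ['extract', 'transform', 'load']
```

Because the workflow is plain code, it can be generated in loops, parameterized, reviewed, and version-controlled like any other program, at the cost of having no drag-and-drop editor.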
Task Monitoring
Monitoring UI is not very intuitive.
Pause/Resume/Backfill
Tasks are killed and restarted manually.
Other
Heavy task loads can cause the system to become unresponsive.
XXL‑Job
XXL‑Job is an open‑source, lightweight distributed job scheduling platform with rich task management features, high performance, and high availability.
Supported Types
Java‑based tasks.
Visual Workflow Definition
No built‑in visual editor, but task dependencies can be configured.
Task Monitoring
No dedicated monitoring UI.
Pause/Resume/Backfill
Supports pause and resume operations.
Other
Supports HA; tasks are queued and processed via a polling mechanism.
DolphinScheduler
DolphinScheduler was open-sourced by a Chinese company in 2019 and entered the Apache Incubator after a unanimous vote.
It is a distributed, decentralized, extensible visual DAG workflow scheduler aimed at simplifying complex data processing dependencies.
Supported Types
Supports traditional shell tasks and big-data platform tasks such as MR, Spark, SQL (MySQL, PostgreSQL, Hive/SparkSQL), Python, stored procedures, and sub-processes.
Visual Workflow Definition
All flows and schedules are visual; users can drag‑and‑drop DAGs, configure data sources, and use APIs for third‑party systems.
Task Monitoring
Shows task status, type, retry count, execution host, visual variables, and execution logs.
Pause/Resume/Backfill
Supports pause, resume, and backfill operations.
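Backfill means re-running a workflow for past schedule dates that were missed or need recomputation. The sketch below shows the core bookkeeping under stated assumptions: a daily schedule, and a hypothetical `backfill_dates` helper that is not part of any scheduler's API.

```python
from datetime import date, timedelta


def backfill_dates(start, end, already_run=()):
    """Return every daily run date in [start, end] that has no successful run yet."""
    finished = set(already_run)
    missing = []
    day = start
    while day <= end:
        if day not in finished:
            missing.append(day)
        day += timedelta(days=1)
    return missing


missed = backfill_dates(
    date(2024, 1, 1),
    date(2024, 1, 5),
    already_run=[date(2024, 1, 2), date(2024, 1, 4)],
)
print([d.isoformat() for d in missed])  # → ['2024-01-01', '2024-01-03', '2024-01-05']
```

A scheduler with backfill support would then launch one workflow instance per returned date, passing that date to the tasks as the logical run date.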
Other
Provides HA with multi‑master and multi‑worker architecture, tenant‑based resource isolation, and a task queue that buffers excess tasks to avoid server overload.
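The idea of buffering excess tasks in a queue can be sketched as follows. This is an illustration of the general technique, not DolphinScheduler's implementation: `run_with_limit` and `max_workers` are hypothetical names, and threads stand in for worker servers.

```python
import queue
import threading


def run_with_limit(tasks, max_workers=2):
    """Run callables with at most `max_workers` executing at once.

    Tasks beyond that limit wait in the queue instead of overloading the workers.
    Returns the sorted results and the peak number of concurrently running tasks.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker():
        nonlocal running, peak
        while True:
            try:
                task = pending.get_nowait()  # take the next buffered task
            except queue.Empty:
                return  # queue drained, this worker exits
            with lock:
                running += 1
                peak = max(peak, running)
            try:
                value = task()
            finally:
                with lock:
                    running -= 1
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), peak


squares, peak = run_with_limit([lambda i=i: i * i for i in range(6)], max_workers=2)
print(squares)  # → [0, 1, 4, 9, 16, 25]
```

However many tasks are submitted, `peak` never exceeds `max_workers`; the queue absorbs the burst and the workers drain it at their own pace.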
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.