Gear: An Internal Workflow Scheduling System for Hadoop at iQIYI
Gear is iQIYI's internal, high-availability workflow scheduler. It is built on Apache Oozie and extended with a YAML-based definition language, GitLab-driven submission, and a web UI. It runs thousands of Hadoop/Spark jobs daily, handles complex dependencies, retries, and monitoring, and has evolved from the SSH-centric 1.x line to the feature-rich 2.x line.
Gear is a workflow scheduling system developed by the iQIYI Cloud Platform team to manage timed tasks and workflows. Originally designed to solve Hadoop task scheduling, it now handles most Hadoop/Spark jobs as well as timed tasks unrelated to Hadoop, and it aims to become a universal internal scheduling platform for iQIYI.
The motivation for building Gear stems from the heavy load on Hadoop clusters at iQIYI, where about 150,000 jobs ran daily in 2015. The sheer number of jobs and their inter‑dependencies caused monitoring and operational burdens, and failures often required manual intervention.
Gear’s main functions include:
Job management via Web UI and Java SDK (start, pause, resume, stop, retry, view status, progress, logs).
Timed start using cron‑like expressions.
Dependency management: pre‑execution checks (e.g., HDFS files, HTTP URIs) and DAG‑based task dependencies.
Alarm mechanisms for workflow failures and SLA violations.
Retry mechanisms with configurable limits.
High availability across workflow engine, backend services, and execution agents.
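To make the retry mechanism concrete, here is a minimal Python sketch of a retry loop with a configurable limit. It is not Gear's implementation: the function name run_with_retries and the parameter max_retries are hypothetical, and a real scheduler would also record each attempt and usually wait between attempts.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_retries: int) -> T:
    """Run task once, then retry up to max_retries more times on failure.

    Hypothetical helper for illustration; not part of Gear or Oozie.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            # Retry limit reached: surface the last error to the caller,
            # which is where a failure alarm would be raised.
            if attempt >= max_retries:
                raise
            attempt += 1
```

With max_retries=2 a task may execute at most three times in total: the initial run plus two retries.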
The product interaction design and technology selection followed a comparison of several open-source workflow schedulers. After evaluating functionality, user experience, extensibility, community activity, and documentation, the team chose Apache Oozie [1] as the underlying engine, then wrapped and extended it.
Key design choices include:
Workflow definition language: A YAML-based format replaces Oozie's XML, offering concise, readable configurations and native support for DAG representation.
Code hosting and submission: Users push YAML files to GitLab; CI scripts automatically submit the workflow to Gear (Workflow-as-Code).
Workflow management: A Web UI allows users to view, run, pause, resume, stop, and retry workflows.
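The DAG representation mentioned above can be illustrated with a short Python sketch. The dictionary below stands in for a parsed workflow file; its keys (name, nodes, depends_on) and the node names are assumptions made for this example and are not Gear's actual YAML schema. The ordering uses graphlib from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definition, shaped like the result of parsing a
# YAML file. Field names are illustrative, not Gear's real schema.
workflow = {
    "name": "daily_report",
    "nodes": {
        "extract": {"depends_on": []},
        "clean_logs": {"depends_on": ["extract"]},
        "clean_users": {"depends_on": ["extract"]},
        "report": {"depends_on": ["clean_logs", "clean_users"]},
    },
}


def execution_order(wf: dict) -> list[str]:
    """Return node names in an order that respects every depends_on edge.

    Raises graphlib.CycleError if the definition is not a DAG.
    """
    graph = {
        name: set(spec.get("depends_on", []))
        for name, spec in wf["nodes"].items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(workflow))
```

Listing each node's prerequisites next to the node keeps the graph readable in one place, which is the main appeal of a declarative format over nested XML control-flow elements.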
The architecture diagram (omitted here) illustrates the components.
Gear’s practice at iQIYI:
Gear 1.x (May 2016 – Aug 2017) focused on SSH actions to enable low‑cost migration of existing shell scripts. Improvements included linking Gear with YARN applications via environment variables and registration, adding metadata fields (owner, project, tags), and integrating with iQIYI’s alert platform.
Enhancements to the Oozie engine in Gear 1.x:
SSH task load balancing: Users can specify multiple hosts; Gear selects a host based on load.
Continue non-dependent nodes after failure: If a node fails, nodes that do not depend on it continue to run, and only then is the workflow killed.
HTTP-based pre-dependency checks: Users can configure HTTP URIs with success keywords to extend dependency detection beyond HDFS/HCatalog.
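The "continue after failure" behavior can be sketched in Python as follows. This is an illustration of the idea only, not the patched Oozie logic: the function name, the status strings, and the sequential execution are assumptions. Nodes downstream of a failed node are skipped, every other node still runs, and the workflow is marked as killed at the end.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_continuing_after_failure(
    deps: dict[str, set[str]],
    actions: dict[str, Callable[[], None]],
) -> tuple[str, list[str], list[str], list[str]]:
    """Run nodes in dependency order; return (status, ok, failed, skipped).

    deps maps each node to the set of nodes it depends on.
    actions maps each node to a callable that raises on failure.
    """
    ok: list[str] = []
    failed: list[str] = []
    skipped: list[str] = []
    blocked: set[str] = set()  # failed nodes plus everything downstream

    for node in TopologicalSorter(deps).static_order():
        if deps.get(node, set()) & blocked:
            # A prerequisite failed or was skipped, so this node cannot run.
            blocked.add(node)
            skipped.append(node)
            continue
        try:
            actions[node]()
            ok.append(node)
        except Exception:
            blocked.add(node)
            failed.append(node)

    # The workflow is only marked killed after independent nodes finished.
    status = "KILLED" if failed else "SUCCEEDED"
    return status, ok, failed, skipped
```

Because nodes are visited in topological order, a node's prerequisites have always been classified before the node itself is examined, so one pass is enough to propagate the failure downstream.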
Gear 2.x (Sep 2017 onward) added many new features:
Support for multiple task types (MapReduce, Hive, Spark, Shell, CheckFile, DistCp, Impala, Kylin, etc.) with YAML definitions.
Inter‑workflow dependencies allowing one scheduled workflow to depend on another.
Internal parallelism control to limit the number of concurrently running nodes.
Deep integration with Hadoop jobs: recording which Hadoop jobs belong to which workflow node and vice versa, enabling resource auditing.
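As an illustration of internal parallelism control, the following Python sketch caps how many nodes execute at once using a bounded thread pool. It assumes the nodes passed in are already free of unmet dependencies; the function name run_with_limit and the parameter max_parallel are hypothetical and do not come from Gear.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_limit(tasks: Sequence[Callable[[], T]], max_parallel: int) -> list[T]:
    """Run independent tasks with at most max_parallel executing at once.

    Results are returned in the same order as the input tasks.
    """
    # The pool size is the concurrency cap: extra tasks wait in the queue
    # until a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

A cap like this keeps one wide workflow from flooding the cluster with simultaneous jobs, at the cost of a longer wall-clock time for that workflow.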
Future plans for Gear include richer execution logic, more pre‑dependency options, cross‑cluster workflow switching, and priority scheduling.
Reference:
[1] Apache Oozie. https://oozie.apache.org/
iQIYI Technical Product Team