Backend Development 11 min read

Design and Implementation of a DAG‑Based Task Scheduling Framework

This article explains how to build a task‑orchestration framework using directed acyclic graphs (DAG), covering graph representations, Java data structures, dependency management, concurrent execution with thread pools, and persisting workflow state to a relational database for platform‑level use.

IT Architects Alliance

Jul 13, 2021

Design and Implementation of a DAG‑Based Task Scheduling Framework

When a project requires a framework or platform that can orchestrate tasks, the author records the design ideas and implementation steps for a DAG‑based task scheduling system.

Task orchestration means arranging atomic tasks in a custom order where tasks may depend on each other; a typical workflow runs Task A and Task C concurrently, then Task B after A, and finally Task D after both B and C finish.

Because a workflow is naturally modeled as a directed acyclic graph (DAG), the article reviews graph basics, adjacency matrix and adjacency list representations, and chooses the matrix for fast edge lookup.

//Dagpublic final class DefaultDag<T, R> implements Dag<T, R> {
    private Map<T, Node<T, R>> nodes = new HashMap<T, Node<T, R>>();
    ...
}
//Nodepublic final class Node<T, R> {
    /** incoming dependencies for this node */
    private Set<Node<T, R>> parents = new LinkedHashSet<Node<T, R>>();
    /** outgoing dependencies for this node */
    private Set<Node<T, R>> children = new LinkedHashSet<Node<T, R>>();
    ...
}

Adding a dependency between two tasks is done by creating nodes and linking them:

public void addDependency(final T evalFirstNode, final T evalLaterNode) {
    Node<T, R> firstNode = createNode(evalFirstNode);
    Node<T, R> afterNode = createNode(evalLaterNode);
    addEdges(firstNode, afterNode);
}
private Node<T, R> createNode(final T value) {
    Node<T, R> node = new Node<T, R>(value);
    return node;
}
private void addEdges(final Node<T, R> firstNode, final Node<T, R> afterNode) {
    if (!firstNode.equals(afterNode)) {
        firstNode.getChildren().add(afterNode);
        afterNode.getParents().add(firstNode);
    }
}

The execution engine combines a normal thread pool with two retry pools, and an ExecutorState object records the DAG, processed and unprocessed nodes, errored tasks, and results:

//任务编排线程池
public class DefaultDexecutor<T, R> {
    //执行线程，和2种重试线程
    private final ExecutorService<T, R> executionEngine;
    private final ExecutorService immediatelyRetryExecutor;
    private final ScheduledExecutorService scheduledRetryExecutor;
    //执行状态
    private final ExecutorState<T, R> state;
    ...
}
//执行状态
public class DefaultExecutorState<T, R> {
    //底层图数据结构
    private final Dag<T, R> graph;
    //已完成
    private final Collection<Node<T, R>> processedNodes;
    //未完成
    private final Collection<Node<T, R>> unProcessedNodes;
    //错误task
    private final Collection<ExecutionResult<T, R>> erroredTasks;
    //执行结果
    private final Collection<ExecutionResult<T, R>> executionResults;
}

Execution proceeds by breadth‑first traversal of the DAG, submitting ready nodes to the thread pool and using a shared‑variable check to wait for all parent tasks before running a child (e.g., Task D waits for B and C).

private void doProcessNodes(final Set<Node<T, R>> nodes) {
    for (Node<T, R> node : nodes) {
        //共享变量 并发等待
        if (!processedNodes.contains(node) && processedNodes.containsAll(node.getParents())) {
            Task<T, R> task = newTask(node);
            this.executionEngine.submit(task);
            ...
            ExecutionResult<T, R> executionResult = this.executionEngine.processResult();
            if (executionResult.isSuccess()) {
                state.markProcessingDone(processedNode);
            }
            //继续执行孩子节点
            doExecute(processedNode.getChildren());
            ...
        }
    }
}

To turn the framework into a platform, the DAG is persisted in a relational database. A workflow table represents a workflow, and a task table stores each node with fields such as task_id, workflow_id, task_name, task_status, result, and task_parents (a comma‑separated list of parent IDs).

task_id
workflow_id
task_name
task_status
result
task_parents

Example data for the earlier diagram:

task_id  workflow_id  task_name  task_status  result  task_parents
1        1           A          0           -1
2        1           B          0           1
3        1           C          0           -1
4        1           D          0           2,3

Execution queries retrieve root tasks (where task_parents = -1) and submit them to the thread pool. Child tasks are fetched with a LIKE query on task_parents. A count query ensures all parent tasks have succeeded before a child runs.

Retry logic differs between the framework (automatic immediate or scheduled retries) and the platform (manual retries triggered by the user interface).

In summary, the article presents a complete backend solution for DAG‑based task orchestration, from in‑memory graph structures and concurrent execution to persistent storage and platform‑level features such as visual editing and manual retry.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java DAG workflow Task scheduling

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.