Backend Development

Design and Implementation of a DAG‑Based Task Orchestration Framework

This article explains how to design and implement a DAG‑based task orchestration framework in Java, covering graph representations, dependency management, executor integration, state tracking, and how to persist workflows and tasks in a relational database for platform‑level usage.


Task orchestration means arranging atomic tasks in a custom order, possibly with dependencies, to form a workflow. The article illustrates a typical scenario where Task A and Task C run concurrently, Task B runs after A, and Task D runs after both B and C finish.

To model such workflows, a Directed Acyclic Graph (DAG) is used. The article explains basic graph concepts—vertices and edges—and compares adjacency matrix and adjacency list representations, noting that the matrix offers faster edge‑lookup while the list saves space.
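The trade-off between the two representations can be sketched for the example workflow above. This is an illustrative comparison only (class and variable names are made up here, not part of the framework): the matrix answers "is there an edge?" in O(1) but costs O(V²) space, while the list stores only existing edges.

```java
import java.util.*;

public class GraphRepresentations {
    public static void main(String[] args) {
        // Workflow from the example: A->B, B->D, C->D (A and C have no parents)
        // Index mapping: 0=A, 1=B, 2=C, 3=D

        // Adjacency matrix: O(1) edge lookup, O(V^2) space
        boolean[][] matrix = new boolean[4][4];
        matrix[0][1] = true; // A -> B
        matrix[1][3] = true; // B -> D
        matrix[2][3] = true; // C -> D

        // Adjacency list: O(V + E) space, edge lookup requires scanning a list
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        adjacency.put("A", List.of("B"));
        adjacency.put("B", List.of("D"));
        adjacency.put("C", List.of("D"));
        adjacency.put("D", List.of());

        System.out.println("B->D in matrix: " + matrix[1][3]);
        System.out.println("Children of B in list: " + adjacency.get("B"));
    }
}
```

For a typical orchestration workload (sparse graphs, frequent "iterate over children" operations), the adjacency list is usually the better fit, which is why the framework below stores explicit parent/child sets per node.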

Based on this knowledge, a simple Java framework is introduced. Core classes include DefaultDag and Node, which store nodes and their parent/child relationships:

public final class DefaultDag<T> implements Dag<T> {
    private Map<T, Node<T>> nodes = new HashMap<>();
    ...
}

public final class Node<T> {
    /** incoming dependencies for this node */
    private Set<Node<T>> parents = new LinkedHashSet<>();
    /** outgoing dependencies for this node */
    private Set<Node<T>> children = new LinkedHashSet<>();
    ...
}

Dependencies are added with methods such as addDependency, createNode, and addEdges:

public void addDependency(final T evalFirstNode, final T evalLaterNode) {
    Node<T> firstNode = createNode(evalFirstNode);
    Node<T> afterNode = createNode(evalLaterNode);
    addEdges(firstNode, afterNode);
}

private Node<T> createNode(final T value) {
    // Reuse the existing node if one was already registered for this value,
    // so that repeated references to the same task map to a single vertex
    Node<T> node = this.nodes.get(value);
    if (node == null) {
        node = new Node<>(value);
        this.nodes.put(value, node);
    }
    return node;
}

private void addEdges(final Node<T> firstNode, final Node<T> afterNode) {
    if (!firstNode.equals(afterNode)) {
        firstNode.getChildren().add(afterNode);
        afterNode.getParents().add(firstNode);
    }
}

The execution engine combines the DAG with thread pools. Classes like DefaultDexecutor hold an ExecutorService for normal execution, an immediate‑retry executor, and a scheduled‑retry executor. DefaultExecutorState tracks processed and unprocessed nodes, error tasks, and results.

public class DefaultDexecutor<T, R> {
    private final ExecutorService executionEngine;
    private final ExecutorService immediatelyRetryExecutor;
    private final ScheduledExecutorService scheduledRetryExecutor;
    private final ExecutorState<T, R> state;
    ...
}

public class DefaultExecutorState<T, R> {
    private final Dag<T> graph;
    private final Collection<Node<T>> processedNodes;
    private final Collection<Node<T>> unProcessedNodes;
    private final Collection<T> erroredTasks;
    private final Collection<ExecutionResult<T, R>> executionResults;
    ...
}

Processing the DAG involves a breadth‑first traversal that submits ready nodes to the executor, waits for parent completion, records results, and continues with child nodes. The pseudo‑code shows the core loop:

private void doProcessNodes(final Set<Node<T>> nodes) {
    for (Node<T> node : nodes) {
        if (!processedNodes.contains(node) && processedNodes.containsAll(node.getParents())) {
            Task<T, R> task = newTask(node);
            executionEngine.submit(task);
            ExecutionResult<T, R> executionResult = executionEngine.processResult();
            if (executionResult.isSuccess()) {
                state.markProcessingDone(node);
            }
            doExecute(node.getChildren());
        }
    }
}
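The traversal logic in the pseudo-code can be demonstrated with a runnable, single-threaded sketch (the real engine submits tasks to thread pools; here task "execution" is just marking the node done, and the graph is hard-coded to the A/B/C/D example):

```java
import java.util.*;

// A minimal sketch of the traversal in doProcessNodes:
// a node runs only once all of its parents have been processed.
public class BfsExecution {
    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of(
            "A", List.of("B"), "B", List.of("D"),
            "C", List.of("D"), "D", List.of());
        Map<String, List<String>> parents = Map.of(
            "A", List.of(), "B", List.of("A"),
            "C", List.of(), "D", List.of("B", "C"));

        Set<String> processed = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of("A", "C")); // initial nodes
        while (!queue.isEmpty()) {
            String node = queue.poll();
            // Skip nodes already done or whose parents are not all finished;
            // an unready node is re-enqueued later by its remaining parents
            if (!processed.contains(node) && processed.containsAll(parents.get(node))) {
                processed.add(node);              // "execute" the task and record success
                queue.addAll(children.get(node)); // continue with child nodes
            }
        }
        System.out.println("Execution order: " + processed);
    }
}
```

Note that D is enqueued twice (once by B, once by C) but only executes after both parents are in the processed set, which is exactly the guard the pseudo-code expresses with processedNodes.containsAll(node.getParents()).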

To turn the framework into a platform, the article proposes persisting workflow and task metadata in relational tables. A workflow table represents a whole DAG, while a task table stores each node’s ID, name, status, result, and a comma‑separated list of parent IDs.

task_id | workflow_id | task_name | task_status | result | task_parents
--------------------------------------------------------------------
1       | 1           | A         | 0           | NULL   | -1
2       | 1           | B         | 0           | NULL   | 1
3       | 1           | C         | 0           | NULL   | -1
4       | 1           | D         | 0           | NULL   | 2,3

SQL queries can retrieve initial tasks (those with task_parents = -1) and later discover child tasks using a LIKE pattern on the task_parents column. Retry logic is handled either by the framework's immediate or scheduled executors, or manually by users through the platform UI.
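The queries can be sketched as follows, assuming the table and column names from the schema above. This is an illustrative sketch only; the exact SQL is not given in the article, and a naive LIKE '%id%' would wrongly match parent 1 against "11", so the pattern below matches the id at the start, middle, or end of the comma-separated list:

```java
public class TaskQueries {
    // Initial tasks of a workflow: no parents, encoded as -1 in the schema above
    static String initialTasks(long workflowId) {
        return "SELECT task_id, task_name FROM task"
             + " WHERE workflow_id = " + workflowId
             + " AND task_parents = '-1'";
    }

    // Children of a finished task: its id appears somewhere in task_parents.
    // In production code, use a PreparedStatement instead of string
    // concatenation to avoid SQL injection.
    static String childTasks(long workflowId, long parentId) {
        String id = String.valueOf(parentId);
        return "SELECT task_id, task_name FROM task"
             + " WHERE workflow_id = " + workflowId
             + " AND (task_parents = '" + id + "'"
             + " OR task_parents LIKE '" + id + ",%'"
             + " OR task_parents LIKE '%," + id + ",%'"
             + " OR task_parents LIKE '%," + id + "')";
    }

    public static void main(String[] args) {
        System.out.println(initialTasks(1));
        System.out.println(childTasks(1, 2));
    }
}
```

Storing parents as a comma-separated string keeps the schema simple, but a normalized task_dependency join table would make these lookups indexable; the LIKE approach shown here trades query performance for schema simplicity.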

Overall, the article demonstrates how to build a DAG‑driven task orchestration engine in Java and extend it into a full‑featured platform with persistence, visualisation, and manual retry capabilities.

Tags: Backend, Java, DAG, Database, workflow, task scheduling, Executor
Written by Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
