Backend Development

Design and Implementation of a DAG‑Based Task Orchestration Framework

This article explains how to design and implement a DAG‑based task orchestration framework in Java, covering graph representations, dependency management, executor integration, state tracking, and how to persist workflows and tasks in a relational database for platform‑level usage.


Task orchestration means arranging atomic tasks in a custom order, possibly with dependencies, to form a workflow. The article illustrates a typical scenario where Task A and Task C run concurrently, Task B runs after A, and Task D runs after both B and C finish.

To model such workflows, a Directed Acyclic Graph (DAG) is used. The article explains basic graph concepts—vertices and edges—and compares adjacency matrix and adjacency list representations, noting that the matrix offers faster edge‑lookup while the list saves space.
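The trade-off between the two representations can be sketched for the example workflow above. This is an illustrative comparison only (class and variable names are made up here, not part of the framework): the matrix answers "is there an edge?" in O(1) but costs O(V²) space, while the list stores only existing edges.

```java
import java.util.*;

public class GraphRepresentations {
    public static void main(String[] args) {
        // Workflow from the example: A->B, B->D, C->D (A and C have no parents)
        // Index mapping: 0=A, 1=B, 2=C, 3=D

        // Adjacency matrix: O(1) edge lookup, O(V^2) space
        boolean[][] matrix = new boolean[4][4];
        matrix[0][1] = true; // A -> B
        matrix[1][3] = true; // B -> D
        matrix[2][3] = true; // C -> D

        // Adjacency list: O(V + E) space, edge lookup requires scanning a list
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        adjacency.put("A", List.of("B"));
        adjacency.put("B", List.of("D"));
        adjacency.put("C", List.of("D"));
        adjacency.put("D", List.of());

        System.out.println("B->D in matrix: " + matrix[1][3]);
        System.out.println("Children of B in list: " + adjacency.get("B"));
    }
}
```

For a typical orchestration workload (sparse graphs, frequent "iterate over children" operations), the adjacency list is usually the better fit, which is why the framework below stores explicit parent/child sets per node.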

Based on this knowledge, a simple Java framework is introduced. Core classes include DefaultDag and Node, which store nodes and their parent/child relationships:

public final class DefaultDag<T> implements Dag<T> {
    private Map<T, Node<T>> nodes = new HashMap<>();
    ...
}

public final class Node<T> {
    /** incoming dependencies for this node */
    private Set<Node<T>> parents = new LinkedHashSet<>();
    /** outgoing dependencies for this node */
    private Set<Node<T>> children = new LinkedHashSet<>();
    ...
}

Dependencies are added with methods such as addDependency, createNode, and addEdges:

public void addDependency(final T evalFirstNode, final T evalLaterNode) {
    Node<T> firstNode = createNode(evalFirstNode);
    Node<T> afterNode = createNode(evalLaterNode);
    addEdges(firstNode, afterNode);
}

private Node<T> createNode(final T value) {
    // Reuse the existing node if one was already registered for this value,
    // so that repeated references to the same task map to a single vertex
    Node<T> node = this.nodes.get(value);
    if (node == null) {
        node = new Node<>(value);
        this.nodes.put(value, node);
    }
    return node;
}

private void addEdges(final Node<T> firstNode, final Node<T> afterNode) {
    if (!firstNode.equals(afterNode)) {
        firstNode.getChildren().add(afterNode);
        afterNode.getParents().add(firstNode);
    }
}

The execution engine combines the DAG with thread pools. Classes like DefaultDexecutor hold an ExecutorService for normal execution, an immediate‑retry executor, and a scheduled‑retry executor. DefaultExecutorState tracks processed and unprocessed nodes, error tasks, and results.

public class DefaultDexecutor<T, R> {
    private final ExecutorService executionEngine;
    private final ExecutorService immediatelyRetryExecutor;
    private final ScheduledExecutorService scheduledRetryExecutor;
    private final ExecutorState<T, R> state;
    ...
}

public class DefaultExecutorState<T, R> {
    private final Dag<T> graph;
    private final Collection<Node<T>> processedNodes;
    private final Collection<Node<T>> unProcessedNodes;
    private final Collection<T> erroredTasks;
    private final Collection<ExecutionResult<T, R>> executionResults;
    ...
}

Processing the DAG involves a breadth‑first traversal that submits ready nodes to the executor, waits for parent completion, records results, and continues with child nodes. The pseudo‑code shows the core loop:

private void doProcessNodes(final Set<Node<T>> nodes) {
    for (Node<T> node : nodes) {
        if (!processedNodes.contains(node) && processedNodes.containsAll(node.getParents())) {
            Task<T, R> task = newTask(node);
            executionEngine.submit(task);
            ExecutionResult<T, R> executionResult = executionEngine.processResult();
            if (executionResult.isSuccess()) {
                state.markProcessingDone(node);
            }
            doExecute(node.getChildren());
        }
    }
}
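The traversal logic in the pseudo-code can be demonstrated with a runnable, single-threaded sketch (the real engine submits tasks to thread pools; here task "execution" is just marking the node done, and the graph is hard-coded to the A/B/C/D example):

```java
import java.util.*;

// A minimal sketch of the traversal in doProcessNodes:
// a node runs only once all of its parents have been processed.
public class BfsExecution {
    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of(
            "A", List.of("B"), "B", List.of("D"),
            "C", List.of("D"), "D", List.of());
        Map<String, List<String>> parents = Map.of(
            "A", List.of(), "B", List.of("A"),
            "C", List.of(), "D", List.of("B", "C"));

        Set<String> processed = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of("A", "C")); // initial nodes
        while (!queue.isEmpty()) {
            String node = queue.poll();
            // Skip nodes already done or whose parents are not all finished;
            // an unready node is re-enqueued later by its remaining parents
            if (!processed.contains(node) && processed.containsAll(parents.get(node))) {
                processed.add(node);              // "execute" the task and record success
                queue.addAll(children.get(node)); // continue with child nodes
            }
        }
        System.out.println("Execution order: " + processed);
    }
}
```

Note that D is enqueued twice (once by B, once by C) but only executes after both parents are in the processed set, which is exactly the guard the pseudo-code expresses with processedNodes.containsAll(node.getParents()).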

To turn the framework into a platform, the article proposes persisting workflow and task metadata in relational tables. A workflow table represents a whole DAG, while a task table stores each node’s ID, name, status, result, and a comma‑separated list of parent IDs.

task_id | workflow_id | task_name | task_status | result | task_parents
--------------------------------------------------------------------
1       | 1           | A         | 0           | NULL   | -1
2       | 1           | B         | 0           | NULL   | 1
3       | 1           | C         | 0           | NULL   | -1
4       | 1           | D         | 0           | NULL   | 2,3

SQL queries can retrieve initial tasks (those with task_parents = -1) and later discover child tasks using a LIKE pattern on the task_parents column. Retry logic is handled either by the framework's immediate or scheduled executors, or manually by users through the platform UI.
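The queries can be sketched as follows, assuming the table and column names from the schema above. This is an illustrative sketch only; the exact SQL is not given in the article, and a naive LIKE '%id%' would wrongly match parent 1 against "11", so the pattern below matches the id at the start, middle, or end of the comma-separated list:

```java
public class TaskQueries {
    // Initial tasks of a workflow: no parents, encoded as -1 in the schema above
    static String initialTasks(long workflowId) {
        return "SELECT task_id, task_name FROM task"
             + " WHERE workflow_id = " + workflowId
             + " AND task_parents = '-1'";
    }

    // Children of a finished task: its id appears somewhere in task_parents.
    // In production code, use a PreparedStatement instead of string
    // concatenation to avoid SQL injection.
    static String childTasks(long workflowId, long parentId) {
        String id = String.valueOf(parentId);
        return "SELECT task_id, task_name FROM task"
             + " WHERE workflow_id = " + workflowId
             + " AND (task_parents = '" + id + "'"
             + " OR task_parents LIKE '" + id + ",%'"
             + " OR task_parents LIKE '%," + id + ",%'"
             + " OR task_parents LIKE '%," + id + "')";
    }

    public static void main(String[] args) {
        System.out.println(initialTasks(1));
        System.out.println(childTasks(1, 2));
    }
}
```

Storing parents as a comma-separated string keeps the schema simple, but a normalized task_dependency join table would make these lookups indexable; the LIKE approach shown here trades query performance for schema simplicity.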

Overall, the article demonstrates how to build a DAG‑driven task orchestration engine in Java and extend it into a full‑featured platform with persistence, visualisation, and manual retry capabilities.

Tags: Backend, Java, DAG, Database, workflow, task scheduling, Executor
Written by Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
