Designing a Flexible Workflow Engine: From Simple Chains to Complex Nested Nodes
This article narrates the step‑by‑step evolution of a workflow engine. It starts with a basic sequential list of approvers and progressively adds countersign, parallel, conditional, delegation, timeout, scripting, and nesting capabilities. Along the way it illustrates a tree‑based architecture and node‑state management for robust backend process automation.
One day the boss asked the author to build a simple workflow engine. The first version was a linear chain of approvers ending with a terminal node, which the boss found too crude.
In the second stage the boss demanded support for countersign nodes: a single large node containing several approvers, all of whom must approve before the flow can proceed. To support this, the design shifted from a linked list to a tree with two major node types: simple nodes (drawn as leaf rectangles) and complex nodes (drawn as circles). A simple node holds a single approver, while a complex node contains child nodes. A serial node is a complex node whose children must be approved from left to right, and the whole workflow is wrapped in a top‑level serial node.
Node states were introduced: Ready (approvable simple node), Complete (already approved), Future (not yet reached), and Waiting (complex node awaiting its children).
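The tree and the four states can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: `Node`, the `kind` labels and `refresh` are hypothetical names, states are recomputed top‑down from which simple nodes have been approved, and a countersign node is assumed to let its approvers act in any order.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    READY = "Ready"        # simple node that can be approved now
    COMPLETE = "Complete"  # approved simple node, or complex node whose children all completed
    FUTURE = "Future"      # not reached yet
    WAITING = "Waiting"    # reached complex node still waiting on its children


@dataclass
class Node:
    name: str
    kind: str = "simple"   # hypothetical labels: "simple", "serial" or "countersign"
    children: list["Node"] = field(default_factory=list)
    approved: bool = False  # only meaningful for simple nodes
    state: State = State.FUTURE


def refresh(node: Node, reached: bool = True) -> State:
    """Recompute states top-down; `reached` says whether the flow has arrived at `node`."""
    if node.kind == "simple":
        if node.approved:
            node.state = State.COMPLETE
        else:
            node.state = State.READY if reached else State.FUTURE
        return node.state
    if node.kind == "serial":
        # Children are approved left to right: only the first unfinished child is reached.
        open_slot = reached
        for child in node.children:
            if refresh(child, open_slot) is not State.COMPLETE:
                open_slot = False
    else:
        # Countersign: every child is reached at once and all of them must approve.
        for child in node.children:
            refresh(child, reached)
    if all(c.state is State.COMPLETE for c in node.children):
        node.state = State.COMPLETE
    else:
        node.state = State.WAITING if reached else State.FUTURE
    return node.state


# Whole workflow wrapped in a top-level serial node: A, then countersign {B, C}, then D.
flow = Node("root", "serial", [
    Node("A"),
    Node("sign", "countersign", [Node("B"), Node("C")]),
    Node("D"),
])
flow.children[0].approved = True
refresh(flow)
```

After A approves, B and C are both Ready, the countersign node is Waiting, and D stays Future until both B and C have approved.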
Parallel nodes were added next. A parallel node is a complex node where any child can be approved, and once any child reaches Complete the parallel node completes. A new Skip state marks sibling nodes that are no longer reachable.
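The parallel rule and the Skip state can be isolated in a small function. This is a hypothetical sketch (`settle_parallel` is not from the original): it takes the current states of a reached parallel node's children and returns the node's own state plus the updated child states.

```python
from enum import Enum


class State(Enum):
    READY = "Ready"
    COMPLETE = "Complete"
    FUTURE = "Future"
    WAITING = "Waiting"
    SKIP = "Skip"  # sibling that no longer needs approval


def settle_parallel(child_states: list[State]) -> tuple[State, list[State]]:
    """Return (parallel node state, new child states) for a reached parallel node.

    One Complete child is enough: the parallel node completes and every
    unfinished sibling is marked Skip. A full engine would also mark the
    descendants of a skipped complex child.
    """
    if State.COMPLETE in child_states:
        return State.COMPLETE, [s if s is State.COMPLETE else State.SKIP for s in child_states]
    return State.WAITING, list(child_states)


# The middle child has completed, so the parallel node completes and its siblings are skipped.
parent, children = settle_parallel([State.READY, State.COMPLETE, State.WAITING])
```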
Support for arbitrary nesting was then demonstrated, allowing countersign nodes to contain parallel nodes, which in turn could contain other complex nodes, enabling unlimited depth.
Conditional nodes were introduced to branch the workflow based on form data, behaving like parallel nodes but only activating children whose conditions are satisfied.
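A conditional node can be sketched as a list of branches, each guarded by a predicate over the form data. The `Branch` type, `active_branches`, and the expense‑form rules below are hypothetical illustrations; the sketch only decides which children are activated, and those children would then behave like the children of a parallel node.

```python
from dataclasses import dataclass
from typing import Any, Callable

Form = dict[str, Any]


@dataclass
class Branch:
    name: str
    condition: Callable[[Form], bool]  # predicate over the submitted form data


def active_branches(branches: list[Branch], form: Form) -> list[str]:
    """Children a conditional node activates for this form, in left-to-right order.

    Activated children then behave like the children of a parallel node;
    the others are never reached (an engine could mark them Skip).
    """
    return [b.name for b in branches if b.condition(form)]


# Hypothetical routing rules for an expense form.
branches = [
    Branch("team lead", lambda f: f["amount"] < 10_000),
    Branch("director", lambda f: f["amount"] >= 10_000),
    Branch("legal", lambda f: f.get("has_contract", False)),
]
print(active_branches(branches, {"amount": 50_000, "has_contract": True}))
# ['director', 'legal']
```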
The author then split simple nodes into three categories according to how the approver is determined: a fixed approver, an approver read from the form, and an approver derived from the initiator via a mapping function (e.g., get_manager("Qian")).
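The three approver categories map naturally onto three small specification types resolved at runtime. This is a sketch under assumptions: `FixedApprover`, `FormApprover`, `MappedApprover`, `resolve_approver`, and the tiny `ORG_CHART` behind this `get_manager` are all hypothetical stand‑ins for whatever directory lookup a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class FixedApprover:
    name: str                       # always the same person


@dataclass
class FormApprover:
    field: str                      # form field that holds the approver's name


@dataclass
class MappedApprover:
    mapping: Callable[[str], str]   # function of the initiator, e.g. a get_manager lookup


ApproverSpec = Union[FixedApprover, FormApprover, MappedApprover]


def resolve_approver(spec: ApproverSpec, form: dict[str, str], initiator: str) -> str:
    if isinstance(spec, FixedApprover):
        return spec.name
    if isinstance(spec, FormApprover):
        return form[spec.field]
    return spec.mapping(initiator)


ORG_CHART = {"Qian": "Zhao"}  # hypothetical reporting line: Qian reports to Zhao


def get_manager(employee: str) -> str:
    return ORG_CHART[employee]


form = {"reviewer": "Sun"}
print(resolve_approver(FixedApprover("Li"), form, "Qian"))          # Li
print(resolve_approver(FormApprover("reviewer"), form, "Qian"))     # Sun
print(resolve_approver(MappedApprover(get_manager), form, "Qian"))  # Zhao
```

Resolving the approver when the node becomes Ready, rather than when the workflow is defined, lets the same template serve every initiator.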
Rejection logic was expanded in three steps: first, rejection to the initiator (resetting the workflow); then, rejection to the previous approver; and finally, rejection to an arbitrary node, implemented by repeatedly rejecting to the previous node until the Ready node is the target or lies within it.
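The three rejection modes can be sketched on a flattened serial chain, where "the Ready node lies within the target" reduces to "the Ready node is the target". `SerialFlow` and its methods are hypothetical names; the point is that rejecting to an arbitrary node is just reject‑to‑previous applied in a loop.

```python
class SerialFlow:
    """A flat serial chain of simple nodes; `current` indexes the single Ready node."""

    def __init__(self, approvers: list[str]):
        self.approvers = approvers
        self.current = 0  # equals len(approvers) once every node is Complete

    def finished(self) -> bool:
        return self.current >= len(self.approvers)

    def approve(self) -> None:
        if self.finished():
            raise ValueError("flow already finished")
        self.current += 1

    def reject_to_initiator(self) -> None:
        # Reset: the initiator resubmits and approval restarts from the first node.
        self.current = 0

    def reject_to_previous(self) -> None:
        if self.current == 0:
            raise ValueError("no previous approver")
        self.current -= 1

    def reject_to(self, target: str) -> None:
        if self.finished():
            raise ValueError("flow already finished")
        if target not in self.approvers[: self.current]:
            raise ValueError(f"{target!r} has not approved yet")
        # Step back one node at a time until the Ready node is the target.
        while self.approvers[self.current] != target:
            self.reject_to_previous()


flow = SerialFlow(["Zhao", "Qian", "Sun", "Li"])
for _ in range(3):
    flow.approve()          # Li is now Ready
flow.reject_to("Qian")
print(flow.approvers[flow.current])  # Qian
```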
A timeout feature was added to simple nodes, marking a node as overdue if it is not completed within a specified period.
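The overdue check itself is a single comparison. This sketch assumes the clock starts when the node becomes Ready and that something (for example a periodic job) calls `is_overdue`, a hypothetical helper, with the current time.

```python
from datetime import datetime, timedelta


def is_overdue(ready_since: datetime, limit: timedelta, completed: bool, now: datetime) -> bool:
    """A node is overdue if it is still not Complete once `limit` has passed since it became Ready."""
    return not completed and now - ready_since > limit


became_ready = datetime(2024, 3, 1, 9, 0)
print(is_overdue(became_ready, timedelta(hours=24), False, datetime(2024, 3, 2, 10, 0)))  # True
print(is_overdue(became_ready, timedelta(hours=24), True, datetime(2024, 3, 2, 10, 0)))   # False
```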
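Delegation (proxy) was implemented by inserting a new parallel node at the original node's position, making the original node its child, and adding a sibling node for the delegate. Because either child can approve, both the original approver and the delegate may act, and since the delegate's node can itself be delegated the same way, delegation can be nested without limit.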
Cancellation of delegation was added as the inverse operation, prohibited if the delegate had already approved.
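Delegation and its cancellation are pure tree rewrites, so they can be sketched together. The names below (`Node`, `delegate`, `cancel_delegation`, `replace_in_parent`, `any_approved`) are hypothetical; the sketch assumes each wrapper has exactly two children, the original and the delegate, and refuses cancellation if anyone on the delegate side, including nested delegates, has approved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality, so list.index() finds this exact node
class Node:
    name: str
    kind: str = "simple"    # hypothetical labels: "simple", "serial", "parallel", ...
    approved: bool = False  # simple nodes only
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def replace_in_parent(old: Node, new: Node) -> None:
    """Put `new` where `old` sits in old's parent (if any)."""
    parent = old.parent
    new.parent = parent
    if parent is not None:
        parent.children[parent.children.index(old)] = new


def delegate(original: Node, proxy_name: str) -> Node:
    """Wrap `original` in a new parallel node and add a sibling for the delegate."""
    wrapper = Node(f"{original.name}|{proxy_name}", kind="parallel")
    replace_in_parent(original, wrapper)
    wrapper.add(original)
    wrapper.add(Node(proxy_name))
    return wrapper


def any_approved(node: Node) -> bool:
    return node.approved or any(any_approved(c) for c in node.children)


def cancel_delegation(wrapper: Node) -> Node:
    """Inverse of delegate(): refused once the delegate side has approved."""
    original, proxy = wrapper.children
    if any_approved(proxy):
        raise ValueError("delegate has already approved; cannot cancel")
    replace_in_parent(wrapper, original)
    return original


root = Node("root", kind="serial")
li = root.add(Node("Li"))
w = delegate(li, "Wang")              # Li or Wang may now approve
w2 = delegate(w.children[1], "Sun")   # nested: Wang delegates on to Sun
```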
Pre‑ and post‑conditions were attached to each node: the pre‑condition must hold before the node can be entered, and the post‑condition must hold before it can complete.
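A minimal sketch of the two guards, assuming both are predicates over the form data. `GuardedNode`, `enter`, and `complete` are hypothetical names, and both conditions default to "always true" so unguarded nodes behave as before.

```python
from dataclasses import dataclass
from typing import Any, Callable

Form = dict[str, Any]
Condition = Callable[[Form], bool]


@dataclass
class GuardedNode:
    name: str
    pre: Condition = lambda form: True   # must hold before the node can be entered
    post: Condition = lambda form: True  # must hold before the node can complete
    entered: bool = False
    completed: bool = False

    def enter(self, form: Form) -> bool:
        if not self.entered and self.pre(form):
            self.entered = True
        return self.entered

    def complete(self, form: Form) -> bool:
        if self.entered and not self.completed and self.post(form):
            self.completed = True
        return self.completed


node = GuardedNode(
    "finance review",
    pre=lambda f: f.get("amount", 0) > 0,   # hypothetical: only non-empty claims enter
    post=lambda f: "invoice" in f,          # hypothetical: an invoice must be attached
)
```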
A progress metric was defined: the percentage of workflow completion is the distance from the leftmost node to the rightmost Ready node, divided by the distance from the leftmost node to the rightmost node.
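The progress formula can be sketched over the leaves of the tree listed left to right. The article does not define "distance", so this sketch assumes it is the number of leaf positions between two leaves; `progress` is a hypothetical helper, and a flow with no Ready leaf is treated as finished.

```python
def progress(leaf_states: list[str]) -> float:
    """Percentage complete, from leaf states in left-to-right order.

    Assumes distance = difference in leaf position, so
    progress = index of rightmost Ready leaf / index of rightmost leaf * 100.
    """
    ready = [i for i, s in enumerate(leaf_states) if s == "Ready"]
    if not ready:
        return 100.0  # no Ready leaf left: assumed to mean the flow has finished
    last = len(leaf_states) - 1
    if last == 0:
        return 0.0    # single-leaf flow still waiting on its only node
    return 100.0 * ready[-1] / last


print(progress(["Complete", "Complete", "Ready", "Future", "Future"]))  # 50.0
```

Using the rightmost Ready leaf means a parallel or countersign node counts as far along as its most advanced open branch.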
Finally, the system supports attaching two executable scripts to each node: one that runs when the node starts approval and another that runs when it finishes.
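The two script hooks can be sketched as optional callbacks. This is an assumption‑laden illustration: the article does not say how scripts are stored or executed, so here they are modeled as Python callables receiving the node name and a context dictionary, and `ScriptedNode`, `start`, and `finish` are hypothetical names. A production engine would more likely store script source and run it in a sandbox.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Context = dict[str, Any]
Script = Callable[[str, Context], None]


@dataclass
class ScriptedNode:
    name: str
    on_start: Optional[Script] = None   # runs when the node starts approval
    on_finish: Optional[Script] = None  # runs when the node finishes

    def start(self, ctx: Context) -> None:
        if self.on_start is not None:
            self.on_start(self.name, ctx)

    def finish(self, ctx: Context) -> None:
        if self.on_finish is not None:
            self.on_finish(self.name, ctx)


log: list[tuple[str, str]] = []
node = ScriptedNode(
    "risk check",
    on_start=lambda name, ctx: log.append(("start", name)),
    on_finish=lambda name, ctx: log.append(("finish", name)),
)
node.start({})
node.finish({})
```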
The completed workflow engine was eventually sold to several securities firms, and the author reflects on the intense development experience.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.