Designing a Flexible Workflow Engine: From Simple Chains to Complex Nested Nodes
This article walks through the step-by-step evolution of a custom workflow engine. It starts with a basic linked-list approver chain and progressively adds multi-sign, parallel, nested, and conditional nodes, then rejection, time limits, delegation and its cancellation, pre- and post-conditions, a progress metric, and script hooks.
Level 1
One day my boss asked me to build a simple workflow engine. After researching what a workflow is, I produced the following version:
Append any number of approvers in order to form a linked list, ending with a terminal node.
Record the current approver; after approval, move the approver pointer one step forward.
When the approver reaches the terminal node, the workflow ends.
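The three rules above can be sketched as a singly linked list. This is only an illustration under assumptions, not the original code: the names `Step` and `ChainWorkflow` and the approver names are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    approver: Optional[str]            # None marks the terminal node
    next: Optional["Step"] = None


class ChainWorkflow:
    def __init__(self, approvers):
        # Build the list back to front so each step points at its successor.
        node = Step(approver=None)     # terminal node
        for name in reversed(approvers):
            node = Step(approver=name, next=node)
        self.current = node            # pointer to the current approver

    @property
    def finished(self):
        return self.current.approver is None

    def approve(self, who):
        if self.finished:
            raise RuntimeError("workflow already finished")
        if who != self.current.approver:
            raise PermissionError(f"{who} is not the current approver")
        self.current = self.current.next   # move the pointer one step forward


wf = ChainWorkflow(["Zhang", "Li", "Wang"])
for name in ["Zhang", "Li", "Wang"]:
    wf.approve(name)
print(wf.finished)  # prints True
```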
Boss: It looks a bit crude.
Level 2
The boss returned asking for support of a multi‑sign node.
A multi‑sign node is a large node that contains many approvers; the workflow can proceed to the next node only after all approvers inside the node have approved.
I spent a week revising the original linked‑list design:
Structural adjustments:
Divide nodes into two major categories: simple nodes (rectangles) and complex nodes (circles).
Represent the whole process as a tree, where leaf nodes are simple nodes.
Each simple node contains exactly one approver.
Complex nodes contain several child nodes.
Introduce a multi‑sign node: once activated, all child nodes can be approved, and the multi‑sign node completes when all its children are approved.
Add a serial node: child nodes must be approved sequentially from left to right; the serial node completes after the last child is approved.
All workflows have an outermost serial node; when it completes, the entire workflow is finished.
Define node states to control the approval flow:
Ready: a simple node that can be approved.
Complete: a node that has been approved.
Future: a node that has not been reached yet.
Waiting: a state only for complex nodes, indicating they are waiting for child approvals.
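The tree structure and the four states can be sketched roughly as follows. This is a simplified model under assumptions: a single `Node` class with a `kind` field ("simple", "serial", "multi") stands in for whatever class hierarchy the real engine used, and complex nodes are assumed to have at least one child.

```python
from enum import Enum


class State(Enum):
    FUTURE = "future"
    READY = "ready"
    WAITING = "waiting"
    COMPLETE = "complete"


class Node:
    def __init__(self, kind, approver=None, children=()):
        self.kind = kind                  # "simple", "serial" or "multi"
        self.approver = approver          # used by simple nodes only
        self.children = list(children)
        self.parent = None
        self.state = State.FUTURE
        for child in self.children:
            child.parent = self

    def activate(self):
        if self.kind == "simple":
            self.state = State.READY
            return
        self.state = State.WAITING
        if self.kind == "serial":
            self.children[0].activate()   # only the leftmost child starts
        else:
            for child in self.children:   # multi-sign: all children start
                child.activate()

    def child_completed(self, child):
        if self.kind == "serial":
            idx = self.children.index(child)
            if idx + 1 < len(self.children):
                self.children[idx + 1].activate()
                return
        elif any(c.state is not State.COMPLETE for c in self.children):
            return                        # multi-sign still waiting
        self.complete()

    def complete(self):
        self.state = State.COMPLETE
        if self.parent is not None:
            self.parent.child_completed(self)

    def approve(self):
        if self.kind != "simple" or self.state is not State.READY:
            raise RuntimeError("only a Ready simple node can be approved")
        self.complete()


a, b, c, d = (Node("simple", approver=n) for n in "ABCD")
root = Node("serial", children=[a, Node("multi", children=[b, c]), d])
root.activate()
for leaf in (a, b, c, d):
    leaf.approve()
print(root.state.name)  # prints COMPLETE
```

Here B and C sit inside a multi-sign node, so D only becomes Ready once both have approved.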
An example approval process with a multi‑sign node is shown below:
Boss: Interesting.
Level 3
The boss now wants parallel nodes.
A parallel node is a large node containing many approvers; the node is considered complete as soon as any one approver approves.
Implementation details:
A parallel node is a complex node; when activated, any child node can be approved, and the parallel node completes when any child reaches the Complete state.
Introduce a new state, Skip: as soon as one child of a parallel node completes, all of its sibling nodes and their descendants are set to Skip.
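A minimal sketch of the Skip rule, assuming that it is triggered when one child of the parallel node completes. The `Node` class and the helper names `skip_subtree` and `complete_parallel_child` are hypothetical.

```python
from enum import Enum


class State(Enum):
    FUTURE = "future"
    READY = "ready"
    WAITING = "waiting"
    COMPLETE = "complete"
    SKIP = "skip"


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.state = State.FUTURE


def skip_subtree(node):
    """Mark a node and all of its descendants as Skip."""
    node.state = State.SKIP
    for child in node.children:
        skip_subtree(child)


def complete_parallel_child(parallel, winner):
    """Complete one child of a parallel node and skip every sibling subtree."""
    winner.state = State.COMPLETE
    for sibling in parallel.children:
        if sibling is not winner:
            skip_subtree(sibling)
    parallel.state = State.COMPLETE


li, wang = Node("Li"), Node("Wang")
group = Node("group", children=[Node("Zhao"), Node("Sun")])
parallel = Node("parallel", children=[li, wang, group])
complete_parallel_child(parallel, li)
print([n.state.name for n in (li, wang, group, *group.children)])
# prints ['COMPLETE', 'SKIP', 'SKIP', 'SKIP', 'SKIP']
```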
Example illustration:
Boss: Adding new node types is quite convenient.
Level 4
The boss asks for nested nodes, e.g., a multi‑sign node containing a parallel node, which in turn contains a complex node, with unlimited nesting depth.
The design already supports this:
An infinitely extensible tree structure can represent arbitrarily complex processes.
Boss: You’ve got something.
Level 5
The boss now wants conditional nodes.
The workflow carries a form; the next branch is chosen based on the form’s content.
After several days of thought, I added a conditional node:
A conditional node works like a parallel node, but only child nodes whose conditions are satisfied become eligible for approval.
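One way to picture the conditional node is a list of branches, each pairing a condition on the form with the approver it unlocks. The expense form, the 1000 threshold, and the role names below are invented for illustration.

```python
def eligible_approvers(branches, form):
    """Return the approvers of every branch whose condition holds for the form."""
    return [approver for condition, approver in branches if condition(form)]


# Hypothetical expense workflow: the branch depends on the requested amount.
branches = [
    (lambda form: form["amount"] < 1000, "team lead"),
    (lambda form: form["amount"] >= 1000, "finance director"),
]

print(eligible_approvers(branches, {"amount": 250}))   # prints ['team lead']
print(eligible_approvers(branches, {"amount": 5000}))  # prints ['finance director']
```

Children whose condition is false would never become Ready; in the tree model they could simply be set to Skip when the conditional node is activated.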
Boss: Noted.
Level 6
The boss wants three types of approvers: fixed, read from the form, and derived from the initiator via a mapping function (e.g., getSupervisor("Qian") returns "Li").
I split simple nodes into three categories:
Type 1: Approver is hard‑coded.
Type 2: Approver is read from the form.
Type 3: Approver is calculated from the initiator using a mapping function.
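The three approver types can be modelled as small resolver functions that all share one signature. This is a sketch under assumptions: the `SUPERVISORS` table, the `reviewer` form field, and the names other than Qian and Li are hypothetical, and `get_supervisor` stands in for the article's `getSupervisor`.

```python
SUPERVISORS = {"Qian": "Li"}  # hypothetical org chart


def get_supervisor(employee):
    return SUPERVISORS[employee]


def fixed(name):
    """Type 1: the approver is hard-coded."""
    return lambda form, initiator: name


def from_form(field):
    """Type 2: the approver is read from a form field."""
    return lambda form, initiator: form[field]


def from_initiator(mapping):
    """Type 3: the approver is derived from the initiator."""
    return lambda form, initiator: mapping(initiator)


resolvers = [fixed("Zhao"), from_form("reviewer"), from_initiator(get_supervisor)]
form = {"reviewer": "Sun"}
print([resolve(form, "Qian") for resolve in resolvers])
# prints ['Zhao', 'Sun', 'Li']
```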
Boss: Hmm.
Level 7
The boss asks whether a node can be rejected backward, i.e., reject to a previous approver.
As a first step I implemented rejection to the initiator, which effectively restarts the workflow:
Only nodes in the Ready state have the right to reject, just like only Ready nodes can approve.
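Rejection to the initiator can be sketched as resetting every node to Future and starting again. The model below is deliberately reduced to one serial root with simple children, and it assumes that "restarting" means the first approver becomes Ready again; all names are hypothetical.

```python
class Node:
    def __init__(self, approver=None, children=()):
        self.approver = approver          # set only on simple (leaf) nodes
        self.children = list(children)    # non-empty only on the serial root
        self.state = "future"


def reset(node):
    """Put a node and all of its descendants back into the Future state."""
    node.state = "future"
    for child in node.children:
        reset(child)


def start(root):
    root.state = "waiting"
    root.children[0].state = "ready"


def reject_to_initiator(root, node):
    # Same permission rule as approval: only a Ready node may act.
    if node.state != "ready":
        raise RuntimeError("only a Ready node may reject")
    reset(root)
    start(root)


a, b = Node("A"), Node("B")
root = Node(children=[a, b])
start(root)
a.state, b.state = "complete", "ready"    # pretend A has already approved
reject_to_initiator(root, b)
print(a.state, b.state)  # prints ready future
```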
Boss: You’re being lazy.
Level 8
Now implement rejection to the immediate previous approver.
Because workflows can be nested to any depth, determining the previous approvers is complex; after much effort I finally added this capability.
Boss: Noted.
Level 9
The boss wants rejection to any arbitrary node.
Solution: repeatedly reject to the previous level until a Ready node that contains the target node is reached.
Boss: Okay.
Level 10
The boss wants a time limit on ordinary nodes; if a node is not completed within the specified time, it is marked as timed out.
I implemented it accordingly.
My realisation: the more requirements pile up, the less hair I have left.
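A time limit can be checked by comparing timestamps, as in the sketch below. The `TimedNode` class and the 24-hour limit are assumptions, and the sketch only reports whether a node is overdue; what the engine does with a timed-out node is not described in the article.

```python
from datetime import datetime, timedelta


class TimedNode:
    def __init__(self, approver, time_limit):
        self.approver = approver
        self.time_limit = time_limit
        self.started_at = None
        self.completed_at = None

    def start(self, now):
        self.started_at = now

    def approve(self, now):
        self.completed_at = now

    def timed_out(self, now):
        """True if the node ran, or has been running, longer than its limit."""
        if self.started_at is None:
            return False
        end = self.completed_at if self.completed_at is not None else now
        return end - self.started_at > self.time_limit


node = TimedNode("Li", timedelta(hours=24))
t0 = datetime(2024, 1, 1, 9, 0)
node.start(t0)
print(node.timed_out(t0 + timedelta(hours=23)))  # prints False
print(node.timed_out(t0 + timedelta(hours=25)))  # prints True
```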
Level 11
The boss wants a delegation feature: if an approver is unsure, they can forward the task to a more suitable person.
Delegation differs from previous requirements because the node relationship must be mutable during execution.
Solution:
Create a new parallel node as the parent of the current node, and add a sibling node for the delegatee; both the original approver and the delegatee can approve.
Delegation can be nested indefinitely; a delegatee can further delegate.
Level 12
The boss now asks for a cancel‑delegation feature.
Implementation:
Cancel delegation is the inverse operation of delegation.
If the delegatee has already approved, delegation cannot be cancelled.
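Delegation and its inverse can be sketched together as two tree rewrites. The `Node` class, the `approved` flag, and the helper names are hypothetical; the approval logic that would set `approved` is omitted.

```python
class Node:
    def __init__(self, kind, approver=None, children=()):
        self.kind = kind                  # "simple", "serial" or "parallel"
        self.approver = approver
        self.children = list(children)
        self.parent = None
        self.approved = False
        for child in self.children:
            child.parent = self


def replace_child(parent, old, new):
    """Put `new` where `old` used to hang in the tree."""
    new.parent = parent
    if parent is not None:
        parent.children[parent.children.index(old)] = new


def delegate(node, delegatee):
    """Wrap `node` in a new parallel node with a sibling for the delegatee."""
    parent = node.parent                  # remember it before re-parenting
    sibling = Node("simple", approver=delegatee)
    wrapper = Node("parallel", children=[node, sibling])
    replace_child(parent, node, wrapper)
    return wrapper


def cancel_delegation(wrapper):
    """Undo `delegate`, unless the delegatee has already approved."""
    original, delegate_node = wrapper.children
    if delegate_node.approved:
        raise RuntimeError("the delegatee has already approved")
    replace_child(wrapper.parent, wrapper, original)
    return original


li = Node("simple", approver="Li")
root = Node("serial", children=[li])
wrapper = delegate(li, "Wang")
print(root.children[0].kind, [c.approver for c in wrapper.children])
# prints parallel ['Li', 'Wang']
cancel_delegation(wrapper)
print(root.children[0] is li)  # prints True
```

Because `delegate` works on any node, calling it on the delegatee's node yields the nested delegation described above.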
Level 13
The boss wants pre‑ and post‑conditions for each node: a node can be entered only when its pre‑condition is satisfied, and it can be completed only when its post‑condition is satisfied.
Result: the code handling approval logic doubled in size.
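Pre- and post-conditions can be pictured as two predicates over the form that guard the Future to Ready and Ready to Complete transitions. `GuardedNode` and the form fields `budget` and `signature` are assumptions made for this sketch.

```python
class GuardedNode:
    def __init__(self, approver, pre=None, post=None):
        self.approver = approver
        self.pre = pre or (lambda form: True)    # no condition: always passes
        self.post = post or (lambda form: True)
        self.state = "future"

    def enter(self, form):
        """Become Ready only if the pre-condition holds."""
        if self.state == "future" and self.pre(form):
            self.state = "ready"
        return self.state == "ready"

    def approve(self, form):
        """Become Complete only if the post-condition holds."""
        if self.state != "ready":
            raise RuntimeError("node is not Ready")
        if self.post(form):
            self.state = "complete"
        return self.state == "complete"


node = GuardedNode(
    "Li",
    pre=lambda form: "budget" in form,
    post=lambda form: form.get("signature") == "Li",
)
print(node.enter({}))                                  # prints False
print(node.enter({"budget": 100}))                     # prints True
print(node.approve({"budget": 100}))                   # prints False
print(node.approve({"budget": 100, "signature": "Li"}))  # prints True
```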
Level 14
Because some workflows have become very complex and take a long time, the boss asks for a metric that shows the current approval progress as a percentage.
Solution: treat the workflow as a tree and compute the ratio of the distance from the leftmost node to the rightmost Ready node over the total distance from the leftmost to the rightmost node.
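One possible reading of this metric measures "distance" in leaf positions, counted left to right. The sketch below follows that assumption; the dictionary-based tree and the handling of the edge cases (no Ready leaf, a single leaf) are choices made here, not taken from the article.

```python
def leaves(node):
    """Return the leaf nodes of the tree in left-to-right order."""
    children = node.get("children", [])
    if not children:
        return [node]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def progress(root):
    order = leaves(root)
    ready = [i for i, leaf in enumerate(order) if leaf["state"] == "ready"]
    if not ready:
        # Nothing is Ready: the workflow is either finished or not started.
        done = all(leaf["state"] in ("complete", "skip") for leaf in order)
        return 1.0 if done else 0.0
    if len(order) == 1:
        return 0.0
    # Position of the rightmost Ready leaf over the total leaf span.
    return ready[-1] / (len(order) - 1)


tree = {"children": [
    {"state": "complete"},
    {"children": [{"state": "complete"}, {"state": "ready"}]},
    {"state": "future"},
    {"state": "future"},
]}
print(f"{progress(tree):.0%}")  # prints 50%
```

Under this reading the metric already reports 100% once the last leaf becomes Ready, before it is approved.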
Level 15
The boss wants each node to have two executable scripts: one that runs when the node starts approval and another that runs after the node is approved.
I implemented the feature, though by this point the workload had taken its toll on me.
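The two scripts per node can be sketched as optional callbacks. This is an assumption: the article does not say what form the scripts take, so plain Python callables named `on_start` and `on_approved` stand in for them here.

```python
class HookedNode:
    def __init__(self, approver, on_start=None, on_approved=None):
        self.approver = approver
        self.on_start = on_start          # runs when the node starts approval
        self.on_approved = on_approved    # runs after the node is approved
        self.state = "future"

    def start(self, form):
        self.state = "ready"
        if self.on_start is not None:
            self.on_start(self, form)

    def approve(self, form):
        if self.state != "ready":
            raise RuntimeError("node is not Ready")
        self.state = "complete"
        if self.on_approved is not None:
            self.on_approved(self, form)


log = []
node = HookedNode(
    "Li",
    on_start=lambda n, form: log.append(f"notify {n.approver}"),
    on_approved=lambda n, form: log.append(f"archive form {form['id']}"),
)
node.start({"id": 7})
node.approve({"id": 7})
print(log)  # prints ['notify Li', 'archive form 7']
```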
Afterword
The boss, a Tsinghua graduate, sold this workflow system to several securities companies. I moved on to other jobs. Looking back on the intense overtime, I hope fellow engineers stay healthy and financially secure.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.