A Generic State Machine Solution for Managing Business Entity Lifecycles

This article presents a state-machine-based approach to managing the lifecycle of business entities such as orders and work orders. It covers the core pain points, the questions a state machine must answer, a comparison of four implementation options, and a recommended solution that combines a database transition table, a domain service, and optimistic-lock concurrency control. Architecture diagrams, code snippets, and operational guidelines accompany the design.

Architect-Kip

Problem Definition

Business entities such as orders, work orders, and approval documents suffer from illegal state transitions, scattered state logic, inconsistent states, difficult extensions, lack of audit, and concurrency conflicts when state changes are not centrally controlled.

Core insight: the state field is treated as an ordinary database column that any code can overwrite, rather than as a first-class concept whose transitions are managed in one place.

What a State Machine Must Answer

Legal transition definition

Guard conditions

Actions and side effects

Atomicity of state change + side effects

Concurrency safety

Observability

Solution Options

Four approaches were compared:

Spring Statemachine – high implementation complexity, medium distributed friendliness.

Database + domain service – low complexity, low transformation cost, optimistic‑lock concurrency safety, good audit support.

Event Sourcing – very high complexity, natural concurrency support, built‑in audit.

Camunda workflow engine – high complexity, good distributed support.

The recommended solution is the second option: a transition table in the database, a domain service, and optimistic‑lock CAS updates.

Overall Architecture

Three layers:

Business call layer: services invoke StateMachineService.fire() to trigger state changes.

State machine engine layer: validates transitions, executes guards, performs CAS updates, and dispatches side effects.

Data storage layer: stores definitions, transition rules, and audit logs.

Key tables:

state_machine_definition
state_transition
state_change_log

Business entity table with added columns state, state_version, and state_updated_at

Core Engine Flow (fire())

Load the entity by ID and compare its version with the version the caller expects.

Load transition rules for the current state and event.

Evaluate guard conditions in priority order.

Perform CAS update of state and version.

Insert an audit log record.

Dispatch side effects (MQ, local message table, notifications).

Return a StateChangeResult containing old state, new state, event, and new version.
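
The seven steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the article's implementation: maps and lists stand in for the entity table, state_transition, and state_change_log, and the names FireSketch, Entity, and Transition, the convention that a lower priority number is evaluated first, and the string format of audit and side-effect entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-ins for rows of the entity table and state_transition.
record Entity(long id, String state, int version) {}
record Transition(String from, String event, String to, int priority, Predicate<Entity> guard) {}
record StateChangeResult(String oldState, String newState, String event, int newVersion) {}

class FireSketch {
    final Map<Long, Entity> entities = new HashMap<>();     // entity table
    final List<Transition> transitions = new ArrayList<>(); // state_transition
    final List<String> auditLog = new ArrayList<>();        // state_change_log
    final List<String> sideEffects = new ArrayList<>();     // pending side effects

    StateChangeResult fire(long id, String event, int expectedVersion) {
        // 1. Load the entity and check the caller's expected version.
        Entity current = entities.get(id);
        if (current == null) throw new IllegalArgumentException("unknown entity " + id);
        if (current.version() != expectedVersion)
            throw new IllegalStateException("version conflict on entity " + id);

        // 2 + 3. Match rules on (state, event), then take the first rule whose
        // guard passes, in priority order (assumed: lower number first).
        Transition chosen = transitions.stream()
            .filter(t -> t.from().equals(current.state()) && t.event().equals(event))
            .sorted(Comparator.comparingInt(Transition::priority))
            .filter(t -> t.guard().test(current))
            .findFirst()
            .orElseThrow(() -> new IllegalStateException(
                "illegal transition from " + current.state() + " on " + event));

        // 4. CAS update: succeeds only if the stored row is still the one we read.
        Entity updated = new Entity(id, chosen.to(), current.version() + 1);
        if (!entities.replace(id, current, updated))
            throw new IllegalStateException("version conflict on entity " + id);

        // 5. Audit record.
        auditLog.add(id + ": " + current.state() + " -> " + chosen.to() + " on " + event);
        // 6. Side effects are only queued here; delivery is a separate concern.
        sideEffects.add("STATE_CHANGED:" + id + ":" + chosen.to());
        // 7. Result for the caller.
        return new StateChangeResult(current.state(), chosen.to(), event, updated.version());
    }
}
```

In a real deployment steps 4 and 5 would run inside one database transaction; the sketch only shows the ordering of the steps.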

Guard Mechanism

Three guard types are supported:

BEAN – Spring bean reference, debuggable and unit‑testable.

SPEL – SpEL expression evaluated at runtime, flexible but harder to debug.

NONE – no guard; a transition that matches the current state and event is always allowed.

Recommended hybrid strategy: use BEAN for complex checks and SPEL for simple conditions, and restrict SpEL expressions to a whitelist of functions to avoid arbitrary code execution.
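
A guard dispatcher along these lines might look like the sketch below. It is plain Java so that it runs without Spring: a map stands in for the Spring bean lookup, and SPEL evaluation is deliberately left out (the sketch fails closed for it) because it would need the spring-expression library. The Guard interface, GuardEvaluator, and the context map are hypothetical names, not the article's API. The catch block follows the mitigation listed under Risks & Mitigations: a failing guard returns false with a logged reason instead of propagating.

```java
import java.util.Map;

enum GuardType { BEAN, SPEL, NONE }

// Hypothetical guard contract; the article does not show its actual interface.
interface Guard {
    boolean allow(Map<String, Object> context);
}

class GuardEvaluator {
    // Stands in for a Spring bean lookup by name.
    private final Map<String, Guard> beans;

    GuardEvaluator(Map<String, Guard> beans) {
        this.beans = beans;
    }

    boolean evaluate(GuardType type, String ref, Map<String, Object> context) {
        try {
            switch (type) {
                case NONE: {
                    return true; // no extra condition on a matched rule
                }
                case BEAN: {
                    Guard guard = beans.get(ref);
                    // Unknown bean name: fail closed rather than allow the transition.
                    return guard != null && guard.allow(context);
                }
                case SPEL:
                default: {
                    // Expression evaluation is omitted from this sketch; fail closed.
                    return false;
                }
            }
        } catch (RuntimeException e) {
            // A throwing guard rejects the transition and records why.
            System.err.println("guard '" + ref + "' failed: " + e);
            return false;
        }
    }
}
```

Failing closed on unknown references keeps a configuration typo from silently opening a transition.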

Concurrency Safety

The state_version field provides optimistic‑lock protection. If the CAS update affects zero rows, a ConcurrentConflictException is thrown. Callers may retry up to three times before surfacing an error.
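
The CAS update and the bounded retry can be sketched as below. ConcurrentConflictException and the limit of three attempts come from the text; VersionedStore, Row, Retry, and CasDemo are hypothetical, and a ConcurrentHashMap stands in for the database row, with casUpdate mimicking an UPDATE whose WHERE clause includes the expected state_version.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class ConcurrentConflictException extends RuntimeException {
    ConcurrentConflictException(String message) { super(message); }
}

record Row(String state, int stateVersion) {}

class VersionedStore {
    private final ConcurrentHashMap<Long, Row> rows = new ConcurrentHashMap<>();

    void insert(long id, String state) { rows.put(id, new Row(state, 0)); }

    Row load(long id) { return rows.get(id); }

    // Mimics: UPDATE t SET state = ?, state_version = state_version + 1
    //         WHERE id = ? AND state_version = ?
    // and returns the number of affected rows (0 or 1).
    int casUpdate(long id, int expectedVersion, String newState) {
        Row current = rows.get(id);
        if (current == null || current.stateVersion() != expectedVersion) return 0;
        return rows.replace(id, current, new Row(newState, expectedVersion + 1)) ? 1 : 0;
    }
}

class Retry {
    // Runs the attempt up to maxAttempts times, retrying only on version conflicts.
    static <T> T withRetry(int maxAttempts, Supplier<T> attempt) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        ConcurrentConflictException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (ConcurrentConflictException e) {
                last = e; // another writer won; try again with fresh data
            }
        }
        throw last;
    }
}

class CasDemo {
    static Row transition(VersionedStore store, long id, String newState) {
        return Retry.withRetry(3, () -> {
            Row seen = store.load(id); // re-read on every attempt
            if (store.casUpdate(id, seen.stateVersion(), newState) == 0)
                throw new ConcurrentConflictException("stale version " + seen.stateVersion());
            return store.load(id);
        });
    }
}
```

Each retry re-reads the row; in the full engine the transition rules and guards would also need to be re-evaluated, since the state may have changed between attempts.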

Side‑Effect Strategies

Three consistency models are compared:

Strong (synchronous) – state change and side effect occur in the same transaction; highest reliability but long transactions.

Eventual (asynchronous MQ) – side effects are sent after transaction commit; low latency, MQ retry handles failures.

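Reliable event (local outbox) – the state change and the outgoing message are persisted in the same local transaction; a scheduler scans the outbox and delivers pending messages. Delivery is at-least-once, so consumers must be idempotent to achieve an exactly-once effect.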

Key principle: side‑effect failure does not roll back the state unless strong consistency is required.
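
The local outbox model can be illustrated with the in-memory sketch below. All names (OutboxSketch, OutboxMessage, IdempotentConsumer) are hypothetical, a synchronized method stands in for the local database transaction, and the publisher is a simple predicate that returns true when the broker accepted the message. Because a relay can fail between publishing and marking a row as sent, the sketch assumes consumers deduplicate by message ID.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class OutboxMessage {
    final String id;
    final String payload;
    boolean sent = false;

    OutboxMessage(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

class OutboxSketch {
    String orderState = "CREATED";
    final List<OutboxMessage> outbox = new ArrayList<>();

    // Stands in for one local transaction: the state update and the outbox
    // insert either both happen or neither does.
    synchronized void changeState(String newState, String messageId) {
        orderState = newState;
        outbox.add(new OutboxMessage(messageId, "order is now " + newState));
    }

    // One relay pass: publish every pending row and mark it sent only after
    // the publisher reports success. Failed rows stay pending for the next pass.
    synchronized int relay(Predicate<OutboxMessage> publisher) {
        int delivered = 0;
        for (OutboxMessage m : outbox) {
            if (!m.sent && publisher.test(m)) {
                m.sent = true;
                delivered++;
            }
        }
        return delivered;
    }

    synchronized long pendingCount() {
        return outbox.stream().filter(m -> !m.sent).count();
    }
}

class IdempotentConsumer {
    private final Set<String> seen = new HashSet<>();
    int handled = 0;

    // Processes each message ID at most once, so redelivery is harmless.
    void onMessage(OutboxMessage m) {
        if (seen.add(m.id)) handled++;
    }
}
```

The state change is never rolled back when publishing fails; the message simply waits in the outbox, which matches the key principle stated above.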

Timeout Mechanism

Automatic state timeout (e.g., cancel after 30 minutes) can be implemented by:

RocketMQ delayed messages – second‑level precision, no scanning overhead.

XXL‑Job scheduled tasks – minute‑level tolerance, simple but incurs scan delay.

In‑process time wheel (HashedWheelTimer) – low latency for single‑node scenarios, not suitable for distributed persistence.

Recommendation: use RocketMQ delayed messages as the primary mechanism, with an XXL-Job scan as a fallback for missed deliveries.
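
When a delayed message and a fallback scan can both reach the same entity, the timeout handler has to be safe to call more than once. The sketch below shows one way to do that under stated assumptions: PendingOrder, TimeoutSketch, and the state names WAIT_PAY and CANCELLED are hypothetical, the 30-minute limit is the example from the text, and the current time is passed in as a parameter so the logic stays deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class PendingOrder {
    final long id;
    String state;
    final Instant stateUpdatedAt;

    PendingOrder(long id, String state, Instant stateUpdatedAt) {
        this.id = id;
        this.state = state;
        this.stateUpdatedAt = stateUpdatedAt;
    }
}

class TimeoutSketch {
    static final Duration PAY_TIMEOUT = Duration.ofMinutes(30);

    // Shared by the delayed-message consumer and the fallback scan.
    // Returns true only when this call actually cancelled the order.
    static boolean cancelIfExpired(PendingOrder order, Instant now) {
        boolean stillWaiting = "WAIT_PAY".equals(order.state);
        boolean expired = !now.isBefore(order.stateUpdatedAt.plus(PAY_TIMEOUT));
        if (stillWaiting && expired) {
            // A real system would trigger a timeout event through fire() here.
            order.state = "CANCELLED";
            return true;
        }
        return false; // already paid, already cancelled, or not yet due
    }

    // Fallback scan: picks up orders whose delayed message was missed.
    static List<Long> scan(List<PendingOrder> orders, Instant now) {
        List<Long> cancelled = new ArrayList<>();
        for (PendingOrder order : orders) {
            if (cancelIfExpired(order, now)) cancelled.add(order.id);
        }
        return cancelled;
    }
}
```

Because the handler checks the current state first, a late or duplicate trigger for an order that has already been paid or cancelled is a no-op.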

Rollback and Compensation

Rollback is treated as another state transition using the same fire() flow. When side effects have already executed (e.g., payment deducted), a compensation transaction is performed – for example, creating a refund record instead of trying to undo the original payment.
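
As a small illustration of compensation as a forward transition, the sketch below records a refund as a new ledger entry instead of deleting the original payment. CompensationSketch, LedgerEntry, and the state names PAID and REFUNDED are hypothetical; a real implementation would route the transition through fire().

```java
import java.util.ArrayList;
import java.util.List;

record LedgerEntry(String type, long orderId, long amountCents) {}

class CompensationSketch {
    private final long orderId;
    private final long amountCents;
    String state = "PAID";
    final List<LedgerEntry> ledger = new ArrayList<>();

    CompensationSketch(long orderId, long amountCents) {
        this.orderId = orderId;
        this.amountCents = amountCents;
        ledger.add(new LedgerEntry("PAYMENT", orderId, amountCents));
    }

    // The rollback is just another guarded transition (PAID -> REFUNDED) whose
    // action appends a compensating record; the payment record is kept.
    void refund() {
        if (!"PAID".equals(state))
            throw new IllegalStateException("cannot refund from state " + state);
        state = "REFUNDED";
        ledger.add(new LedgerEntry("REFUND", orderId, -amountCents));
    }

    long balanceCents() {
        return ledger.stream().mapToLong(LedgerEntry::amountCents).sum();
    }
}
```

Keeping both records preserves the audit trail: the history shows that money was taken and then returned.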

Integration Guide

Steps to onboard a new entity:

Define a state enum for the entity.

Insert a record into state_machine_definition.

Configure transition rules in state_transition.

Make the entity implement the StatefulEntity interface.

Replace direct state updates with calls to StateMachineService.fire().
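
Steps 1 and 4 might look like the sketch below. The article names the StatefulEntity interface but does not show its methods, so the method names here, the machineCode() key, and the OrderState values are assumptions made for illustration.

```java
// Step 1: a state enum for the entity (values are illustrative).
enum OrderState { CREATED, PAID, SHIPPED, CANCELLED }

// Hypothetical shape of StatefulEntity: just enough for an engine to read
// the identity, the current state, and the optimistic-lock version.
interface StatefulEntity<S extends Enum<S>> {
    long getId();
    S getState();
    int getStateVersion();
    String machineCode(); // assumed key into state_machine_definition
}

// Step 4: the entity exposes its state read-only; there is no public setter,
// so callers cannot bypass the engine by assigning the state directly.
class Order implements StatefulEntity<OrderState> {
    private final long id;
    private final OrderState state;
    private final int stateVersion;

    Order(long id, OrderState state, int stateVersion) {
        this.id = id;
        this.state = state;
        this.stateVersion = stateVersion;
    }

    @Override public long getId() { return id; }
    @Override public OrderState getState() { return state; }
    @Override public int getStateVersion() { return stateVersion; }
    @Override public String machineCode() { return "ORDER"; }
}
```

Omitting a state setter is one design choice that supports step 5, since the only remaining way to change the state is through the engine.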

For existing entities, add the three state columns, migrate current status values, configure rules, replace direct updates, and optionally add a DAO interceptor to forbid raw UPDATE state statements.

Observability & Operations

Audit queries retrieve the full change history of an entity. The source article provides example SQL for the full change log and for analyzing how long entities remain in each state.

Key monitoring metrics include:

state.fire.success.count – successful transitions.

state.fire.fail.count – failed transitions (alert > 100/min).

state.fire.illegal.count – illegal transition attempts (alert > 10/min).

state.fire.concurrent.count – optimistic-lock conflicts (alert > 50/min).

state.fire.duration.ms – P99 latency (alert if > 500 ms).

state.outbox.pending.count – pending local messages (alert > 1000).

Risks & Mitigations

Incorrect transition configuration → add admin rule pre‑check feature.

Guard exceptions causing transaction rollback → guard implementations catch all exceptions and return false with a logged reason.

Side‑effect consumption failure → use local outbox with retry, alert, and manual compensation.

Frequent optimistic‑lock conflicts → automatic retry (max 3) before reporting error.

Business code bypassing fire() → DAO layer interceptor, coding standards, and code review enforcement.

Figure: State machine generic solution architecture overview
Figure: State machine core data model ER diagram
Figure: StateMachineService.fire() core flow diagram
Figure: Optimistic lock concurrency sequence diagram
Figure: Side-effect consistency models
Figure: State machine example for order lifecycle