How to Gracefully Implement Consistent Compensation Across Multiple Systems
The article examines the challenges of achieving distributed consistency when a business operation writes to MySQL and external systems, analyzes failure scenarios, and presents a step‑by‑step solution that splits the large transaction into small, retryable tasks with Spring afterCommit optimization.
Introduction
When a business operation needs to write data to a local MySQL database and also invoke third‑party systems, the workflow becomes a distributed‑system consistency problem. Not all systems can use mature distributed‑transaction solutions.
Case Overview
A financial reimbursement process involves three systems:
Document system – creates the reimbursement request and generates accounting vouchers (implemented in Java).
BPM system – manages the workflow. The case uses a commercial BPM product (e.g., Fanwei), whose APIs the document system calls to record approvals and to fetch the next approver.
SAP system – the finance‑specific system that receives the voucher data via API once the approval process is complete.
Business Flow for "Approval Passed"
The steps are:
1. Save business data and record the approval log.
2. Call the BPM API to mark approval.
3. Call the BPM API to obtain the latest pending approver.
4. If no pending approver exists, the process is complete; generate the voucher and push it to SAP.
Risk Analysis
If an exception occurs in the first two steps, the MySQL transaction rolls back, leaving all data unchanged. However, if steps 1 and 2 succeed and step 3 fails, the MySQL transaction rolls back but the BPM operation does not, resulting in a state where the document system shows no logs while BPM shows the process already approved. Users see an error, may retry, and encounter permission issues because BPM has already moved to the next node.
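The failure mode can be reproduced with a minimal in-memory sketch. FakeBpm and LargeTransactionDemo are hypothetical names, and the staged list only imitates a MySQL transaction whose writes are kept on commit and discarded on rollback.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the remote BPM system; it is outside the local transaction. */
class FakeBpm {
    boolean approved = false;

    void markApproved() {          // step 2: remote write, cannot be rolled back locally
        approved = true;
    }

    String nextApprover() {        // step 3: simulated remote failure
        throw new IllegalStateException("BPM timeout");
    }
}

/** Imitates one large local transaction wrapped around all remote calls. */
class LargeTransactionDemo {
    final List<String> committedLogs = new ArrayList<>();

    /** Returns true if the whole operation committed, false if it rolled back. */
    boolean approve(FakeBpm bpm) {
        List<String> uncommitted = new ArrayList<>();
        try {
            uncommitted.add("approval log");   // step 1: local write
            bpm.markApproved();                // step 2: remote write succeeds
            bpm.nextApprover();                // step 3: throws in this sketch
            committedLogs.addAll(uncommitted); // commit, skipped when step 3 throws
            return true;
        } catch (RuntimeException e) {
            uncommitted.clear();               // rollback affects local data only
            return false;
        }
    }
}
```

After approve returns false, the local log list is empty while FakeBpm still reports the request as approved, which is the inconsistent state described above.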
Problem Diagnosis
The root cause is that MySQL transactions can only control their own database; they cannot guarantee the outcome of remote system calls.
Solution Idea
The large transaction must be split into smaller, independent transactions. The principles for splitting are:
Each small transaction contains at most one remote write operation.
The remote write is placed at the end of the method so that a successful return immediately commits the transaction.
Because small transactions may fail, a retry mechanism is required.
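The first two principles can be sketched as follows. SmallTransaction.run is a hypothetical helper and the list stands in for the database; the point is only the ordering: local writes are staged first, the single remote write runs last, and the commit follows immediately.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing the shape of one small transaction. */
class SmallTransaction {
    /**
     * Stages local writes, runs exactly one remote write as the last step,
     * and commits only if that remote write returned normally.
     */
    static boolean run(List<String> database, List<String> localWrites, Runnable remoteWrite) {
        List<String> staged = new ArrayList<>(localWrites); // uncommitted local changes
        try {
            remoteWrite.run();          // the only remote write, placed last
        } catch (RuntimeException e) {
            return false;               // rollback: staged changes are discarded, retry later
        }
        database.addAll(staged);        // commit right after the remote call returns
        return true;
    }
}
```

A run whose remote call throws leaves the local data unchanged and can be retried. A small window remains in which the remote write succeeds but the local commit does not, so a retried remote operation should tolerate being repeated.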
Implementation Steps
1. Create a task table to persist the essential data of each small transaction.
CREATE TABLE `transaction_job` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
`type` varchar(255) NOT NULL COMMENT 'task type',
`data` varchar(255) NOT NULL COMMENT 'task data',
`error_message` varchar(255) DEFAULT NULL COMMENT 'error message',
`context` varchar(255) DEFAULT NULL COMMENT 'task context (mainly the current operator)',
`create_time` bigint(20) NOT NULL COMMENT 'creation time',
`update_time` bigint(20) NOT NULL COMMENT 'update time',
`retry_times` int(11) NOT NULL DEFAULT '0' COMMENT 'retry count',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='transaction task table';
This table stores the key data needed to replay a small transaction and the operator information.
2. Use a scheduled job to scan transaction_job for unfinished records and execute the corresponding logic. The framework separates core scanning/execution logic from business-specific implementations via a strategy pattern.
The scheduler first scans for tasks and then executes them; each small transaction is encapsulated in its own implementation class.
The concrete implementation for updating the pending approver simply copies the original code into a dedicated class.
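The scanner and the strategy pattern described in this step might look like the sketch below. Job, JobHandler, JobScanner, the status values, and the retry limit of 3 are all assumptions made for illustration; in particular, the table definition above has no status column, so the status field here is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory mirror of a transaction_job row. The status field is an assumption. */
class Job {
    final long id;
    final String type;
    final String data;
    String status = "PENDING";   // assumed values: PENDING, SUCCESS, FAILED
    String errorMessage;
    int retryTimes;

    Job(long id, String type, String data) {
        this.id = id;
        this.type = type;
        this.data = data;
    }
}

/** Strategy interface: one implementation per small transaction. */
interface JobHandler {
    String type();
    void execute(Job job) throws Exception;
}

/** Core scanning and execution logic, independent of any business handler. */
class JobScanner {
    static final int MAX_RETRIES = 3;   // assumed limit
    private final Map<String, JobHandler> handlers = new HashMap<>();

    void register(JobHandler handler) {
        handlers.put(handler.type(), handler);
    }

    /** One scan pass; a scheduled job would call this periodically. */
    void scan(List<Job> table) {
        for (Job job : table) {
            if (!"PENDING".equals(job.status)) {
                continue;   // skip finished or abandoned tasks
            }
            try {
                JobHandler handler = handlers.get(job.type);
                if (handler == null) {
                    throw new IllegalStateException("no handler for type " + job.type);
                }
                handler.execute(job);
                job.status = "SUCCESS";
            } catch (Exception e) {
                job.retryTimes++;
                job.errorMessage = e.getMessage();
                if (job.retryTimes >= MAX_RETRIES) {
                    job.status = "FAILED";   // stop retrying, leave for manual handling
                }
            }
        }
    }
}
```

A failed execution only increments retry_times and records the error, so the next scan picks the task up again until the assumed limit is reached.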
3. Refactor the business code so that the first small transaction inserts a record into transaction_job before invoking the second remote operation.
The insertion code is encapsulated in transactionJobService.create.
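The body of transactionJobService.create is not shown in the source, so the following is only a sketch of what it might do. The parameter list (type, data, operator), the millisecond timestamps, and the in-memory list that replaces the real table are assumptions. In real code the insert would run inside the first small transaction, so the task exists only if that transaction commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory mirror of the transaction_job columns defined above. */
class TransactionJobRow {
    long id;
    String type;
    String data;
    String context;
    String errorMessage;   // stays null until an execution fails
    long createTime;
    long updateTime;
    int retryTimes;        // starts at 0, matching the column default
}

/** Sketch of transactionJobService; a list replaces the real table. */
class TransactionJobService {
    private final List<TransactionJobRow> table = new ArrayList<>();
    private final AtomicLong nextId = new AtomicLong(1);   // imitates AUTO_INCREMENT

    /** Persists what is needed to replay the small transaction later. */
    TransactionJobRow create(String type, String data, String operator) {
        long now = System.currentTimeMillis();   // assumed unit: milliseconds
        TransactionJobRow row = new TransactionJobRow();
        row.id = nextId.getAndIncrement();
        row.type = type;
        row.data = data;
        row.context = operator;
        row.createTime = now;
        row.updateTime = now;
        table.add(row);
        return row;
    }

    List<TransactionJobRow> all() {
        return table;
    }
}
```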
4. Optimize latency by leveraging Spring's transaction lifecycle hook: register a TransactionSynchronization through TransactionSynchronizationManager.registerSynchronization() and override its afterCommit() method. Registering this callback right after inserting the task runs the task as soon as the surrounding transaction commits, rather than at the next scheduled scan, making the delay invisible to the user.
If the small transaction succeeds, the task status becomes "success" instantly, eliminating perceived latency.
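The ordering that the after-commit hook provides can be modelled without Spring. In Spring itself the registration goes through TransactionSynchronizationManager.registerSynchronization(...) with a TransactionSynchronization whose afterCommit() is overridden; ModelTransaction below is a hypothetical stand-in so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a transaction with after-commit callbacks (not Spring's API). */
class ModelTransaction {
    final List<String> events = new ArrayList<>();
    private final List<Runnable> afterCommitCallbacks = new ArrayList<>();

    /** Plays the role of registering a synchronization whose afterCommit() runs the task. */
    void registerAfterCommit(Runnable callback) {
        afterCommitCallbacks.add(callback);
    }

    void commit() {
        events.add("commit");
        for (Runnable callback : afterCommitCallbacks) {
            try {
                callback.run();   // runs only after the commit has happened
            } catch (RuntimeException e) {
                // The commit cannot be undone; the task stays pending for the scheduler.
                events.add("task left for scheduler");
            }
        }
    }

    void rollback() {
        events.add("rollback");
        afterCommitCallbacks.clear();   // callbacks never run for a rolled-back transaction
    }
}
```

The task row is already committed when the callback runs, so a failure inside the callback costs nothing beyond the wait for the next scheduled scan.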
Note
In practice, the scheduled scanner may pick up the newly inserted task immediately, causing the main thread and the scheduler to execute the same task concurrently. Proper locking or a pre‑execution state check is required to avoid duplicate execution.
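One possible form of the pre-execution state check is an atomic claim: whoever moves the task from PENDING to RUNNING first gets to execute it. The sketch below uses an in-memory map; JobClaimer and the status values are assumptions, since the table definition above has no status column.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Pre-execution state check; the status values are assumptions. */
class JobClaimer {
    private final ConcurrentMap<Long, String> statusById = new ConcurrentHashMap<>();

    void insertPending(long jobId) {
        statusById.put(jobId, "PENDING");
    }

    /**
     * Atomically moves PENDING to RUNNING. Comparable in spirit to
     * UPDATE transaction_job SET status = 'RUNNING' WHERE id = ? AND status = 'PENDING'
     * followed by a check that exactly one row was affected.
     */
    boolean tryClaim(long jobId) {
        return statusById.replace(jobId, "PENDING", "RUNNING");
    }

    void markSuccess(long jobId) {
        statusById.put(jobId, "SUCCESS");
    }

    String status(long jobId) {
        return statusById.get(jobId);
    }
}
```

Both the after-commit callback and the scheduler would call tryClaim before executing, and only the caller that receives true proceeds.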
Conclusion
The article presents the core idea of splitting a large distributed transaction into small, retryable units, persisting them in a task table, and using Spring's after‑commit hook to hide latency. Additional production‑level features (e.g., idempotency, detailed monitoring) are omitted but will be added later.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
