How to Break Down Distributed Transactions for Reliable Microservices
This article explains the challenges of distributed consistency when a business operation writes to both MySQL and third‑party systems, presents a financial reimbursement case study, analyzes failure risks, and offers a practical solution that splits large transactions into small, retryable units using Spring and a task table.
Preface
When a business operation requires writing data to a local MySQL database and invoking a third‑party system, the process involves distributed consistency problems, and not every system can rely on mature distributed‑transaction solutions.
Sample code repository: https://gitee.com/dailycreatebug/demo-codes
Case Description
The example uses a financial reimbursement workflow that involves three systems:
Document System: Handles the submission of reimbursement requests and generates financial vouchers. Implemented in Java.
BPM System: A mature workflow engine (e.g., Fanwei) that drives the process. It provides two capabilities: an API call to approve a step and an API call to fetch the next approver.
SAP System: A financial system that receives voucher data from the Document System via API after financial approval.
"Approval Passed" Business Flow
1. Save business data and audit log.
2. Call BPM API to mark approval.
3. Call BPM API to get the next approver.
4. If no approver remains, generate the voucher and push it to SAP.
Code illustration:
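A minimal sketch of this flow, assuming hypothetical DocumentRepository, BpmClient, and SapClient interfaces (none of these names come from the original repository). In a Spring application the approve method would carry @Transactional, so all four steps would share one MySQL transaction:

```java
import java.util.Optional;

// Hypothetical collaborators; names are illustrative only.
interface DocumentRepository {
    void saveDataAndAuditLog(long documentId, String operator); // local MySQL write
}

interface BpmClient {
    void approve(long documentId, String operator);  // remote write
    Optional<String> nextApprover(long documentId);  // remote read
}

interface SapClient {
    void pushVoucher(long documentId);               // remote write
}

class ApprovalService {
    private final DocumentRepository repository;
    private final BpmClient bpm;
    private final SapClient sap;

    ApprovalService(DocumentRepository repository, BpmClient bpm, SapClient sap) {
        this.repository = repository;
        this.bpm = bpm;
        this.sap = sap;
    }

    // Assumed to be @Transactional in Spring: an exception at any step rolls
    // back the MySQL writes of step 1, but not the remote calls already made.
    void approve(long documentId, String operator) {
        repository.saveDataAndAuditLog(documentId, operator);   // step 1
        bpm.approve(documentId, operator);                      // step 2
        Optional<String> next = bpm.nextApprover(documentId);   // step 3
        if (!next.isPresent()) {                                // step 4
            sap.pushVoucher(documentId);
        }
    }
}
```

Everything sits in one method and one transaction, which is what makes the remote calls in steps 2 to 4 risky.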
Risk Analysis
If step 1 or step 2 throws an exception, the MySQL writes are rolled back, leaving the system unchanged. If steps 1 and 2 succeed but step 3 fails, the MySQL transaction still rolls back, while the BPM approval made in step 2 stays in effect. The Document System then shows no data or audit log for the approval, yet BPM has already approved the process, a state that a correct workflow should never reach.
From the user’s perspective, an error is shown and they may retry the approval, but BPM may already have moved to the next node, causing permission errors.
Problem Analysis
The root cause is that a MySQL transaction can only control its own database operations and cannot guarantee the outcome of remote third‑party calls.
Solution Idea
Break the large transaction into smaller ones. The principles are:
Each small transaction should contain at most one remote write operation.
The remote write should be placed at the end of the method so that the transaction can be committed immediately after a successful return.
Because small transactions may fail, a retry mechanism is required.
Implementation steps:
1. Create a task table
CREATE TABLE `transaction_job` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
`type` varchar(255) NOT NULL COMMENT 'task type',
`data` varchar(255) NOT NULL COMMENT 'task data',
`error_message` varchar(255) DEFAULT NULL COMMENT 'error message',
`context` varchar(255) DEFAULT NULL COMMENT 'task context (mainly the current operator)',
`create_time` bigint NOT NULL COMMENT 'creation time',
`update_time` bigint NOT NULL COMMENT 'update time',
`retry_times` int(11) NOT NULL DEFAULT '0' COMMENT 'retry count',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='transaction task table';
The table stores the essential data of each small transaction and the operator context.
2. Scheduled job scans unfinished rows
The job queries transaction_job for pending tasks and executes them using a simple strategy‑pattern framework that separates framework code from business code.
Key steps:
Scan tasks.
Execute tasks.
Each small transaction implements a dedicated strategy class.
The implementation for updating the next approver simply copies the original code.
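The scan-and-execute framework described above might look roughly like the following. This is a plain-Java sketch rather than the author's code: TransactionJob, JobHandler, and JobRunner are hypothetical names, rows are kept in memory instead of MySQL, and the status field is an assumption, since the table definition shown earlier has no such column but the scanner needs some way to tell pending rows from finished ones.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for one transaction_job row.
class TransactionJob {
    final long id;
    final String type;
    final String data;
    String status = "PENDING"; // assumed column, not in the DDL above
    String errorMessage;
    int retryTimes;

    TransactionJob(long id, String type, String data) {
        this.id = id;
        this.type = type;
        this.data = data;
    }
}

// Business code: one implementation per small transaction.
interface JobHandler {
    String type();
    void handle(TransactionJob job) throws Exception;
}

// Framework code: knows nothing about approvals, BPM, or SAP.
class JobRunner {
    private final Map<String, JobHandler> handlers = new HashMap<>();

    JobRunner(List<JobHandler> all) {
        for (JobHandler handler : all) {
            handlers.put(handler.type(), handler);
        }
    }

    // Executes one row and records the outcome on it.
    void run(TransactionJob job) {
        try {
            JobHandler handler = handlers.get(job.type);
            if (handler == null) {
                throw new IllegalStateException("no handler for type " + job.type);
            }
            handler.handle(job);
            job.status = "SUCCESS";
            job.errorMessage = null;
        } catch (Exception e) {
            // Row stays PENDING so the next scan retries it.
            job.retryTimes++;
            job.errorMessage = String.valueOf(e.getMessage());
        }
    }

    // Called by the scheduled job: run every row that is still pending.
    void scan(List<TransactionJob> rows) {
        for (TransactionJob job : rows) {
            if ("PENDING".equals(job.status)) {
                run(job);
            }
        }
    }
}
```

A handler for updating the next approver would implement JobHandler with its own type string and contain the code moved out of the original method. This sketch retries forever; a production version would likely cap retry_times.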
3. Refactor business code
Instead of completing the whole process in one transaction, the first small transaction inserts a row into transaction_job to trigger the second small transaction.
The insertion uses transactionJobService.create and stores the necessary data.
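As a rough illustration of the refactored first transaction: the source names transactionJobService.create but does not show its signature, so the (type, data) parameters below are an assumption, as are the DocumentStore and BpmApprovalApi interfaces and the "UPDATE_NEXT_APPROVER" type string.

```java
// Hypothetical collaborators; signatures are assumptions for illustration.
interface TransactionJobService {
    void create(String type, String data); // inserts one transaction_job row
}

interface DocumentStore {
    void saveDataAndAuditLog(long documentId, String operator); // local write
}

interface BpmApprovalApi {
    void approve(long documentId, String operator); // remote write
}

class SplitApprovalService {
    private final DocumentStore store;
    private final TransactionJobService transactionJobService;
    private final BpmApprovalApi bpm;

    SplitApprovalService(DocumentStore store,
                         TransactionJobService transactionJobService,
                         BpmApprovalApi bpm) {
        this.store = store;
        this.transactionJobService = transactionJobService;
        this.bpm = bpm;
    }

    // Small transaction 1; assumed to be @Transactional in Spring.
    void approve(long documentId, String operator) {
        store.saveDataAndAuditLog(documentId, operator);
        // Same local transaction: this row commits or rolls back together
        // with the business data saved above.
        transactionJobService.create("UPDATE_NEXT_APPROVER", String.valueOf(documentId));
        // The single remote write goes last, so if it throws, every local
        // write in this method is rolled back and nothing is left behind.
        bpm.approve(documentId, operator);
    }
}
```

Fetching the next approver and pushing to SAP no longer happen here. They move into later small transactions that are driven by the job row.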
Optimization
Because subsequent small transactions are executed by a scheduled job, they introduce latency. Spring's transaction lifecycle offers a way around this: insert the task row within the current transaction and register a TransactionSynchronization through TransactionSynchronizationManager.registerSynchronization(). Its afterCommit() callback runs the task immediately after the transaction commits, without waiting for the next scan.
If that immediate run succeeds, the task status is set to “success” and the user experiences no delay; if it fails, the task remains unfinished and the scheduled job retries it.
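The control flow of the after-commit hook can be modeled in plain Java. ToyTransaction below is a stand-in invented for this sketch and is not a Spring class. In a real Spring application the registration would go through TransactionSynchronizationManager.registerSynchronization with a TransactionSynchronization whose afterCommit() runs the task.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a transaction that supports commit callbacks.
class ToyTransaction {
    private final List<Runnable> afterCommit = new ArrayList<>();

    void registerAfterCommit(Runnable callback) {
        afterCommit.add(callback);
    }

    void commit() {
        // Callbacks run only once the commit has happened.
        for (Runnable callback : afterCommit) {
            callback.run();
        }
        afterCommit.clear();
    }

    void rollback() {
        // Nothing was committed, so the callbacks are dropped.
        afterCommit.clear();
    }
}

class AfterCommitTrigger {
    // Runs the task as soon as the surrounding transaction commits.
    // A failure is swallowed on purpose: the task row is already committed
    // as unfinished, so the scheduled scan will pick it up and retry.
    static void runAfterCommit(ToyTransaction tx, Runnable task) {
        tx.registerAfterCommit(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Left for the scheduled scanner.
            }
        });
    }
}
```

Swallowing the exception is a design choice of this sketch: by that point the user's own transaction has already committed, so a failed follow-up step should not surface as a failed approval.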
Note
There is a risk that the newly inserted task is picked up immediately by the scheduled scanner, causing the same task to be executed concurrently by the main thread and the scanner thread. In production, you need to handle this, e.g., by adding a lock or re‑checking the database state before execution.
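One possible guard, shown here as an in-memory sketch and not as the author's implementation, is to make each executor claim the task atomically before running it. The JobClaims class and the PENDING/RUNNING/SUCCESS states are assumptions. Against MySQL the same idea would be a conditional UPDATE on an assumed status column (WHERE id = ? AND status = 'PENDING'), with the task run only when exactly one row is affected.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class JobClaims {
    // Job id -> status; stands in for an assumed status column.
    private final ConcurrentMap<Long, String> status = new ConcurrentHashMap<>();

    void insertPending(long jobId) {
        status.put(jobId, "PENDING");
    }

    // Atomically moves PENDING -> RUNNING. When the main thread and the
    // scanner thread race on the same job, only one of them gets true.
    boolean tryClaim(long jobId) {
        return status.replace(jobId, "PENDING", "RUNNING");
    }

    void markSuccess(long jobId) {
        status.put(jobId, "SUCCESS");
    }

    String statusOf(long jobId) {
        return status.get(jobId);
    }
}
```

Both the after-commit callback and the scanner would call tryClaim first and skip the task when it returns false. A task left in RUNNING after a crash would need a timeout or a reset step, which this sketch omits.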
Conclusion
The article presents the core idea of splitting a large distributed transaction into multiple small, independently retryable transactions, storing their state in a task table, and using Spring’s after‑commit hook to reduce latency. Additional features and refinements will be added in future updates.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
