Data Consistency in Microservices: Transaction Management Patterns and Practices
The article reviews microservice data consistency challenges, explains why traditional distributed transactions like 2PC/3PC are unsuitable, introduces the BASE theory, and details four implementation patterns—reliable event notification, maximum effort notification, business compensation, and TCC—to achieve eventual consistency.
I recently studied how data consistency behaves in microservices and summarized the current approaches for ensuring it. This is a high‑level overview that does not go deep into implementation details.
1. Transaction Management in Traditional Applications
1.1 Local Transactions
Traditional monolithic applications use a single RDBMS as the data source. The application starts a transaction, performs CRUD operations, and commits or rolls back, all within a local transaction managed directly by the resource manager (RM). Data consistency is guaranteed within this transaction.
1.2 Distributed Transactions
1.2.1 Two‑Phase Commit (2PC)
When an application expands to use multiple data sources, local transactions can no longer guarantee consistency. Distributed transactions, coordinated by a transaction manager (TM), become necessary. The most common protocol is Two‑Phase Commit (2PC), which consists of a prepare phase and a commit phase.
Commit and rollback diagrams follow.
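2PC cannot fully guarantee consistency: a failure of the coordinator or the network during the commit phase can leave some participants committed while others are still waiting. It also blocks synchronously, because every participant holds its locks until the coordinator's decision arrives. These problems led to the invention of Three‑Phase Commit (3PC).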
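The prepare and commit phases can be sketched in a few lines. This is a minimal in-memory illustration, not a real transaction manager: the `Participant` interface, the `run` coordinator method, and `DemoParticipant` are hypothetical names introduced here, and failure handling (timeouts, coordinator crashes, logging) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoPhaseCommitSketch {

    /** Hypothetical participant contract; real resource managers expose something similar through XA. */
    interface Participant {
        boolean prepare();  // phase 1: do the work, hold locks, vote yes (true) or no (false)
        void commit();      // phase 2: make the prepared work permanent
        void rollback();    // phase 2: undo the prepared work
    }

    /** Coordinator (the TM): returns true only if every participant voted yes and was committed. */
    static boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // A single "no" vote aborts everyone that already voted yes.
                for (Participant q : prepared) {
                    q.rollback();
                }
                return false;
            }
        }
        // Every participant is now blocked, holding its locks, until commit() arrives.
        for (Participant p : prepared) {
            p.commit();
        }
        return true;
    }

    /** In-memory participant used only for this demo. */
    static class DemoParticipant implements Participant {
        final boolean vote;
        String state = "INIT";

        DemoParticipant(boolean vote) { this.vote = vote; }

        public boolean prepare() { state = vote ? "PREPARED" : "ABORTED"; return vote; }
        public void commit() { state = "COMMITTED"; }
        public void rollback() { state = "ABORTED"; }
    }

    public static void main(String[] args) {
        DemoParticipant orders = new DemoParticipant(true);
        DemoParticipant stock = new DemoParticipant(false);
        boolean committed = run(Arrays.asList(orders, stock));
        System.out.println(committed + " " + orders.state + " " + stock.state);
    }
}
```

The window between the last `prepare()` and the first `commit()` is where the blocking problem lives: a participant that has voted yes cannot release its locks on its own.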
1.2.2 Three‑Phase Commit (3PC)
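3PC improves on 2PC by adding a pre‑commit phase and participant timeouts, which reduces blocking, but it still only guarantees consistency in most cases; a network partition can still leave participants in different states. Detailed discussions of 2PC/3PC are omitted as they are not the focus of this article.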
2. Transaction Management in Microservices
Distributed transactions like 2PC or 3PC are unsuitable for microservices for three main reasons:
Microservices communicate via RPC or HTTP APIs, preventing a single TM from managing all resource managers.
Different services may use heterogeneous data stores, some of which (e.g., NoSQL) do not support transactions.
Even if all stores support transactions, a global transaction would span many services and last orders of magnitude longer than a local transaction, causing extensive locking and performance degradation.
Therefore, traditional distributed transactions cannot meet microservice needs, and the BASE theory becomes the guiding principle.
BASE, proposed by eBay architect Dan Pritchett, extends CAP and stands for Basically Available, Soft state, and Eventual Consistency.
Basically Available: The system tolerates partial loss of availability during failures, ensuring core services remain operational.
Soft state: The system may hold intermediate states that do not affect overall availability; replication delays exemplify this.
Eventual Consistency: All replicas converge to a consistent state after some time, representing a special case of weak consistency.
Eventual consistency is the fundamental requirement for microservice transaction management. Four major patterns can achieve it, divided into notification‑based and compensation‑based approaches.
3. Methods to Achieve Data Consistency in Microservices
3.1 Reliable Event Notification Pattern
3.1.1 Synchronous Events
The simplest form sends a message to downstream services synchronously after the primary service completes its business logic. The following code illustrates the flow:
public void trans() {
    try {
        // 1. Update the database; update() throws on failure
        boolean result = dao.update(data);
        // 2. Send the message only if the database update succeeded
        if (result) {
            mq.send(data); // throws on failure
        }
    } catch (Exception e) {
        rollback(); // roll back the local transaction on any exception
    }
}
While seemingly flawless, synchronous notification has two drawbacks:
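If the network or the message server fails after the message has been delivered but before the acknowledgment reaches the sender, the primary service assumes the send failed and rolls back, while the downstream service has already received the message. The two services are then inconsistent.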
The message service becomes tightly coupled with business logic; if the message service is unavailable, the entire business flow is blocked.
3.1.2 Asynchronous Events
3.1.2.1 Local Event Service
To address the issues of synchronous events, a local event service records events in a local table within the same transaction. If sending succeeds, the event is removed; otherwise, a background service retries until successful.
Although this improves reliability, it still introduces coupling and additional DB load.
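A minimal sketch of the local event table idea, using in-memory maps in place of a real database. All names here (`LocalDb`, `saveOrderWithEvent`, `relayOnce`) are hypothetical, the `synchronized` method only stands in for a real local transaction, and the broker is modeled as a `Predicate` that returns true when it accepts a message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class LocalEventTableSketch {

    /** Stand-in for one RDBMS: both maps change together, as they would inside one local transaction. */
    static class LocalDb {
        final Map<String, String> orders = new LinkedHashMap<>();
        final Map<String, String> pendingEvents = new LinkedHashMap<>(); // eventId -> payload

        synchronized void saveOrderWithEvent(String orderId, String payload) {
            orders.put(orderId, payload);                 // business row
            pendingEvents.put("evt-" + orderId, payload); // event row, same "transaction"
        }
    }

    /** One pass of the background relay: deletes only the events the broker accepted. */
    static int relayOnce(LocalDb db, Predicate<String> broker) {
        int sent = 0;
        for (String eventId : new ArrayList<>(db.pendingEvents.keySet())) {
            if (broker.test(db.pendingEvents.get(eventId))) {
                db.pendingEvents.remove(eventId);
                sent++;
            }
            // On failure the row stays in the table and the next pass retries it.
        }
        return sent;
    }

    public static void main(String[] args) {
        LocalDb db = new LocalDb();
        db.saveOrderWithEvent("1001", "order created");
        System.out.println(relayOnce(db, payload -> false)); // broker down, event kept
        System.out.println(relayOnce(db, payload -> true));  // broker back, event removed
        System.out.println(db.pendingEvents.isEmpty());
    }
}
```

Because an event is deleted only after the broker accepts it, a crash between the send and the delete can cause the same event to be sent twice, so consumers still need to be idempotent.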
3.1.2.2 External Event Service
Externalizing the event service removes the coupling entirely. The business service records the event before commit, and after commit or rollback notifies the event service, which then sends or discards the event. The event service periodically checks for unsent events and queries the business service for status.
This approach adds extra network hops and requires the business service to expose a query interface.
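The record, confirm or cancel, and periodic check steps can be sketched as follows. This is an illustration under assumptions: `EventService`, `BusinessQuery`, `TxStatus`, and every method name are hypothetical, and the `delivered` list stands in for a real message broker.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExternalEventServiceSketch {

    /** What the business service answers when asked about a transaction. */
    enum TxStatus { COMMITTED, ROLLED_BACK, UNKNOWN }

    /** Hypothetical query interface the business service has to expose. */
    interface BusinessQuery {
        TxStatus statusOf(String eventId);
    }

    static class EventService {
        final Map<String, String> unconfirmed = new LinkedHashMap<>(); // eventId -> payload
        final List<String> delivered = new ArrayList<>();              // stand-in for the broker

        /** Called by the business service before it commits. */
        void record(String eventId, String payload) {
            unconfirmed.put(eventId, payload);
        }

        /** Called after a successful commit: the event may now be sent. */
        void confirm(String eventId) {
            String payload = unconfirmed.remove(eventId);
            if (payload != null) {
                delivered.add(payload);
            }
        }

        /** Called after a rollback: the event is discarded. */
        void cancel(String eventId) {
            unconfirmed.remove(eventId);
        }

        /** Periodic check: resolve events whose confirm or cancel call never arrived. */
        void resolveUnconfirmed(BusinessQuery query) {
            for (String eventId : new ArrayList<>(unconfirmed.keySet())) {
                TxStatus status = query.statusOf(eventId);
                if (status == TxStatus.COMMITTED) {
                    confirm(eventId);
                } else if (status == TxStatus.ROLLED_BACK) {
                    cancel(eventId);
                }
                // UNKNOWN: leave the event for the next check.
            }
        }
    }

    public static void main(String[] args) {
        EventService events = new EventService();
        events.record("e1", "order paid");
        // The confirm call is lost, so the periodic check asks the business service instead.
        events.resolveUnconfirmed(id -> TxStatus.COMMITTED);
        System.out.println(events.delivered);
    }
}
```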
3.1.2.3 Notes on Reliable Event Notification
Two key concerns are correct delivery and idempotent consumption. Idempotency can be ensured by using unique event IDs and persisting processing results, or by discarding stale events based on timestamps or global sequence numbers.
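Both idempotency techniques can be sketched briefly. The class and field names below (`DepositConsumer`, `PriceConsumer`, `processedIds`, `lastSeq`) are hypothetical, and state is kept in memory only; a real consumer would persist the processed ID together with the business change in one local transaction.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentConsumerSketch {

    /** Deduplicates by unique event ID, for events whose effect must be applied exactly once. */
    static class DepositConsumer {
        private final Set<String> processedIds = new HashSet<>();
        long balance = 0;

        /** Returns true if the event was applied, false if it was a duplicate. */
        boolean handle(String eventId, long amount) {
            if (!processedIds.add(eventId)) {
                return false; // already seen: ignore the redelivery
            }
            balance += amount;
            return true;
        }
    }

    /** Discards stale events by comparing a global sequence number. */
    static class PriceConsumer {
        private long lastSeq = -1;
        long price = 0;

        /** Returns true if the event was applied, false if it was stale. */
        boolean handle(long seq, long newPrice) {
            if (seq <= lastSeq) {
                return false; // not newer than what was already applied
            }
            lastSeq = seq;
            price = newPrice;
            return true;
        }
    }

    public static void main(String[] args) {
        DepositConsumer deposits = new DepositConsumer();
        deposits.handle("evt-1", 100);
        deposits.handle("evt-1", 100); // redelivered by the retry mechanism
        System.out.println(deposits.balance);
    }
}
```

The ID-based variant fits events that add or subtract something, while the sequence-based variant fits events that overwrite a value, where only the newest one matters.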
3.2 Maximum Effort Notification Pattern
Here the business service attempts to send a message a limited number of times after committing. If all attempts fail, the message is considered lost, and the business service must expose a query interface that the downstream service can call to recover the missing information. This pattern is suitable for low‑criticality notifications (e.g., third‑party alerts) but not for strict consistency requirements.
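The bounded retry at the heart of this pattern might look like the sketch below. `notifyWithRetries` and its parameters are hypothetical names, the sender is modeled as a `Predicate` that returns true on success, and the waiting time between attempts is omitted to keep the example short.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

class BestEffortNotifierSketch {

    /** Tries at most maxAttempts times; returns false when the notifier gives up. */
    static boolean notifyWithRetries(String message, int maxAttempts, Predicate<String> sender) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (sender.test(message)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // A thrown exception counts as one failed attempt.
            }
            // A real notifier would wait here, often with a growing delay, before retrying.
        }
        return false; // given up: recovery now depends on the query interface
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical receiver that only accepts the third attempt.
        boolean delivered = notifyWithRetries("order-42 paid", 5, m -> calls.incrementAndGet() == 3);
        System.out.println(delivered + " after " + calls.get() + " attempts");
    }
}
```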
3.3 Business Compensation Pattern
In this pure compensation model, upstream services perform normal commits, and if a downstream service fails, all upstream services execute compensating actions (e.g., canceling a previously booked train ticket). Compensation is typically partial: the original record is not erased but remains with a “canceled” flag.
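A minimal sketch of the compensation flow, assuming each service call can be modeled as a step with an `execute` and a `compensate` operation. The `Step` interface, `runAll`, and the `Booking` class are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class BusinessCompensationSketch {

    /** Hypothetical unit of work: one service call plus the action that undoes it. */
    interface Step {
        boolean execute();   // commits locally right away; false means the step failed
        void compensate();   // business-level undo of an already committed step
    }

    /** Runs the steps in order; on a failure, compensates the completed ones in reverse order. */
    static boolean runAll(List<? extends Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.execute()) {
                completed.push(step);
            } else {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }

    /** Demo booking: compensation flags the record as canceled instead of deleting it. */
    static class Booking implements Step {
        final boolean available;
        String status = "NONE";

        Booking(boolean available) { this.available = available; }

        public boolean execute() {
            if (available) {
                status = "BOOKED";
            }
            return available;
        }

        public void compensate() { status = "CANCELED"; }
    }

    public static void main(String[] args) {
        Booking train = new Booking(true);
        Booking hotel = new Booking(false); // the downstream step fails
        System.out.println(runAll(Arrays.asList(train, hotel)) + " " + train.status + " " + hotel.status);
    }
}
```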
3.4 TCC (Try‑Confirm‑Cancel) Pattern
TCC is an optimized compensation approach that achieves full compensation. The workflow consists of:
Try: each service performs checks and reserves required resources.
If all Try phases succeed, Confirm executes the actual business logic using the reserved resources.
If any Try fails, Cancel releases the reserved resources.
Example: transferring 100 CNY from Bank A to Bank B.
Service A (debit):
try: update cmb_account set balance=balance-100, freeze=freeze+100 where acc_id=1 and balance>=100;
confirm: update cmb_account set freeze=freeze-100 where acc_id=1;
cancel: update cmb_account set balance=balance+100, freeze=freeze-100 where acc_id=1;
Service B (credit):
try: update cgb_account set freeze=freeze+100 where acc_id=1;
confirm: update cgb_account set balance=balance+100, freeze=freeze-100 where acc_id=1;
cancel: update cgb_account set freeze=freeze-100 where acc_id=1;
The TCC flow ensures that either both accounts are updated or neither is, without leaving residual state.
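The same transfer can be mirrored in memory to show how a coordinator drives the three operations. This is a sketch under assumptions: the `Account` class, its `open` flag (added so that the credit-side Try can fail), and the `transfer` method are hypothetical, and a real coordinator would also have to retry Confirm and Cancel until they succeed.

```java
class TccTransferSketch {

    /** In-memory stand-in for one row of an account table. */
    static class Account {
        long balance;
        long freeze = 0;
        boolean open = true; // hypothetical flag so the credit-side Try can fail

        Account(long balance) { this.balance = balance; }
    }

    // Service A (debit)
    static boolean tryDebit(Account a, long amount) {
        if (a.balance < amount) {
            return false; // check failed, nothing reserved
        }
        a.balance -= amount;
        a.freeze += amount;
        return true;
    }
    static void confirmDebit(Account a, long amount) { a.freeze -= amount; }
    static void cancelDebit(Account a, long amount) { a.balance += amount; a.freeze -= amount; }

    // Service B (credit)
    static boolean tryCredit(Account b, long amount) {
        if (!b.open) {
            return false;
        }
        b.freeze += amount;
        return true;
    }
    static void confirmCredit(Account b, long amount) { b.balance += amount; b.freeze -= amount; }
    static void cancelCredit(Account b, long amount) { b.freeze -= amount; }

    /** Coordinator: Confirm everywhere if every Try succeeded, otherwise Cancel what was reserved. */
    static boolean transfer(Account from, Account to, long amount) {
        boolean debitReserved = tryDebit(from, amount);
        boolean creditReserved = tryCredit(to, amount);
        if (debitReserved && creditReserved) {
            confirmDebit(from, amount);
            confirmCredit(to, amount);
            return true;
        }
        if (debitReserved) {
            cancelDebit(from, amount);
        }
        if (creditReserved) {
            cancelCredit(to, amount);
        }
        return false;
    }

    public static void main(String[] args) {
        Account a = new Account(150);
        Account b = new Account(0);
        System.out.println(transfer(a, b, 100) + " " + a.balance + " " + b.balance);
    }
}
```

After a failed transfer both `freeze` fields are back to zero and both balances are unchanged, which is what distinguishes TCC from the flag-based partial compensation of the previous pattern.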
3.5 Summary
The following table compares the four common patterns (reliable event notification, maximum effort notification, business compensation, and TCC) in terms of reliability, complexity, and consistency guarantees.
For further reading, see the original article at https://www.jianshu.com/p/b264a196b177 .