Data Consistency Strategies in Microservices: Transaction Management and Patterns
This article reviews the evolution from traditional local and distributed transactions to BASE theory and presents four microservice data‑consistency patterns—reliable event notification, maximum‑effort notification, business compensation, and TCC—detailing their principles, advantages, drawbacks, and implementation examples.
1. Transaction Management in Traditional Applications
1.1 Local Transaction
Before discussing data consistency in microservices, some brief background on transactions is useful. Traditional monolithic applications use a single RDBMS as the data source. The application starts a transaction, performs CRUD operations, and commits or rolls back, all within a local transaction managed directly by the resource manager (RM). Data consistency is guaranteed inside this local transaction.
1.2 Distributed Transaction
1.2.1 Two‑Phase Commit (2PC)
When an application expands to use multiple data sources, a single local transaction can no longer guarantee consistency. Distributed transactions are introduced, with the most popular implementation being the two‑phase commit (2PC) managed by a transaction manager (TM).
2PC consists of a prepare phase and a commit phase.
Commit stage illustration.
Rollback stage illustration.
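The two phases can be sketched in memory as follows. This is a minimal illustration, not a production transaction manager: the `Participant` interface, `InMemoryParticipant`, and `TwoPhaseCoordinator` are hypothetical names, and the sketch ignores timeouts, crashes, and recovery logs, which are exactly where real 2PC gets difficult.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract for a resource manager taking part in 2PC.
interface Participant {
    boolean prepare();   // phase 1: vote yes (true) or no (false)
    void commit();       // phase 2, when every vote was yes
    void rollback();     // phase 2, when any vote was no
}

// In-memory stand-in for a resource manager; keeps its last state for inspection.
class InMemoryParticipant implements Participant {
    private final boolean voteYes;
    String state = "INIT";

    InMemoryParticipant(boolean voteYes) { this.voteYes = voteYes; }

    @Override public boolean prepare() {
        state = voteYes ? "PREPARED" : "REFUSED";
        return voteYes;
    }
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}

// Hypothetical transaction manager that drives both phases.
class TwoPhaseCoordinator {
    boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        boolean allYes = true;
        for (Participant p : participants) {          // phase 1: collect votes
            if (p.prepare()) {
                prepared.add(p);
            } else {
                allYes = false;
                break;
            }
        }
        if (allYes) {                                  // phase 2: commit everywhere
            for (Participant p : participants) p.commit();
        } else {                                       // phase 2: undo those that prepared
            for (Participant p : prepared) p.rollback();
        }
        return allYes;
    }
}
```

Note that every participant stays locked in the prepared state until the coordinator reaches phase 2, which is the synchronous blocking the article refers to.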
2PC cannot fully guarantee consistency and suffers from synchronous blocking, so an optimized version, three‑phase commit (3PC), was later devised.
1.2.2 Three‑Phase Commit (3PC)
3PC can guarantee consistency in most cases, but it is not the focus of this article.
2. Transaction Management in Microservices
Distributed transactions such as 2PC or 3PC are unsuitable for microservices for three reasons:
Microservices communicate via RPC (e.g., Dubbo) or HTTP APIs, making it impossible for a transaction manager to directly manage the resource managers of each service.
Different services may use heterogeneous data stores, including NoSQL databases that do not support transactions.
Even if all data sources support transactions, a single large transaction spanning many services would hold locks for a much longer time, severely degrading performance.
Therefore, traditional distributed transactions cannot meet microservice requirements, and microservice transaction management must follow the BASE theory.
BASE (Basically Available, Soft state, Eventual consistency) was proposed by eBay architect Dan Pritchett as an extension of CAP, emphasizing eventual consistency when strong consistency is infeasible.
Basically Available : The system tolerates partial loss of availability during failures, ensuring core services remain up.
Soft state : The system may exist in intermediate states that do not affect overall availability; for example, multiple replicas may be out‑of‑sync temporarily.
Eventual consistency : All replicas converge to the same state after a bounded period; it is a special case of weak consistency.
In microservices, eventual consistency is the fundamental requirement. To achieve it, two major categories of solutions exist: notification‑based and compensation‑based. Notification‑based approaches further split into reliable event notification and maximum‑effort notification, while compensation‑based approaches include the TCC (Try‑Confirm‑Cancel) pattern and generic business compensation.
3. Ways to Achieve Data Consistency in Microservices
3.1 Reliable Event Notification Pattern
3.1.1 Synchronous Event
The simplest design is a synchronous event: the primary service performs its business logic, then immediately sends a message (usually via a message queue) to the secondary service. The code example below illustrates the flow.
public void trans() {
    try {
        // 1. Operate on the database
        boolean result = dao.update(data); // throws on failure
        // 2. If the DB operation succeeds, send the message
        if (result) {
            mq.send(data); // may throw
        }
    } catch (Exception e) {
        rollback(); // roll back on any exception
    }
}
While this looks flawless, it has two drawbacks:
If a network or server crash occurs after the message is sent but before the primary service receives acknowledgment, the primary service may roll back while the secondary service has already consumed the message, causing inconsistency.
The event service becomes tightly coupled with business logic; if the message service is unavailable, the whole business becomes unavailable.
3.1.2 Asynchronous Event
3.1.2.1 Local Event Service
To solve the problems of synchronous events, an asynchronous design decouples the business service from the event service. The business service writes the event to a local event table within the same transaction and attempts to deliver it. If delivery succeeds, the event row is removed; otherwise, a background event service retries until success.
Asynchronous Event Notification – Local Event Service
This approach still incurs some coupling during the first delivery attempt and adds extra load to the database because each business operation also writes to the event table.
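A minimal in-memory sketch of the local event table idea follows. The two maps stand in for a business table and an event table in the same database, and `synchronized` stands in for the single local transaction that would cover both writes in a real system. The class name, the `publisher` callback, and the `evt-` ID scheme are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// In-memory sketch of the local event table pattern (all names hypothetical).
class LocalEventTableSketch {
    final Map<String, String> orders = new LinkedHashMap<>();      // business table
    final Map<String, String> eventTable = new LinkedHashMap<>();  // pending events by event ID
    private final Predicate<String> publisher;                     // stand-in for an MQ send; true = acknowledged

    LocalEventTableSketch(Predicate<String> publisher) {
        this.publisher = publisher;
    }

    // Business operation: write the data and the event row together, then try one delivery.
    synchronized void placeOrder(String orderId) {
        String eventId = "evt-" + orderId;
        orders.put(orderId, "CREATED");     // in a real system: same DB transaction ...
        eventTable.put(eventId, orderId);   // ... as this event row
        tryDeliver(eventId);                // first attempt; failure is tolerated
    }

    // Background event service: retry every event still in the table.
    synchronized int retryPending() {
        int delivered = 0;
        for (String eventId : new ArrayList<>(eventTable.keySet())) {
            if (tryDeliver(eventId)) delivered++;
        }
        return delivered;
    }

    private boolean tryDeliver(String eventId) {
        boolean ok;
        try {
            ok = publisher.test(eventTable.get(eventId));
        } catch (RuntimeException e) {
            ok = false;                      // broker unavailable: keep the row for a later retry
        }
        if (ok) eventTable.remove(eventId);  // delivered: remove the event row
        return ok;
    }
}
```

Because a row is removed only after an acknowledged send, a crash between the send and the removal leads to a resend rather than a lost event, which is why consumers must tolerate duplicates.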
3.1.2.2 External Event Service
External event services further isolate the event system from the business service. The business service records the event first, then after the transaction commits (or rolls back) notifies the event service, which finally sends or discards the event. The event service periodically polls for unsent events and queries the business service for their status.
Asynchronous Event Notification – External Event Service
Although this fully decouples the two sides, it introduces two extra network hops and requires the business service to expose a query interface for the event service.
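The record, confirm, and poll steps of an external event service might look roughly like the sketch below. It is an in-memory illustration under stated assumptions: `ExternalEventServiceSketch`, `TxStatus`, and the `statusQuery` callback (standing in for the query interface exposed by the business service) are all hypothetical, and the `published` list stands in for the message queue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// In-memory sketch of an external event service (all names hypothetical).
class ExternalEventServiceSketch {
    enum TxStatus { COMMITTED, ROLLED_BACK, UNKNOWN }

    private final Map<String, String> pending = new LinkedHashMap<>(); // eventId -> payload awaiting confirmation
    final List<String> published = new ArrayList<>();                  // stands in for the message queue

    // Step 1: the business service records the event before running its transaction.
    void record(String eventId, String payload) {
        pending.put(eventId, payload);
    }

    // Step 2: after the transaction ends, the business service reports the outcome.
    void confirm(String eventId, boolean committed) {
        String payload = pending.remove(eventId);
        if (payload != null && committed) {
            published.add(payload);          // send on commit; a rollback simply discards
        }
    }

    // Step 3: periodic sweep over events whose confirmation never arrived.
    void sweep(Function<String, TxStatus> statusQuery) {
        for (String eventId : new ArrayList<>(pending.keySet())) {
            TxStatus status = statusQuery.apply(eventId);
            if (status != TxStatus.UNKNOWN) {
                confirm(eventId, status == TxStatus.COMMITTED);
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The sweep is what makes the pattern reliable: if the confirmation call is lost, the event service eventually asks the business service for the outcome instead of guessing.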
3.1.2.3 Precautions for Reliable Event Notification
Two key concerns are correct event delivery and duplicate consumption. Idempotency must be ensured on the consumer side. For idempotent state‑change events (e.g., order status), timestamps or global sequence numbers can be used to discard stale messages. For non‑idempotent actions (e.g., monetary transfers), the consumer should persist the event ID and result, checking before processing.
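Both deduplication strategies can be sketched in a few lines. The example below is an assumption-laden illustration: `IdempotentConsumerSketch` and its method names are hypothetical, and the in-memory maps stand in for durable tables that a real consumer would update in the same transaction as the business change.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of consumer-side idempotency (all names hypothetical).
class IdempotentConsumerSketch {
    private final Map<String, Long> lastSeq = new HashMap<>();       // orderId -> highest sequence applied
    private final Map<String, String> orderStatus = new HashMap<>(); // orderId -> current status
    private final Map<String, Long> processed = new HashMap<>();     // eventId -> balance after that event
    private long balance = 0;

    // State-change event: apply it only if it is newer than what was already applied.
    boolean onStatusEvent(String orderId, long seq, String status) {
        Long seen = lastSeq.get(orderId);
        if (seen != null && seq <= seen) {
            return false;                    // stale or duplicate message: discard
        }
        lastSeq.put(orderId, seq);
        orderStatus.put(orderId, status);
        return true;
    }

    // Non-idempotent event: check the event ID first and return the stored result on a duplicate.
    long onDepositEvent(String eventId, long amount) {
        Long previous = processed.get(eventId);
        if (previous != null) {
            return previous;                 // already handled: do not credit twice
        }
        balance += amount;
        processed.put(eventId, balance);     // persist event ID and result together
        return balance;
    }

    String status(String orderId) {
        return orderStatus.get(orderId);
    }

    long balance() {
        return balance;
    }
}
```

For instance, delivering the deposit event `e1` twice credits the account once, and a late `PAID` message with sequence 1 is ignored after `SHIPPED` with sequence 2 has been applied.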
3.2 Maximum‑Effort Notification Pattern
This simpler pattern, often called best‑effort notification, retries sending a message a limited number of times (e.g., three) after the transaction commits. If all attempts fail, the message is dropped, and the upstream service must provide a query interface so that downstream services can recover missing messages. The approach offers only weak timeliness guarantees and is suitable only for scenarios where occasional loss or delay of a notification is acceptable.
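A bounded-retry notifier with a fallback query interface could be sketched as below. The names `BestEffortNotifierSketch`, `commitAndNotify`, and `query` are hypothetical, the `downstream` callback stands in for an HTTP or MQ call, and a real implementation would normally wait between attempts rather than retry immediately.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of limited-retry notification with a query fallback (all names hypothetical).
class BestEffortNotifierSketch {
    private final Map<String, String> results = new HashMap<>(); // upstream record of committed results
    private final int maxAttempts;

    BestEffortNotifierSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Called once the local transaction has committed.
    // Returns true if some attempt was acknowledged, false if the message was dropped.
    boolean commitAndNotify(String txId, String result, Predicate<String> downstream) {
        results.put(txId, result);                       // the committed result stays queryable
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (downstream.test(txId + ":" + result)) {
                    return true;                         // acknowledged: stop retrying
                }
            } catch (RuntimeException e) {
                // a thrown error counts as a failed attempt
            }
        }
        return false;                                    // give up after maxAttempts
    }

    // Query interface that lets a downstream service recover a missed notification.
    Optional<String> query(String txId) {
        return Optional.ofNullable(results.get(txId));
    }
}
```

The design choice here is that the upstream never blocks on the downstream: the burden of recovering a dropped message shifts to the downstream service, which polls `query` when it has not heard back.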
3.3 Business Compensation Pattern
In compensation patterns, the upstream service depends on the downstream result. When a downstream failure occurs, the upstream services that already succeeded execute compensating actions (e.g., canceling a previously booked train ticket). Compensation usually cannot fully undo the original action, so it leaves a trace (e.g., a “canceled” flag) in the database.
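The compensation flow can be sketched as a list of steps that are undone in reverse order when a later step fails. The `Step` interface, the `step` helper, and the `log` list are hypothetical; the log plays the role of the trace that compensation leaves behind.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of business compensation (all names hypothetical).
class CompensationSketch {
    // One step of the business flow together with its compensating action.
    interface Step {
        String name();
        void execute();      // throws RuntimeException on failure
        void compensate();   // semantic undo, e.g. cancel a booking
    }

    final List<String> log = new ArrayList<>();   // trace of everything that happened

    // Runs the steps in order; on failure, compensates completed steps in reverse order.
    boolean run(List<Step> steps) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                log.add("done:" + step.name());
                done.push(step);
            } catch (RuntimeException e) {
                log.add("failed:" + step.name());
                while (!done.isEmpty()) {
                    Step completed = done.pop();
                    completed.compensate();
                    log.add("canceled:" + completed.name());
                }
                return false;
            }
        }
        return true;
    }

    // Helper that builds a step which either succeeds or fails, for demonstration only.
    static Step step(String name, boolean fails) {
        return new Step() {
            public String name() { return name; }
            public void execute() {
                if (fails) throw new IllegalStateException(name + " unavailable");
            }
            public void compensate() { /* e.g. mark the booking as canceled */ }
        };
    }
}
```

Booking a train ticket and a flight and then failing on the hotel would produce the log `done:train, done:flight, failed:hotel, canceled:flight, canceled:train`; the bookings are canceled but their records remain.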
3.4 TCC (Try‑Confirm‑Cancel) Pattern
TCC is an optimized compensation pattern that can achieve full compensation without leaving residual records. It consists of two phases: Try (resource reservation and business checks) and Confirm/Cancel. Only if all Try phases succeed does the system proceed to Confirm; otherwise, Cancel releases the reserved resources.
TCC Pattern
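A minimal coordinator for the Try and Confirm/Cancel phases might look like the sketch below. It is illustrative only: `TccSketch`, `Participant`, `Debit`, and `Credit` are hypothetical names, the accounts live in memory, and the sketch omits what a real TCC framework must handle, such as retries of Confirm and Cancel and idempotent handling of repeated calls.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of Try-Confirm-Cancel (all names hypothetical).
class TccSketch {
    // Contract for the three TCC operations of one service.
    interface Participant {
        boolean tryReserve();   // reserve resources and run business checks
        void confirm();         // make the reservation final
        void cancel();          // release the reservation
    }

    // Paying side: Try moves the amount from balance into freeze.
    static class Debit implements Participant {
        long balance;
        long freeze;
        final long amount;
        Debit(long balance, long amount) { this.balance = balance; this.amount = amount; }
        public boolean tryReserve() {
            if (balance < amount) return false;   // business check failed
            balance -= amount;
            freeze += amount;
            return true;
        }
        public void confirm() { freeze -= amount; }
        public void cancel() { balance += amount; freeze -= amount; }
    }

    // Receiving side: Try only records the incoming amount as frozen.
    static class Credit implements Participant {
        long balance;
        long freeze;
        final long amount;
        final boolean available;
        Credit(long balance, long amount, boolean available) {
            this.balance = balance; this.amount = amount; this.available = available;
        }
        public boolean tryReserve() {
            if (!available) return false;
            freeze += amount;
            return true;
        }
        public void confirm() { balance += amount; freeze -= amount; }
        public void cancel() { freeze -= amount; }
    }

    // Confirm only if every Try succeeded; otherwise cancel the ones that reserved.
    static boolean execute(List<Participant> participants) {
        List<Participant> reserved = new ArrayList<>();
        for (Participant p : participants) {
            if (p.tryReserve()) {
                reserved.add(p);
            } else {
                for (Participant r : reserved) r.cancel();
                return false;
            }
        }
        for (Participant p : participants) p.confirm();
        return true;
    }
}
```

After a canceled run every account returns to its starting balance with nothing left frozen, which is the sense in which TCC compensates without leaving residual records.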
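Example: transferring 100 CNY from an account at Bank A to an account at Bank B. In the Try phase, Service A deducts 100 CNY from the payer's balance and records it as frozen, while Service B records the incoming 100 CNY as frozen on the payee's account. If both Try operations succeed, Confirm clears the frozen amount on A and moves the frozen amount into the balance on B; otherwise, Cancel restores A's balance and releases the freeze on both sides.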
Service A (cmb_account, payer):
try: update cmb_account set balance=balance-100, freeze=freeze+100 where acc_id=1 and balance>=100;
confirm: update cmb_account set freeze=freeze-100 where acc_id=1;
cancel: update cmb_account set balance=balance+100, freeze=freeze-100 where acc_id=1;
Service B (cgb_account, payee):
try: update cgb_account set freeze=freeze+100 where acc_id=1;
confirm: update cgb_account set balance=balance+100, freeze=freeze-100 where acc_id=1;
cancel: update cgb_account set freeze=freeze-100 where acc_id=1;
3.5 Summary
The table below compares the four common patterns in terms of real‑time consistency, development cost, and whether the upstream service depends on the downstream result.
| Type | Name | Real‑time Consistency | Development Cost | Upstream Depends on Downstream |
| Notification | Maximum Effort | Low | Low | No |
| Notification | Reliable Event | High | High | No |
| Compensation | Business Compensation | Low | Low | Yes |
| Compensation | TCC | High | High | Yes |
Source: https://www.jianshu.com/p/b264a196b177
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn