Data Consistency in Microservices: Transaction Management and Implementation Patterns
This article introduces the limitations of traditional local and distributed transactions for microservices, explains the BASE theory, and details four practical patterns—reliable event notification, maximum‑effort notification, business compensation, and TCC—providing code examples, diagrams, and a comparative table to guide developers in achieving eventual consistency across microservice architectures.
1. Traditional Application Transaction Management
1.1 Local Transaction
Before discussing data consistency in microservices, let's briefly review transaction basics. In a monolithic application, a single RDBMS provides local transactions: CRUD operations commit or roll back together within the same resource manager, guaranteeing data consistency.
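As a concrete illustration, here is a minimal in-memory sketch of local-transaction semantics (the `LocalTx` class and its staged/committed maps are illustrative stand-ins for a real RDBMS, not a real API): changes are staged during the transaction and become visible only on commit, while rollback discards them.

```java
import java.util.HashMap;
import java.util.Map;

// Toy resource manager illustrating local-transaction semantics:
// updates are staged and only become visible on commit; rollback discards them.
class LocalTx {
    private final Map<String, Integer> committed = new HashMap<>();
    private final Map<String, Integer> staged = new HashMap<>();

    LocalTx(Map<String, Integer> initial) { committed.putAll(initial); }

    // Stage a balance change without making it visible yet.
    void update(String key, int delta) {
        int current = staged.getOrDefault(key, committed.getOrDefault(key, 0));
        staged.put(key, current + delta);
    }

    void commit()   { committed.putAll(staged); staged.clear(); } // all changes land together
    void rollback() { staged.clear(); }                           // or none of them do

    int read(String key) { return committed.getOrDefault(key, 0); }
}
```

A transfer staged as two `update` calls thus either lands atomically on `commit` or vanishes entirely on `rollback`, which is exactly the guarantee a single RDBMS gives within one local transaction.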
1.2 Distributed Transaction
1.2.1 Two‑Phase Commit (2PC)
When an application accesses multiple data sources, local transactions are insufficient and distributed transactions become necessary. The most common implementation is the two‑phase commit (2PC), coordinated by a transaction manager (TM) that first prepares all resources and then commits them.
2PC consists of a prepare phase and a commit phase.
[Figure: 2PC commit flow]
[Figure: 2PC rollback flow]
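The two phases can be sketched as a simple coordinator loop (the `Participant` interface and `TwoPhaseCommit` class below are hypothetical, not taken from any concrete transaction manager): the coordinator commits globally only if every participant votes yes in the prepare phase, and otherwise rolls everyone back.

```java
import java.util.List;

// Hypothetical participant in a two-phase commit.
interface Participant {
    boolean prepare(); // phase 1: persist redo/undo info, then vote yes/no
    void commit();     // phase 2a: finalize
    void rollback();   // phase 2b: undo (must tolerate being called before prepare)
}

class TwoPhaseCommit {
    // Returns true if the global transaction committed.
    static boolean run(List<Participant> participants) {
        // Phase 1: ask every participant to prepare.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote aborts the global transaction.
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        // Phase 2: all voted yes, so commit everywhere.
        participants.forEach(Participant::commit);
        return true;
    }
}
```

The blocking problem is visible in the sketch: between a participant's successful `prepare` and the coordinator's phase-2 decision, that participant must hold its locks and wait.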
Although widely used, 2PC cannot fully guarantee consistency and suffers from blocking, which led to the invention of the three‑phase commit (3PC).
1.2.2 Three‑Phase Commit (3PC)
3PC improves on 2PC but still only guarantees consistency in most cases; detailed protocols are omitted as they are not the focus of this article.
2. Transaction Management in Microservices
Distributed transactions based on 2PC/3PC are unsuitable for microservices for three main reasons:
Microservices communicate via RPC or HTTP APIs, preventing a single TM from managing all resources.
Different services may use heterogeneous data stores, some of which (e.g., NoSQL) lack transaction support.
Coordinating a large, cross‑service transaction dramatically increases lock duration and harms performance.
Consequently, microservices must adopt the BASE theory (Basically Available, Soft state, Eventual consistency) proposed by eBay architect Dan Pritchett.
Basically Available: the system tolerates partial loss of availability during failures while keeping core services alive.
Soft State: intermediate states are allowed and do not affect overall availability; replicas may be temporarily out of sync.
Eventual Consistency: all replicas converge to the same state after some time, a weaker but acceptable consistency model for microservices.
Achieving eventual consistency in microservices can be done via two broad categories of patterns: event‑notification and compensation, each with sub‑patterns.
3. Implementing Data Consistency in Microservices
3.1 Reliable Event Notification Pattern
3.1.1 Synchronous Event
The simplest approach is to send a message synchronously after the primary service completes its work. The following Java‑like code illustrates the logic:
```java
public void trans() {
    try {
        // 1. Operate the database
        boolean result = dao.update(data); // throws on failure
        // 2. If the DB update succeeded, send the message
        if (result) {
            mq.send(data); // throws on failure
        }
    } catch (Exception e) {
        rollback(); // roll back on any exception
    }
}
```

While seemingly flawless, synchronous notification suffers from two drawbacks:
Network or server failures after the message is sent can cause the primary service to think the notification failed, leading to inconsistency.
The messaging service becomes tightly coupled with business logic; if the message broker is unavailable, the whole business flow is blocked.
3.1.2 Asynchronous Event
3.1.2.1 Local Event Service
To address the issues of synchronous events, an asynchronous model introduces a separate event service. The business service writes events to a local event table within the same transaction; a background worker retries delivery until successful.
Although reliable, this approach still incurs extra DB load and partial coupling.
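The local-event-table idea can be sketched in memory as follows (all names are illustrative; the lists stand in for the business table, the event table, and the message broker). The key point is that the business row and the event row are written in the same local transaction, so an event can never be lost before delivery, and the worker simply retries until the broker accepts it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the local-event-table pattern: the business update and the event
// record are written together (in a real system, in one DB transaction);
// a background worker later delivers any events not yet marked as sent.
class LocalEventTableDemo {
    record Event(long id, String payload, boolean sent) {}

    final List<String> businessRows = new ArrayList<>(); // stands in for the business table
    final List<Event> eventTable = new ArrayList<>();    // stands in for the event table
    final List<String> delivered = new ArrayList<>();    // stands in for the MQ

    // Business operation: both writes succeed or neither does.
    void trans(String data) {
        businessRows.add(data);                                        // 1. business update
        eventTable.add(new Event(eventTable.size() + 1, data, false)); // 2. event row
    }

    // Background worker: deliver pending events, marking each as sent.
    void deliverPending() {
        for (int i = 0; i < eventTable.size(); i++) {
            Event e = eventTable.get(i);
            if (!e.sent()) {
                delivered.add(e.payload());                              // mq.send in a real system
                eventTable.set(i, new Event(e.id(), e.payload(), true)); // mark as sent
            }
        }
    }
}
```

Because the worker only flips the `sent` flag after a successful send, a crash between send and flag update can cause a duplicate delivery, which is why consumers must be idempotent (see 3.1.2.3).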
3.1.2.2 External Event Service
The external event service further decouples the business and messaging layers. The business service records events without sending them; after the transaction commits (or rolls back), it notifies the event service, which then delivers or discards the events.
This adds two network hops and requires the business service to expose a query interface for the event service to check pending events.
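The external event service's protocol can be sketched as a small state machine (the class, method names, and `committed` callback below are hypothetical): the business service records a pending event before its transaction, then confirms or cancels afterward, and events stuck in the pending state are resolved by querying the business service.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an external event service: events start PENDING, are PUBLISHED
// after the business transaction commits, DISCARDED after it rolls back, and
// stale PENDING events are resolved by querying the business service.
class ExternalEventService {
    enum Status { PENDING, PUBLISHED, DISCARDED }

    final Map<Long, Status> events = new HashMap<>();
    private long nextId = 1;

    long record() {                          // called before the business transaction
        long id = nextId++;
        events.put(id, Status.PENDING);
        return id;
    }

    void confirm(long id) { events.put(id, Status.PUBLISHED); } // transaction committed
    void cancel(long id)  { events.put(id, Status.DISCARDED); } // transaction rolled back

    // Recovery: for events still pending after a timeout, ask the business
    // service (via its query interface) whether the transaction committed.
    void resolvePending(java.util.function.LongPredicate committed) {
        events.replaceAll((id, st) ->
            st == Status.PENDING
                ? (committed.test(id) ? Status.PUBLISHED : Status.DISCARDED)
                : st);
    }
}
```

The `resolvePending` step is exactly why the business service must expose a query interface: without it, the event service cannot decide the fate of events whose confirm/cancel notification was lost.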
3.1.2.3 Notes for Reliable Event Pattern
The pattern must ensure (1) correct delivery of events and (2) idempotent consumption. Idempotency can be achieved by making the event itself idempotent (e.g., order‑status updates) and using timestamps or global sequence numbers to discard stale messages, or by persisting event IDs and results to detect duplicates.
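The deduplication side of this can be sketched as follows (the class and field names are illustrative): the consumer remembers the newest sequence number applied per key, so redelivered or out-of-order stale messages are dropped rather than reapplied.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an idempotent consumer: a per-key sequence number both detects
// duplicate deliveries and discards stale (out-of-order) updates.
class IdempotentConsumer {
    private final Map<String, Long> lastSeq = new HashMap<>(); // key -> newest seq applied
    private final Map<String, String> state = new HashMap<>();

    // Returns true if the event was applied, false if dropped as duplicate/stale.
    boolean onEvent(String key, long seq, String value) {
        Long seen = lastSeq.get(key);
        if (seen != null && seq <= seen) {
            return false; // duplicate delivery or stale out-of-order message
        }
        state.put(key, value); // applying the same (key, seq, value) twice is harmless
        lastSeq.put(key, seq);
        return true;
    }

    String get(String key) { return state.get(key); }
}
```

In production the `lastSeq` table would be persisted alongside the state update in one transaction, so the dedup check survives consumer restarts.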
3.2 Maximum‑Effort Notification Pattern
In this simpler approach the business service attempts to send a message a limited number of times after committing its transaction. If all attempts fail, the message is lost and the downstream service must provide a query API for recovery. This pattern offers low development cost but weak real‑time guarantees.
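A minimal sketch of the retry loop (the class name and `send` callback are illustrative): the sender gives up after a fixed number of attempts, accepting that the downstream side must reconcile lost messages through its query API.

```java
import java.util.function.Predicate;

// Sketch of maximum-effort notification: after the local transaction commits,
// the sender tries the broker at most maxAttempts times, then gives up.
class MaxEffortNotifier {
    // send: returns true on broker acknowledgement.
    static boolean notify(String message, int maxAttempts, Predicate<String> send) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (send.test(message)) {
                return true; // delivered
            }
            // In production: back off between attempts.
        }
        return false; // message lost; downstream recovers via its query API
    }
}
```

The `false` path is the defining trade-off of the pattern: low development cost in exchange for accepting occasional message loss and weak real-time guarantees.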
3.3 Business Compensation Pattern
Here the upstream service proceeds normally, but if a downstream service fails, the upstream service performs compensating actions (e.g., canceling a previously booked ticket). Compensation is usually incomplete: records remain with a "canceled" flag, so the system retains an audit trail.
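The ticket example can be sketched as follows (the class and record names are illustrative): when the second step fails, the first step is compensated by flagging its record rather than deleting it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of business compensation: a later failure triggers a compensating
// action for the earlier step; the record is flagged "canceled", not removed.
class CompensationDemo {
    record Booking(String item, String status) {}

    final List<Booking> bookings = new ArrayList<>();

    boolean bookTrip(boolean hotelAvailable) {
        bookings.add(new Booking("flight", "booked")); // step 1 succeeds
        if (!hotelAvailable) {
            // Step 2 failed: compensate step 1, keeping the record as an audit trail.
            bookings.set(0, new Booking("flight", "canceled"));
            return false;
        }
        bookings.add(new Booking("hotel", "booked"));  // step 2 succeeds
        return true;
    }
}
```

Note that between step 1 and its compensation the flight briefly appears booked, which is the "soft state" BASE explicitly permits.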
3.4 TCC (Try‑Confirm‑Cancel) Pattern
TCC refines compensation by providing a fully reversible workflow. In the Try phase each service reserves required resources; if all Try phases succeed, the Confirm phase finalizes the operation; otherwise the Cancel phase releases the reservations.
Example: a transfer from Bank A to Bank B.
Service A (debit):
```sql
try:     update cmb_account set balance = balance - 100, freeze = freeze + 100
         where acc_id = 1 and balance >= 100;
confirm: update cmb_account set freeze = freeze - 100 where acc_id = 1;
cancel:  update cmb_account set balance = balance + 100, freeze = freeze - 100
         where acc_id = 1;
```

Service B (credit):

```sql
try:     update cgb_account set freeze = freeze + 100 where acc_id = 1;
confirm: update cgb_account set balance = balance + 100, freeze = freeze - 100
         where acc_id = 1;
cancel:  update cgb_account set freeze = freeze - 100 where acc_id = 1;
```

The TCC workflow ensures atomicity without holding long-lived locks, at the cost of implementing both Confirm and Cancel interfaces for each service.
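The same transfer can be traced end to end with in-memory accounts (the class below mirrors the SQL statements above; `cmb` is Bank A's account and `cgb` is Bank B's): Try moves funds into a frozen hold on both sides, Confirm releases the holds into their final state, and Cancel undoes them.

```java
// In-memory trace of the TCC transfer: Try reserves, Confirm finalizes,
// Cancel releases the reservations.
class TccTransferDemo {
    static class Account { int balance; int freeze; Account(int b) { balance = b; } }

    final Account cmb; // Bank A (debit side)
    final Account cgb; // Bank B (credit side)

    TccTransferDemo(int a, int b) { cmb = new Account(a); cgb = new Account(b); }

    // Try: reserve resources on both sides; fails if funds are insufficient.
    boolean tryPhase(int amount) {
        if (cmb.balance < amount) return false;
        cmb.balance -= amount; cmb.freeze += amount; // debit moved into frozen hold
        cgb.freeze += amount;                        // credit reserved, not yet visible
        return true;
    }

    // Confirm: release the reservations into their final state.
    void confirm(int amount) {
        cmb.freeze -= amount;
        cgb.balance += amount; cgb.freeze -= amount;
    }

    // Cancel: undo the reservations.
    void cancel(int amount) {
        cmb.balance += amount; cmb.freeze -= amount;
        cgb.freeze -= amount;
    }

    boolean transfer(int amount, boolean confirmOk) {
        if (!tryPhase(amount)) return false;
        if (confirmOk) { confirm(amount); return true; }
        cancel(amount); return false;
    }
}
```

Notice that no lock outlives a single statement: between Try and Confirm the money sits in the `freeze` columns, so other transactions can proceed against the remaining balance.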
3.5 Summary
The table below compares the four common patterns in terms of consistency latency, development cost, and whether the upstream service depends on downstream results.
| Type | Name | Real-time Consistency | Development Cost | Upstream Depends on Downstream |
|------|------|-----------------------|------------------|--------------------------------|
| Notification | Maximum Effort | Low | Low | No |
| Notification | Reliable Event | High | High | No |
| Compensation | Business Compensation | Low | Low | Yes |
| Compensation | TCC | High | High | Yes |
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.