
Handling Distributed Transactions in Large‑Scale Microservice Systems

The article examines the challenges of distributed transactions in large‑scale microservice architectures and compares three main approaches—traditional two‑phase commit, stateless queue‑based eventual consistency with event sourcing and CQRS, and transaction compensation techniques such as TCC—highlighting their trade‑offs and real‑world examples like Flipkart.

Architecture Digest

Large‑scale transaction processing systems such as the back‑ends of major e‑commerce sites face unavoidable distributed transaction challenges, especially under microservice architectures. Strict data consistency is required for orders, payments, inventory, etc., and business logic often spans multiple services, making partial failures a critical concern.

For example, when services A and B participate in a single transaction and service B crashes or hits a network error, how should service A respond? The same question arises with the roles reversed, and again when the failed service recovers and has to work out what state the transaction was left in.

According to the book *Transactional Information Systems* by Weikum and Vossen, three main techniques address these issues:

First, introduce a distributed transaction protocol, most commonly two‑phase commit (2PC).

Applying 2PC in SOA is generally discouraged because it contradicts service‑governance goals and suffers from availability issues, deadlocks, and performance overhead from locking. Implementations such as JBoss Narayana provide various transaction protocols (XATMI, JTA, JTS, Web‑Service Transactions, REST Transactions, etc.).
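
To make the locking and availability costs concrete, here is a minimal in-memory sketch of a 2PC coordinator. The `Participant` class and its `prepare`/`commit`/`rollback` methods are hypothetical names chosen for illustration; they are not the API of Narayana or of any XA implementation, and a real coordinator would also need durable logging and timeouts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical resource manager that votes in phase 1."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed | rolled_back

    def prepare(self) -> bool:
        # Phase 1: a real participant would acquire locks here and hold
        # them until the coordinator announces the outcome.
        if self.can_commit:
            self.state = "prepared"
            return True
        return False

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (voting): every participant must vote yes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Between `prepare` and the phase 2 call, each participant is blocked waiting on the coordinator. If the coordinator crashes in that window, the participants cannot decide on their own, which is the availability problem described above.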

Second, use a "stateless queue transaction" approach: employ message queues and eventual consistency to decouple services. For this to work, a service must publish an event atomically with every change to its local database.

This method offers high performance and scalability but introduces challenges such as double‑writes (simultaneous writes to a database and a message queue), idempotency, and message ordering. Techniques like Change Data Capture (CDC) and log‑centric architectures can mitigate these issues. Flipkart, for instance, uses this pattern with an idempotency filter and a retry queue, avoiding transaction rollbacks.
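
One common way to sidestep the double‑write problem is to record the event in the same local database transaction as the state change and let a separate relay forward it to the queue. The sketch below illustrates that idea with SQLite and an in-memory `deque` standing in for a broker. The table names, `place_order`, and `relay_outbox` are assumptions made for this example; this is not Flipkart's implementation, and it uses polling rather than CDC.

```python
import json
import sqlite3
from collections import deque


def init_db(conn):
    with conn:
        conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")
        conn.execute(
            "CREATE TABLE outbox ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "published INTEGER NOT NULL DEFAULT 0)"
        )


def place_order(conn, order_id, amount):
    # One local transaction covers both the state change and the event row,
    # so either both are stored or neither is.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        event = {"type": "OrderPlaced", "order_id": order_id, "amount": amount}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn, queue):
    """Forward unpublished events in sequence order; return how many were sent."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        queue.append(json.loads(payload))  # stand-in for a broker publish
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next run. Delivery is therefore at-least-once, which is why consumers still need idempotent handling.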

Another solution combines Event Sourcing and CQRS. Event Sourcing persists only state‑change events to an Event Store and rebuilds current state by replaying them, while CQRS separates read models from write models to provide query views.

CQRS complements Event Sourcing by delivering the necessary query side, since an event log on its own is awkward to query. The combined pattern can implement cross‑service distributed transactions; example code is available at https://github.com/cer/event-sourcing-examples.
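
The division of labor between the two patterns can be sketched in a few lines. The example below is a single-process illustration written for this article, not code from the linked repository; `EventStore`, `BalanceView`, and the account domain are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str    # "deposited" or "withdrawn"
    amount: int


class EventStore:
    """Append-only log: the only thing the write side persists."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event):
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)

    def stream(self, account_id):
        return [e for e in self._events if e.account_id == account_id]


def current_balance(events):
    """Write side: rebuild state by folding over the event history."""
    balance = 0
    for e in events:
        balance += e.amount if e.kind == "deposited" else -e.amount
    return balance


def withdraw(store, account_id, amount):
    """Command handler: validate against replayed state, then append an event."""
    if current_balance(store.stream(account_id)) < amount:
        raise ValueError("insufficient funds")
    store.append(Event(account_id, "withdrawn", amount))


class BalanceView:
    """Read side (CQRS): a denormalized projection updated from events."""

    def __init__(self):
        self.balances = defaultdict(int)

    def apply(self, event):
        sign = 1 if event.kind == "deposited" else -1
        self.balances[event.account_id] += sign * event.amount
```

Here the projection is updated synchronously for brevity. Across services the subscriber would sit behind a message queue, so the read model would lag the event store and be only eventually consistent.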

Pat Helland’s influential paper "Life beyond Distributed Transactions: an Apostate's Opinion" advocates avoiding distributed transactions by introducing three abstract roles: Entity, Message, and Activity. Google’s Percolator implements these semantics, often using Paxos for Activity. Flipkart’s idempotency filter is a practical approximation of the Activity role.
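
As a rough illustration of what an idempotency filter with a retry queue might look like, consider the sketch below. It is an assumption-laden toy, not Flipkart's code: the `IdempotentConsumer` class, its return values, and the in-memory `set` used as the deduplication store are all invented for this example.

```python
from collections import deque


class IdempotentConsumer:
    """Drops duplicate messages and requeues failures instead of rolling back."""

    def __init__(self, handler, max_attempts=3):
        self._handler = handler
        self._max_attempts = max_attempts
        self._processed = set()   # stand-in for a durable deduplication store
        self.retry_queue = deque()
        self.dead_letters = []

    def receive(self, message_id, payload, attempt=1):
        if message_id in self._processed:
            return "duplicate"    # already applied: safe to acknowledge again
        try:
            self._handler(payload)
        except Exception:
            if attempt < self._max_attempts:
                self.retry_queue.append((message_id, payload, attempt + 1))
                return "retry"
            self.dead_letters.append((message_id, payload))
            return "dead"
        self._processed.add(message_id)
        return "processed"

    def drain_retries(self):
        # Process only the retries that were queued when this call started.
        for _ in range(len(self.retry_queue)):
            message_id, payload, attempt = self.retry_queue.popleft()
            self.receive(message_id, payload, attempt)
```

In a real service the handler's side effect and the record of the processed message ID would have to be written in the same local transaction; otherwise a crash between the two steps could still apply a message twice.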

Third, employ transaction compensation. Each forward operation must have a corresponding compensating (rollback) operation. Compensation can be stateless (implemented as a separate compensating transaction, typically driven by a workflow engine) or stateful, exemplified by the TCC (Try‑Confirm‑Cancel) pattern, which blends compensation with XA‑style coordination but suffers from poorer performance and scalability.
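
The stateful variant can be sketched as follows. In TCC, the Try step reserves resources without finalizing them, and the coordinator then either confirms every reservation or cancels the ones already made. `ReservableResource` and `run_tcc` are hypothetical names for this illustration, and the sketch leaves out the persistence and retry logic a production coordinator would need.

```python
class ReservableResource:
    """Hypothetical TCC participant, e.g. stock in inventory or funds in a wallet."""

    def __init__(self, available):
        self.available = available
        self.reserved = {}   # tx_id -> quantity held by a pending transaction

    def try_reserve(self, tx_id, qty):
        # Try: set the quantity aside, but do not finalize anything yet.
        if qty > self.available:
            raise ValueError("insufficient resources")
        self.available -= qty
        self.reserved[tx_id] = qty

    def confirm(self, tx_id):
        # Confirm: finalize by dropping the hold. Repeating it is a no-op.
        self.reserved.pop(tx_id, None)

    def cancel(self, tx_id):
        # Cancel: compensate by returning the held quantity. Repeating it is a no-op.
        self.available += self.reserved.pop(tx_id, 0)


def run_tcc(tx_id, steps):
    """steps is a list of (participant, qty). Return True if confirmed, False if cancelled."""
    tried = []
    try:
        for participant, qty in steps:
            participant.try_reserve(tx_id, qty)
            tried.append(participant)
    except ValueError:
        # A Try failed: cancel the reservations made so far, in reverse order.
        for participant in reversed(tried):
            participant.cancel(tx_id)
        return False
    for participant in tried:
        participant.confirm(tx_id)
    return True
```

Making `confirm` and `cancel` idempotent matters because a coordinator that is unsure whether a call succeeded will simply send it again. The reservation state each participant must hold is the "stateful" cost the paragraph above refers to.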

For a comprehensive overview of SOA‑style transaction handling, see the Alipay presentation by Cheng Li from eight years ago.

Copyright statement: Content sourced from the internet; original author retains rights. We strive to credit authors and sources; please inform us of any infringement.

Tags: microservices, CQRS, distributed transactions, Event Sourcing, Two-Phase Commit, transaction compensation