Saga Pattern for Distributed Data Consistency in Microservices

This article explains why data consistency is critical in microservice architectures, introduces the Saga pattern as a solution for achieving eventual consistency, compares it with two‑phase commit and TCC, and details the design of a centralized Saga framework including logging, recovery, and execution components.

Architecture Digest

Data consistency is a crucial concern when building business systems. Traditionally, the database guarantees consistency, but in microservice and distributed environments data is spread across services, so no single database can enforce it. ServiceComb, an open-source microservice framework, introduces the ServiceComb-Saga project to address eventual consistency.

Data Consistency in a Monolithic Application

Imagine a travel‑planning platform that books flights, rental cars, and hotels in a single transaction. In a monolith, all three requests can be wrapped in one database transaction, ensuring either all succeed or all roll back.
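As a minimal sketch of this all-or-nothing behaviour, the snippet below wraps three inserts in one local transaction using Python's built-in `sqlite3` module. The `bookings` table, the `book_trip` function, and the `fail_on` switch are hypothetical names used only for illustration.

```python
import sqlite3


def book_trip(conn, customer, fail_on=None):
    """Book flight, car and hotel inside one local database transaction."""
    try:
        # The connection context manager commits on success and rolls
        # back if an exception escapes the block.
        with conn:
            for item in ("flight", "car", "hotel"):
                if item == fail_on:
                    raise RuntimeError(f"{item} unavailable")
                conn.execute(
                    "INSERT INTO bookings (customer, item) VALUES (?, ?)",
                    (customer, item),
                )
        return True
    except RuntimeError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (customer TEXT, item TEXT)")

ok = book_trip(conn, "alice")                     # all three rows committed
failed = book_trip(conn, "bob", fail_on="hotel")  # flight and car rolled back
rows = conn.execute(
    "SELECT customer, item FROM bookings ORDER BY rowid"
).fetchall()
```

After both calls, only the three rows for `alice` remain, because the partial bookings for `bob` were rolled back together.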

The monolithic approach works well when the system is small.

Data Consistency in a Microservice Scenario

As the platform grows, the monolith is split into four independent services—flight booking, car rental, hotel booking, and payment—each with its own database and release cycle. The original business rule (all three bookings must succeed) can no longer be enforced by a single database transaction.

To satisfy the rule, a coordination mechanism is needed across service boundaries.

Sagas

A Saga is a long-lived transaction that can be broken down into a series of sub-transactions, which may be interleaved with other transactions. Each sub-transaction is a real transaction that keeps its own database consistent.

In the travel‑planning example, the overall transaction becomes a Saga consisting of flight, car, hotel, and payment sub‑transactions.

How a Saga Works

Sagas consist of linked transactions that must be executed as a (non‑atomic) unit. Any partially executed Saga does not satisfy the requirement and must be compensated. Each sub‑transaction Ti must provide a corresponding compensating transaction Ci.

When all sub‑transactions succeed, the Saga completes; otherwise, the system executes compensating transactions in reverse order.

Sub‑transaction sequence T1, T2, …, Tn completes (best case).

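Or, if a sub-transaction fails, the sequence T1, T2, …, Tj, Cj, …, C2, C1 completes, where Tj is the last sub-transaction that committed (0 ≤ j < n) and the failure occurred at step j+1.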

This guarantees that either all bookings are performed or all are cancelled, and that a failed payment also triggers cancellation of previous bookings.
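The execution order above can be sketched in a few lines. This is an illustrative in-process model, not the ServiceComb-Saga implementation: `Step`, `run_saga`, and the recording helpers are hypothetical names, and real sub-transactions would be remote calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # sub-transaction T_i
    compensate: Callable[[], None]  # compensating transaction C_i


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


trace: List[str] = []


def record(label: str) -> Callable[[], None]:
    return lambda: trace.append(label)


def failing_payment() -> None:
    raise RuntimeError("card declined")


steps = [
    Step("flight", record("T:flight"), record("C:flight")),
    Step("car", record("T:car"), record("C:car")),
    Step("hotel", record("T:hotel"), record("C:hotel")),
    Step("payment", failing_payment, record("C:payment")),
]
result = run_saga(steps)
```

Here the payment fails, so the three bookings are cancelled in reverse order and `C:payment` is never called, since the payment itself never completed.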

Saga Recovery Methods

The original paper describes two recovery strategies:

Backward recovery: compensate all completed transactions if any sub-transaction fails.

Forward recovery: retry the failed transaction, assuming each sub-transaction will eventually succeed.

Forward recovery does not require compensating actions when sub‑transactions are guaranteed to succeed.
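Forward recovery can be sketched as a retry loop. The function name `run_forward`, the `max_attempts` cap, and the flaky hotel call are assumptions made for this example. The sketch also assumes each sub-transaction is safe to call more than once, since a retry may repeat a request that had already taken effect.

```python
import time


def run_forward(actions, max_attempts=5, delay_seconds=0.0):
    """Retry each sub-transaction until it succeeds; return attempts used."""
    attempts_used = []
    for name, action in actions:
        for attempt in range(1, max_attempts + 1):
            try:
                action()
            except Exception:
                if attempt == max_attempts:
                    raise  # give up: forward recovery assumed eventual success
                time.sleep(delay_seconds)
            else:
                attempts_used.append((name, attempt))
                break
    return attempts_used


calls = {"hotel": 0}


def flaky_hotel():
    """Hypothetical remote call that times out twice before succeeding."""
    calls["hotel"] += 1
    if calls["hotel"] < 3:
        raise ConnectionError("timeout")


attempts = run_forward([("flight", lambda: None), ("hotel", flaky_hotel)])
```

The cap on attempts is a practical safeguard added for the sketch; the strategy as described assumes the retry eventually succeeds.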

When to Use a Saga

Sagas allow only two levels of nesting (a top‑level Saga and simple sub‑transactions).

Full atomicity cannot be achieved; a Saga may see partial results of other Sagas.

Each sub‑transaction must be an independent atomic operation.

In the travel example, flight, car, hotel, and payment are naturally independent and can each be made atomic within its own service.

Compensating actions may not always revert the system to the exact pre‑transaction state, but they can achieve the required business effect (e.g., sending a follow‑up email to undo a previous email).

Saga Log

To survive crashes, a Saga persists events such as:

Saga started event – records the whole request.

Transaction started event – records each sub‑transaction request.

Transaction ended event – records the response.

Transaction aborted event – records failure reasons.

Transaction compensated event – records compensating request responses.

Saga ended event – marks the end of the Saga.

These events can be stored in SQL/NoSQL databases, message queues, or files, enabling the Saga to resume from any persisted state.
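To illustrate how a persisted log allows a Saga to resume, the sketch below stores events as JSON lines in a list (standing in for a database, queue, or file) and derives which compensations are still outstanding after a crash. The event kind strings and both function names are hypothetical, chosen to mirror the event list above.

```python
import json


def append_event(log, kind, tx=None, payload=None):
    """Persist one event as a JSON line (the list stands in for real storage)."""
    log.append(json.dumps({"kind": kind, "tx": tx, "payload": payload}))


def plan_backward_recovery(log):
    """Return sub-transactions that ended but were not yet compensated,
    newest first, or an empty list if the Saga already ended."""
    events = [json.loads(line) for line in log]
    if any(e["kind"] == "SagaEnded" for e in events):
        return []
    ended = [e["tx"] for e in events if e["kind"] == "TransactionEnded"]
    compensated = {
        e["tx"] for e in events if e["kind"] == "TransactionCompensated"
    }
    return [tx for tx in reversed(ended) if tx not in compensated]


log = []
append_event(log, "SagaStarted", payload={"customer": "alice"})
append_event(log, "TransactionStarted", "flight")
append_event(log, "TransactionEnded", "flight")
append_event(log, "TransactionStarted", "car")
append_event(log, "TransactionEnded", "car")
append_event(log, "TransactionStarted", "hotel")
append_event(log, "TransactionAborted", "hotel", "no rooms")
append_event(log, "TransactionCompensated", "car")
# Simulated crash here: the coordinator restarts and reads the log.
remaining = plan_backward_recovery(log)
```

In this run the car booking was already compensated before the crash, so only the flight booking is left to cancel.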

Saga Request Data Structure

The four services form a directed acyclic graph (DAG) where the root is the Saga start node and leaves are the end nodes. This graph representation fits the parallel execution model.
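One way to picture the graph is as a dependency map that is walked in topological order. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The node names and the assumption that payment depends on all three bookings are illustrative, not taken from the framework's request format.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (assumed layout).
graph = {
    "flight": {"saga_start"},
    "car": {"saga_start"},
    "hotel": {"saga_start"},
    "payment": {"flight", "car", "hotel"},
    "saga_end": {"payment"},
}


def execution_waves(dependencies):
    """Group nodes into waves; nodes in one wave have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(graph)
```

Each wave contains the requests that could be issued concurrently, which is why the graph form suits the parallel execution model.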

Parallel Saga

Flight, car, and hotel bookings can be executed in parallel. If one fails while the others are still in flight, the system must handle timeouts and possible duplicate requests. Because a timed-out booking request may reach a service after its compensating request, the two must be commutative: whichever arrives first, the end result is a cancelled booking.
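A service can make the two requests commute by remembering cancellations. The following is a minimal sketch under that assumption; `HotelService` and its state names are hypothetical.

```python
class HotelService:
    """Toy service whose book and cancel requests commute per booking id."""

    def __init__(self):
        self.state = {}  # booking_id -> "BOOKED" or "CANCELLED"

    def book(self, booking_id):
        # A late or duplicate booking must not revive a cancelled booking.
        if self.state.get(booking_id) == "CANCELLED":
            return "CANCELLED"
        self.state[booking_id] = "BOOKED"
        return "BOOKED"

    def cancel(self, booking_id):
        # Record the cancellation even if the booking has not arrived yet.
        self.state[booking_id] = "CANCELLED"
        return "CANCELLED"


normal = HotelService()
normal.book("b1")
normal.cancel("b1")

reordered = HotelService()
reordered.cancel("b1")  # compensation overtakes the delayed booking
reordered.book("b1")
```

Both services end with `b1` cancelled, regardless of arrival order, and repeating either request leaves the state unchanged.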

ACID and Saga

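ACID provides atomicity, consistency, isolation, and durability. Sagas do not provide atomicity or isolation: a Saga can be observed in a partially executed state, and concurrent Sagas may see each other's intermediate results. They rely on compensating actions to reach eventual consistency and on the persisted Saga log for durability.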

Saga Architecture

The centralized Saga design includes:

Saga Execution Component – parses the JSON request and builds the DAG.

TaskRunner – ensures proper execution order via a task queue.

TaskConsumer – writes events to the Saga log and forwards requests to remote services.
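The division of labour between these components can be sketched roughly as follows. This is not the ServiceComb-Saga API: the JSON shape, the function names, and the in-memory `services` map are all assumptions, and the sketch handles only a sequential, failure-free run.

```python
import json
from collections import deque


def parse_request(request_json):
    """Execution component: parse the JSON request into an ordered task queue."""
    return deque(json.loads(request_json)["requests"])


def run_tasks(tasks, services, saga_log):
    """Runner drains the queue in order; the consumer step logs each event
    and forwards the request to the target service."""
    saga_log.append(("SagaStarted",))
    while tasks:
        task = tasks.popleft()
        saga_log.append(("TransactionStarted", task["service"]))
        response = services[task["service"]](task["payload"])
        saga_log.append(("TransactionEnded", task["service"], response))
    saga_log.append(("SagaEnded",))


request = json.dumps({"requests": [
    {"service": "flight", "payload": {"seat": "12A"}},
    {"service": "payment", "payload": {"amount": 300}},
]})
# Plain functions stand in for the remote services.
services = {
    "flight": lambda payload: "flight booked",
    "payment": lambda payload: "payment accepted",
}
saga_log = []
run_tasks(parse_request(request), services, saga_log)
```

Because every request passes through the logging step, the services themselves contain no consistency logic.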

Two‑Phase Commit (2PC)

Two‑phase commit is a distributed algorithm that coordinates participants to either all commit or all abort a transaction.

2PC involves a voting phase and a decision phase, but it blocks resources, does not scale well, and leaves participants in an uncertain state if the coordinator fails.
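The two phases can be modelled in a few lines. This toy version runs in one process and ignores coordinator failure, which is exactly the case that makes real 2PC hard; `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Voting phase: a "yes" vote means resources stay locked until
        # the coordinator announces its decision.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # phase 1: voting
    if all(votes):                               # phase 2: decision
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.abort()
    return "ABORTED"


flight = Participant("flight")
hotel = Participant("hotel", can_commit=False)
outcome = two_phase_commit([flight, hotel])
```

A single "no" vote aborts every participant, including those that had already prepared and were holding resources.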

Try‑Confirm/Cancel (TCC)

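TCC is another compensating-transaction model with two phases. In the Try phase each service reserves resources and records them in a pending state; in the second phase the coordinator either confirms every reservation or cancels it. Compensation is easier to design than in a Saga because a Cancel only releases a reservation, but services need extra “pending” states, and every service is called twice, which roughly doubles the communication cost compared to Sagas.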
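A rough sketch of the pending state and the two calls per service is shown below. `Inventory`, `run_tcc`, and the reservation ids are hypothetical, and the example is kept in-process for brevity.

```python
class Inventory:
    """Toy resource with an explicit pending (reserved) state."""

    def __init__(self, available):
        self.available = available
        self.pending = {}  # reservation_id -> reserved count

    def try_reserve(self, reservation_id, count):
        if count > self.available:
            return False
        self.available -= count
        self.pending[reservation_id] = count
        return True

    def confirm(self, reservation_id):
        # The reserved units stay deducted; only the pending marker goes.
        self.pending.pop(reservation_id, None)

    def cancel(self, reservation_id):
        # Release the reservation back to the available pool.
        self.available += self.pending.pop(reservation_id, 0)


def run_tcc(requests):
    """requests: list of (inventory, reservation_id, count) tuples."""
    tried = []
    for inventory, reservation_id, count in requests:
        if inventory.try_reserve(reservation_id, count):
            tried.append((inventory, reservation_id))
        else:
            for inv, rid in tried:
                inv.cancel(rid)
            return False
    for inv, rid in tried:
        inv.confirm(rid)  # second call to every service
    return True


seats = Inventory(1)
rooms = Inventory(0)
booked = run_tcc([(seats, "r1", 1), (rooms, "r1", 1)])
```

Since no room is available, the seat reservation is cancelled and the seat returns to the pool without any business-level undo logic.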

Event‑Driven Architecture

Similar to TCC, each service records a pending state and emits events to the next service. An additional event table tracks progress to handle crashes. This approach resembles a decentralized, event‑driven Saga.

Centralized vs. Decentralized Implementations

In a decentralized Saga, each service acts as the coordinator for the next service, handling compensation locally. While this gives services more autonomy, it also forces every service to embed the consistency protocol and persistence logic.

The centralized design isolates consistency concerns, simplifying service code and improving observability.

Summary

The article compares Saga with other data‑consistency solutions. Saga scales better than 2PC, requires less business‑logic change than TCC, and a centralized Saga design decouples services from consistency logic, making troubleshooting easier.
