Understanding the Saga Pattern for Distributed Data Consistency in Microservices

This article explains why data consistency is critical in microservice architectures, introduces the Saga pattern and its execution and recovery mechanisms, compares it with two‑phase commit and TCC, and presents a centralized Saga design using ServiceComb for reliable distributed transactions.

Architect

Data Consistency in Monolithic Applications

In a monolithic travel‑booking system, a single database transaction can guarantee that flight, car‑rental, and hotel reservations either all succeed or all roll back, ensuring a successful itinerary for the customer.

Data Consistency in Microservice Scenarios

When the same business logic is split into independent services, each with its own database, a single local transaction can no longer span them, so keeping the four services (flight, car rental, hotel, payment) consistent requires explicit coordination.

Sagas

A Saga is a long‑running transaction composed of a series of compensatable sub‑transactions; each sub‑transaction is a regular database transaction, and if any step fails, previously completed steps are compensated.

"Saga is a long‑lived transaction that can be broken into interleavable sub‑transactions, each of which maintains database consistency."

In the travel‑planning example, the overall transaction consists of four sub‑transactions: flight booking, car rental, hotel reservation, and payment.

Saga Execution Principle

Each sub‑transaction is paired with a compensating transaction. If a sub‑transaction fails, the compensating transactions of the steps that have already completed are executed, typically in reverse order, to undo their work.

The system defines each sub‑transaction and its compensation, guaranteeing either full completion or a compensated rollback.
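
The pairing of sub‑transactions and compensations can be sketched in a few lines. This is a minimal, in‑memory illustration rather than any framework's API: the names Step and run_saga are hypothetical, and the "services" are plain functions that append to a trace.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]        # the sub-transaction
    compensation: Callable[[], None]  # semantically undoes the action


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Hypothetical travel example in which the hotel reservation fails.
trace: List[str] = []


def fail_hotel() -> None:
    raise RuntimeError("no rooms available")


steps = [
    Step("flight", lambda: trace.append("book flight"), lambda: trace.append("cancel flight")),
    Step("car", lambda: trace.append("rent car"), lambda: trace.append("cancel car")),
    Step("hotel", fail_hotel, lambda: trace.append("cancel hotel")),
    Step("payment", lambda: trace.append("pay"), lambda: trace.append("refund")),
]
ok = run_saga(steps)
```

Here the failed hotel step is never compensated because it never completed, and the payment step is never attempted.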

Saga Recovery Methods

The original paper describes two recovery strategies: backward recovery (compensate all completed steps when a failure occurs) and forward recovery (retry the failed step assuming eventual success).
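
Forward recovery can be illustrated with a simple retry loop. The sketch below assumes the retried step is idempotent and uses hypothetical names (run_with_forward_recovery, flaky_payment); a real system would also persist progress between attempts.

```python
import time
from typing import Callable


def run_with_forward_recovery(
    action: Callable[[], None],
    max_attempts: int = 5,
    delay_seconds: float = 0.0,
) -> int:
    """Retry the action until it succeeds; return the number of attempts used.

    Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")


# Hypothetical payment call that fails twice before succeeding.
calls = {"count": 0}


def flaky_payment() -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("payment service unavailable")


attempts = run_with_forward_recovery(flaky_payment)
```

No compensation is defined here, which is the appeal of forward recovery, but it only makes sense for steps that are expected to succeed eventually.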

Conditions for Using Saga

Saga allows only two nesting levels (top‑level Saga and simple sub‑transactions).

Full atomicity cannot be achieved; sagas may see partial results of other sagas.

Each sub‑transaction must be an independent atomic operation.

In the travel scenario, flight, car, hotel, and payment are naturally independent and can each be made atomic within its own service.

Compensations may not always restore the exact prior state, but they can semantically undo the effect (e.g., sending a follow‑up email to cancel a previous one).

Saga Log

To recover from crashes, the Saga system persists events such as SagaStarted, TransactionStarted, TransactionEnded, TransactionAborted, TransactionCompensated, and SagaEnded, allowing the Saga to resume from any intermediate state.
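
As a rough illustration of how such a log supports recovery, the sketch below replays a list of events and works out what is still outstanding. The event names come from the list above; the tuple layout and the recover function are assumptions made for this example, not the format used by any particular implementation.

```python
from typing import Dict, List, Set, Tuple

# (event_type, sub_transaction_name); the name is "" for saga-level events.
Event = Tuple[str, str]


def recover(log: List[Event]) -> Dict[str, object]:
    """Replay the log and report what a restarted coordinator still has to do."""
    started: List[str] = []
    ended: List[str] = []
    aborted: Set[str] = set()
    compensated: Set[str] = set()
    finished = False
    for event_type, name in log:
        if event_type == "TransactionStarted":
            started.append(name)
        elif event_type == "TransactionEnded":
            ended.append(name)
        elif event_type == "TransactionAborted":
            aborted.add(name)
        elif event_type == "TransactionCompensated":
            compensated.add(name)
        elif event_type == "SagaEnded":
            finished = True
    # Started but neither ended nor aborted: the outcome is unknown.
    in_flight = [n for n in started if n not in ended and n not in aborted]
    # Completed steps not yet compensated, most recent first.
    to_compensate = [n for n in reversed(ended) if n not in compensated]
    return {"finished": finished, "in_flight": in_flight, "to_compensate": to_compensate}


# The coordinator crashed after compensating the car rental but before the flight.
log: List[Event] = [
    ("SagaStarted", ""),
    ("TransactionStarted", "flight"),
    ("TransactionEnded", "flight"),
    ("TransactionStarted", "car"),
    ("TransactionEnded", "car"),
    ("TransactionStarted", "hotel"),
    ("TransactionAborted", "hotel"),
    ("TransactionCompensated", "car"),
]
state = recover(log)
```

The to_compensate list is only acted on when the saga is recovering backward; for a saga that is still moving forward, the same replay tells the coordinator which steps remain.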

Saga Request Data Structure

The travel‑planning request is modeled as a directed acyclic graph where the root node starts the Saga and leaf nodes mark its completion, enabling parallel execution of independent services while preserving the overall order.
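
One way to picture the graph is as a mapping from each node to the nodes it depends on, from which batches of parallel work can be derived. The sketch below uses Python's standard graphlib module (Python 3.9+); the node names, and the assumption that payment runs only after all three bookings, are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# node -> set of nodes that must finish first (hypothetical request graph)
request_graph: Dict[str, Set[str]] = {
    "flight": {"saga-start"},
    "car": {"saga-start"},
    "hotel": {"saga-start"},
    "payment": {"flight", "car", "hotel"},
    "saga-end": {"payment"},
}


def execution_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into batches; nodes in the same batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(request_graph)
```

With this graph the three bookings land in the same batch, while payment waits for all of them.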

Parallel Saga

Parallel execution of flight, car, and hotel bookings can lead to situations where one service fails while others are still processing; the design must ensure that compensations are idempotent and that duplicate requests after compensation are ignored.
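
A minimal sketch of both requirements follows, assuming each request carries a saga identifier that the service can use as a deduplication key. The BookingService class and its state names are hypothetical.

```python
from typing import Dict


class BookingService:
    """In-memory stand-in for a booking service keyed by saga id."""

    def __init__(self) -> None:
        self.bookings: Dict[str, str] = {}  # saga_id -> "BOOKED" or "CANCELLED"

    def book(self, saga_id: str) -> bool:
        # A duplicate or late request never creates a second booking, and a
        # request that arrives after compensation is ignored.
        if saga_id in self.bookings:
            return self.bookings[saga_id] == "BOOKED"
        self.bookings[saga_id] = "BOOKED"
        return True

    def compensate(self, saga_id: str) -> None:
        # Idempotent: repeated calls leave the same state. Recording the
        # cancellation even when no booking exists lets a late book() be rejected.
        self.bookings[saga_id] = "CANCELLED"


service = BookingService()
service.compensate("saga-1")             # compensation overtakes the booking request
late_result = service.book("saga-1")     # the delayed booking is ignored
service.compensate("saga-1")             # a repeated compensation is harmless
normal_result = service.book("saga-2")
duplicate_result = service.book("saga-2")
```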

ACID and Saga

Saga does not provide full ACID guarantees; atomicity and isolation are relaxed, but consistency and durability are achieved through the compensating logic and persistent saga log.

Saga Architecture

The centralized architecture includes a Saga Execution Component that parses JSON requests into a graph, a TaskRunner that enforces execution order, and a TaskConsumer that writes events to the saga log and invokes remote services.
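
The division of responsibilities might look roughly like the sketch below. It borrows the component names from the description above but is not ServiceComb's actual code: the JSON shape is made up, the request is a flat ordered list rather than a graph, remote calls are replaced by local functions, and compensation is omitted.

```python
import json
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # (event_type, task_name)


class TaskConsumer:
    """Writes saga-log events around each service invocation."""

    def __init__(self, log: List[Event], services: Dict[str, Callable[[], None]]) -> None:
        self.log = log
        self.services = services

    def consume(self, task: str) -> bool:
        self.log.append(("TransactionStarted", task))
        try:
            self.services[task]()  # stands in for a remote call
        except Exception:
            self.log.append(("TransactionAborted", task))
            return False
        self.log.append(("TransactionEnded", task))
        return True


class TaskRunner:
    """Enforces execution order and stops at the first failed task."""

    def __init__(self, consumer: TaskConsumer) -> None:
        self.consumer = consumer

    def run(self, tasks: List[str]) -> bool:
        return all(self.consumer.consume(task) for task in tasks)


def execute(request_json: str, services: Dict[str, Callable[[], None]]) -> List[Event]:
    """Parse a (hypothetical) JSON request and run it, returning the saga log."""
    tasks = json.loads(request_json)["tasks"]
    log: List[Event] = [("SagaStarted", "")]
    if TaskRunner(TaskConsumer(log, services)).run(tasks):
        log.append(("SagaEnded", ""))
    return log


saga_log = execute(
    '{"tasks": ["flight", "payment"]}',
    {"flight": lambda: None, "payment": lambda: None},
)
```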

Two‑Phase Commit (2PC)

Two‑Phase Commit is a distributed algorithm that coordinates participants to either all commit or all abort a transaction.

2PC involves a voting phase and a decision phase, but it blocks resources, incurs high latency, and does not scale well as the number of participants grows.
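
The two phases can be sketched with an in-memory coordinator. The Participant class and two_phase_commit function are hypothetical, and the sketch leaves out timeouts, coordinator failure, and durable logging, which are where most of the real difficulty lies.

```python
from typing import List


class Participant:
    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self) -> bool:
        # Voting phase: a "yes" vote means resources stay locked until the decision.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    if all(votes):
        for p in participants:                   # phase 2: decision
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


happy = [Participant("flight", True), Participant("hotel", True)]
happy_result = two_phase_commit(happy)

unhappy = [Participant("flight", True), Participant("hotel", False)]
unhappy_result = two_phase_commit(unhappy)
```

Every participant that voted yes holds its locks until the coordinator's decision arrives, which is the source of the blocking described above.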

Try‑Confirm/Cancel (TCC)

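TCC splits a transaction into a try phase (reserve resources), a confirm phase (finalize the reservation), and a cancel phase (release the reservation). Compared with Saga, compensation is easier to reason about because cancel only releases what try reserved, but each service must model an extra reserved state, and the separate try and confirm calls roughly double the communication overhead.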
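
The extra reserved state is easiest to see in code. The following is a sketch of a single TCC participant with hypothetical names (SeatInventory, try_reserve); it is not tied to any TCC framework.

```python
from typing import Dict


class SeatInventory:
    """A TCC participant that tracks available, reserved, and sold seats."""

    def __init__(self, available: int) -> None:
        self.available = available
        self.reserved: Dict[str, int] = {}  # tx_id -> seats held by the try phase
        self.sold = 0

    def try_reserve(self, tx_id: str, seats: int) -> bool:
        # Try: hold the seats without selling them yet.
        if seats > self.available:
            return False
        self.available -= seats
        self.reserved[tx_id] = seats
        return True

    def confirm(self, tx_id: str) -> None:
        # Confirm: turn the reservation into a sale; a repeated call is a no-op.
        self.sold += self.reserved.pop(tx_id, 0)

    def cancel(self, tx_id: str) -> None:
        # Cancel: release the reservation; a repeated call is a no-op.
        self.available += self.reserved.pop(tx_id, 0)


inventory = SeatInventory(available=2)
first_try = inventory.try_reserve("tx1", 2)
second_try = inventory.try_reserve("tx2", 1)   # rejected: all seats are held by tx1
inventory.cancel("tx1")
inventory.cancel("tx1")                        # idempotent
third_try = inventory.try_reserve("tx2", 1)
inventory.confirm("tx2")
```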

Event‑Driven Architecture

Similar to TCC, each service records a pending state and emits events to the next service; the final service notifies the previous one upon completion, allowing the whole saga to be reconstructed from persisted events.

Centralized vs Decentralized Implementations

Centralized sagas use a dedicated coordinator, simplifying service logic, while decentralized sagas let each service act as the coordinator for the next step, offering autonomy at the cost of added protocol complexity.

Summary

The article compares Saga with 2PC and TCC, showing that Saga scales better than 2PC and requires less business‑logic change than TCC; a centralized Saga design decouples consistency concerns from services, simplifies troubleshooting, and improves performance for compensable transactions.
