Databases 13 min read

Ensuring Consistency in Distributed Systems: From Local Transactions to Two‑Phase Commit and Compensation Mechanisms

This article examines various consistency solutions for distributed systems, including strong and eventual consistency, local database transactions, two‑phase commit, TCC, rollback mechanisms, local message tables, and compensation techniques, illustrating their trade‑offs and appropriate application scenarios.

Architecture Digest
Architecture Digest
Architecture Digest
Ensuring Consistency in Distributed Systems: From Local Transactions to Two‑Phase Commit and Compensation Mechanisms

Introduction

In ideal Internet systems we would like to satisfy consistency, availability, and partition tolerance simultaneously, but according to the CAP theorem and BASE theory this is impossible in practice. In the financial domain consistency is paramount and must always be met, while Internet services often sacrifice strong consistency for high availability, relying on eventual consistency within an acceptable time window.

Database Local Transaction

Database transactions provide strong consistency in the simplest way because the database guarantees it, relieving the business layer from handling details. A typical use case is a refund scenario with cash‑back: two related transactions (purchase and cash‑back) must be refunded together, which can be achieved with a single local database transaction.

Summary: Database transactions are simple and require little business‑layer involvement, but in highly available systems they can become complex, hindering scalability and maintainability.

Two‑Phase Commit (2PC)

Beyond local consistency, distributed systems need distributed transactions, commonly implemented with the two‑phase commit protocol. Baidu’s NesioDB, a distributed database used in core payment services, adopts 2PC to satisfy ACID properties across nodes.

Phase 1 – Prepare:

The transaction manager (DTM) sends a prepare message to all resource managers (RMs). Each RM creates and executes a local transaction, writes redo/undo logs, but does not commit.

Phase 2 – Commit/Rollback:

If any RM reports failure or times out, DTM issues a rollback; otherwise it issues a commit. After execution, the RM releases all locks.

Two‑phase commit can also be implemented as TCC (Try‑Confirm‑Cancel), a flexible transaction pattern used for transaction‑account separation. TCC involves three modules: the main service, a sub‑service, and an activity manager (the coordinator).

First Stage: The main service calls the try operation of all sub‑services and records them in the activity manager. If all try calls succeed (or any fails), the process moves to the second stage.

Second Stage: Based on the try results, the activity manager issues confirm or cancel commands. If all tries succeeded, confirm is called; otherwise cancel is called. Both confirm and cancel may fail, requiring additional error handling to preserve consistency.

Summary: TCC is relatively easy to implement and suits traditional monolithic applications with cross‑database operations.

Rollback Mechanism

In distributed architectures, when coordinating multiple atomic services (e.g., A and B), a failure in one service necessitates a rollback to maintain consistency. An example is a joint‑loan project where funds are first collected from merchants A and B into an intermediate merchant C, then transferred to user D’s bank card. If the second step fails, the collected funds must be rolled back to A and B.

The rollback mechanism provides a strong compensating action: if the “funds to card” step fails after a successful “funds collection,” the system reverses the collection, returning money to the original merchants.

Summary: Rollback is suitable only for simple scenarios with few services; in complex cases it leads to large codebases, high coupling, and high compensation cost.

Local Message Table

This pattern, inspired by eBay and popularized by companies like Alipay, splits a distributed transaction into a series of local transactions using a relational database table. It is applied in non‑core asynchronous refactoring projects for wallet systems.

Typical steps:

Core business writes transaction data to the primary transaction table.

Non‑core business asynchronously writes the same data to a user‑oriented query table.

Both tables must stay consistent. If a reliable message queue is unavailable, a local message table can achieve eventual consistency.

Workflow:

Service A writes business data and a local message to DB1 within a single transaction.

After committing, Service A sends the message to a message queue (MQ).

MQ delivers the message to Service B, which processes it and writes to DB2.

If MQ loses the message, Service C reads the pending messages from DB1 and re‑sends them to Service B via RPC.

Consumption can be verified by checking the transaction result in DB2.

Summary: This classic approach avoids distributed transactions and achieves eventual consistency, but relational databases may become a performance bottleneck under high concurrency.

Compensation Mechanism

Compensation is widely used in distributed systems, especially in wallet applications such as core cash registers and payment‑to‑card flows. When a downstream bank deduction succeeds but the upstream service crashes, a background “compensation” job (often called a “补单程序”) resumes the interrupted workflow.

Compensation is effective for high‑availability services where transient network failures or timeouts occur, allowing the system to recover and maintain consistency.

Conclusion

The article presents several concrete projects from a core system to illustrate how distributed systems achieve consistency. Each solution—local transactions, two‑phase commit, TCC, rollback, local message tables, and compensation—has distinct characteristics and suitable scenarios. No single method solves all cases; practitioners must choose based on specific business requirements.

distributed systemsMessage QueuecompensationTransactionsconsistencyTwo-Phase Commit
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.