How to Ensure Data Consistency in Microservices: From Blocking Retries to TCC

This article examines common techniques for handling service call failures in micro‑service architectures—blocking retries, asynchronous queues, TCC compensation transactions, local message tables, and MQ transactions—detailing their implementations, pitfalls, and trade‑offs to achieve reliable data consistency.

Code Ape Tech Column

Introduction

In distributed micro‑service architectures, inter‑service calls can fail due to network glitches, service crashes, or timeouts. When a failure occurs after part of a business transaction has already been persisted, data consistency becomes a problem. The article surveys practical mechanisms that avoid the classic two‑phase commit (2PC) and three‑phase commit (3PC) protocols.

Common approaches

Blocking retry

Asynchronous queue processing

TCC (Try‑Confirm‑Cancel) compensation transaction

Local message table (also called asynchronous assurance)

Independent message service

Transactional message queues (e.g., RocketMQ)

Blocking retry

A straightforward technique is to retry the downstream call synchronously a fixed number of times. The following Go‑style pseudo‑code shows a three‑attempt retry:

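import (
    "bytes"
    "log"
    "net/http"
)

m := db.Insert(sql)         // business data is committed locally first
err := request(BService, m) // then call service B, retrying on failure

// request POSTs body to url, making up to three attempts. It treats only
// transport errors as failures; a real client should also check resp.StatusCode.
func request(url string, body []byte) error {
    var err error
    for i := 0; i < 3; i++ {
        var resp *http.Response
        resp, err = http.Post(url, "application/json", bytes.NewReader(body))
        if err == nil {
            resp.Body.Close()
            return nil // success: stop retrying
        }
        log.Print(err)
    }
    return err // all attempts failed
}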

Issues:

Idempotency: a successful call may be perceived as failed because of a timeout, causing duplicate records.

Dirty data: if the call never succeeds, the earlier DB insert remains, leaving inconsistent state.

Latency and load amplification: each retry adds latency and can increase downstream pressure.

Mitigations:

Make the downstream API idempotent (e.g., use a unique request ID).

Clean up dirty data with background reconciliation jobs, but treat them as a last resort.

Accept the latency trade‑off when the business does not require strong consistency.
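As a minimal sketch of the idempotency mitigation, assuming the caller generates one request ID per business operation and reuses it on every retry: Downstream, Handle, and callWithRetry are hypothetical names, and a real service would persist processed IDs (for example behind a unique index) instead of holding them in a map.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTimeout simulates a response lost after the write was applied.
var errTimeout = errors.New("timeout")

// Downstream is a hypothetical service B that deduplicates by request ID.
type Downstream struct {
	mu        sync.Mutex
	processed map[string]bool // request IDs already applied
	records   []string        // rows actually written
	dropFirst bool            // lose the first response to simulate a timeout
}

func NewDownstream(dropFirst bool) *Downstream {
	return &Downstream{processed: map[string]bool{}, dropFirst: dropFirst}
}

// Handle applies the write at most once per request ID.
func (d *Downstream) Handle(requestID, payload string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if !d.processed[requestID] {
		d.processed[requestID] = true
		d.records = append(d.records, payload)
	}
	if d.dropFirst {
		d.dropFirst = false
		return errTimeout // the write happened, but the caller sees a failure
	}
	return nil
}

// callWithRetry sends the same request ID on every attempt, so a retry
// after a lost response cannot create a duplicate record.
func callWithRetry(d *Downstream, requestID, payload string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = d.Handle(requestID, payload); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	d := NewDownstream(true)
	err := callWithRetry(d, "order-42", "create order 42", 3)
	fmt.Println(err, len(d.records)) // prints "<nil> 1"
}
```

Even though the first attempt appears to fail, the second attempt reuses the ID, so only one record is written.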

Asynchronous queue

Introducing a message queue decouples the producer from the consumer. The producer writes business data and then publishes a message:

m := db.Insert(sql)
err := mq.Publish("B-Service-topic", m)

If mq.Publish fails (network error, broker outage, or a process crash between the two calls), the outcome is the same as a blocking retry that exhausts its attempts: the DB commit has succeeded but the message was never sent, so the two sides are inconsistent.
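The failure window can be made concrete with a small in-memory simulation. DB, Broker, and createOrder are hypothetical stand-ins rather than a real database or MQ client; the sketch only shows that the local commit and the publish are two independent steps.

```go
package main

import (
	"errors"
	"fmt"
)

// DB and Broker are hypothetical in-memory stand-ins.
type DB struct{ rows []string }

type Broker struct {
	messages []string
	down     bool // simulates a network error or broker outage
}

func (b *Broker) Publish(topic, msg string) error {
	if b.down {
		return errors.New("publish failed")
	}
	b.messages = append(b.messages, topic+":"+msg)
	return nil
}

// createOrder commits locally, then publishes; the two steps are not atomic.
func createOrder(db *DB, mq *Broker, order string) error {
	db.rows = append(db.rows, order)            // step 1: already committed
	return mq.Publish("B-Service-topic", order) // step 2: can fail on its own
}

func main() {
	db, mq := &DB{}, &Broker{down: true}
	err := createOrder(db, mq, "order-42")
	// The order exists locally, but service B will never hear about it.
	fmt.Println(err, len(db.rows), len(mq.messages)) // prints "publish failed 1 0"
}
```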

TCC compensation transaction

TCC splits each remote operation into three explicit phases:

Try : reserve resources (e.g., check inventory, lock amount).

Confirm : commit the reservation after all participants have succeeded.

Cancel : roll back the reservation when any participant fails.
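The three phases map naturally onto an interface. The sketch below assumes a hypothetical Participant interface keyed by a global transaction ID and an in-memory Inventory: Try holds stock, Confirm makes the hold permanent, and Cancel returns it.

```go
package main

import (
	"errors"
	"fmt"
)

// Participant is a hypothetical TCC interface; txID identifies the global transaction.
type Participant interface {
	Try(txID string, qty int) error
	Confirm(txID string) error
	Cancel(txID string) error
}

// Inventory reserves stock in Try and settles it in Confirm or Cancel.
type Inventory struct {
	available int
	reserved  map[string]int // txID -> quantity held by Try
}

func NewInventory(stock int) *Inventory {
	return &Inventory{available: stock, reserved: map[string]int{}}
}

func (inv *Inventory) Try(txID string, qty int) error {
	if qty > inv.available {
		return errors.New("insufficient stock")
	}
	inv.available -= qty // hold the units so no other order can take them
	inv.reserved[txID] = qty
	return nil
}

func (inv *Inventory) Confirm(txID string) error {
	delete(inv.reserved, txID) // the held units are now permanently sold
	return nil
}

func (inv *Inventory) Cancel(txID string) error {
	inv.available += inv.reserved[txID] // return held units (0 if none)
	delete(inv.reserved, txID)
	return nil
}

// Compile-time check that Inventory satisfies Participant.
var _ Participant = (*Inventory)(nil)

func main() {
	inv := NewInventory(10)
	_ = inv.Try("tx1", 3)
	_ = inv.Cancel("tx1") // stock back to 10
	_ = inv.Try("tx2", 4)
	_ = inv.Confirm("tx2")
	fmt.Println(inv.available) // prints "6"
}
```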

Example (Go‑like pseudo‑code) for a three‑service workflow:

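m := db.Insert(sql)

// Try phase: ask every participant to reserve resources.
_, aErr := A.Try(m)
_, bErr := B.Try(m)
_, cErr := C.Try(m)

// If any Try failed, cancel all participants (Cancel must tolerate a Try
// that never ran); otherwise confirm all of them.
if aErr != nil || bErr != nil || cErr != nil {
    A.Cancel()
    B.Cancel()
    C.Cancel()
} else {
    A.Confirm()
    B.Confirm()
    C.Confirm()
}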
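The same flow generalizes to any number of participants. This is a minimal coordinator sketch, assuming a hypothetical Participant interface and fake in-memory services; a production coordinator would also persist its progress and retry failed Confirm or Cancel calls, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
)

// Participant is a hypothetical TCC service interface.
type Participant interface {
	Try(txID string) error
	Confirm(txID string) error
	Cancel(txID string) error
}

// fakeService records which phases ran; failTry makes its Try fail.
type fakeService struct {
	name    string
	failTry bool
	log     *[]string
}

func (s fakeService) Try(txID string) error {
	if s.failTry {
		return errors.New(s.name + " try failed")
	}
	*s.log = append(*s.log, s.name+".try")
	return nil
}

func (s fakeService) Confirm(txID string) error { *s.log = append(*s.log, s.name+".confirm"); return nil }
func (s fakeService) Cancel(txID string) error  { *s.log = append(*s.log, s.name+".cancel"); return nil }

// runTCC tries every participant; any failure cancels all of them
// (Cancel must tolerate a Try that never ran), otherwise all confirm.
func runTCC(txID string, ps []Participant) error {
	var tryErr error
	for _, p := range ps {
		if err := p.Try(txID); err != nil {
			tryErr = err
			break // no point reserving more resources
		}
	}
	for _, p := range ps {
		if tryErr != nil {
			_ = p.Cancel(txID) // real code must retry a failed Cancel
		} else {
			_ = p.Confirm(txID) // real code must retry a failed Confirm
		}
	}
	return tryErr
}

func main() {
	var log []string
	ps := []Participant{
		fakeService{"A", false, &log},
		fakeService{"B", true, &log},
		fakeService{"C", false, &log},
	}
	fmt.Println(runTCC("tx1", ps))
	fmt.Println(log)
}
```

With B's Try failing, runTCC returns B's error and the log shows A.try followed by a Cancel for all three services, including C, whose Try never ran. That last Cancel is exactly the empty‑release case discussed in the challenges.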

Challenges:

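Empty release : Cancel may be invoked on a participant whose Try never executed, for example because the Try request was lost in the network and the coordinator timed out. Cancel then has nothing to release and must succeed as a no‑op rather than fail or release resources it never reserved.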

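Message ordering : network delays can cause Cancel to arrive before Try. Cancel then runs as an empty release, and when the delayed Try finally arrives it reserves resources that no later Confirm or Cancel will ever release, leaving them locked. Participants therefore need to remember cancelled transactions and reject a Try that arrives afterwards.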

Failure of Cancel/Confirm : these operations themselves may fail, requiring additional retries or manual intervention.
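A common defense against all three problems is to record, per transaction ID, which phase a participant has already applied. The sketch below assumes a hypothetical in-memory Account; in a real service this state would live in the database and be updated in the same local transaction as the balance.

```go
package main

import "fmt"

// Hypothetical phase markers stored per global transaction ID.
const (
	tried     = "tried"
	confirmed = "confirmed"
	cancelled = "cancelled"
)

// Account freezes funds in Try; state makes each phase idempotent and
// lets Try and Cancel detect out-of-order calls.
type Account struct {
	balance, frozen int
	state           map[string]string // txID -> last phase applied
	amount          map[string]int    // txID -> amount frozen by Try
}

func NewAccount(balance int) *Account {
	return &Account{balance: balance, state: map[string]string{}, amount: map[string]int{}}
}

func (a *Account) Try(txID string, amt int) bool {
	if _, seen := a.state[txID]; seen {
		return false // duplicate Try, or Cancel already arrived: refuse
	}
	if amt > a.balance {
		return false
	}
	a.balance -= amt
	a.frozen += amt
	a.amount[txID] = amt
	a.state[txID] = tried
	return true
}

func (a *Account) Confirm(txID string) {
	if a.state[txID] != tried {
		return // repeated Confirm, or nothing was tried: no-op
	}
	a.frozen -= a.amount[txID]
	a.state[txID] = confirmed
}

func (a *Account) Cancel(txID string) {
	switch a.state[txID] {
	case "": // empty release: Try never ran, just remember the cancellation
		a.state[txID] = cancelled
	case tried:
		a.balance += a.amount[txID]
		a.frozen -= a.amount[txID]
		a.state[txID] = cancelled
	}
	// confirmed or cancelled already: nothing to do (idempotent)
}

func main() {
	a := NewAccount(100)
	a.Cancel("tx1")               // Cancel overtakes Try
	fmt.Println(a.Try("tx1", 30)) // the late Try is rejected
	fmt.Println(a.balance, a.frozen)
}
```

In main, Cancel arrives first and is recorded as an empty release, so the delayed Try is rejected and the balance stays at 100 with nothing frozen.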

Local message table

Originally proposed by eBay, a local message table lives in the same database as business tables, allowing a single local transaction to guarantee atomicity of both business data and a message record.

Within a single DB transaction, insert the business data and a message row with status = 'try'.

If the subsequent step (e.g., the MQ publish) succeeds, the message is no longer needed, so delete the row (Cancel).

If the step fails, update the row to status = 'confirm' (Confirm). An asynchronous worker repeatedly reads confirm rows, along with try rows left behind by a crash, retries the downstream operation, and deletes each row once it succeeds.

Sample implementation:

messageTx := tc.NewTransaction("order")
messageTxSql := messageTx.TryPlan("content") // SQL that inserts the message row with status 'try'

// Write the business row and the message row in one local DB transaction.
m, err := db.InsertTx(sql, messageTxSql)
if err != nil {
    return err
}

publishErr := mq.Publish("B-Service-topic", m)
if publishErr != nil {
    // Publishing failed: mark the message 'confirm' so the background worker retries it.
    messageTx.Confirm()
} else {
    // Publishing succeeded: the message is no longer needed, so delete it.
    messageTx.Cancel()
}

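The message table typically has two statuses, try and confirm. A confirm row failed to publish; a row still in try means the service crashed after the DB commit but before it could call Confirm or Cancel. A background worker processes both states, republishing each message and deleting it on success, which guarantees eventual delivery. Because a message may be published more than once, consumers must be idempotent.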
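The flow, including the background worker, can be simulated in memory. Store, Broker, createOrder, and retryPending are hypothetical names; InsertTx stands in for a single database transaction that writes both rows, and the statuses follow the convention used in the sample implementation (success deletes the row, a failed publish marks it confirm).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Message is a row in the hypothetical local message table.
type Message struct {
	Payload string
	Status  string // "try" or "confirm"
}

// Store stands in for one database holding business rows and the
// message table, so both can be written in a single local transaction.
type Store struct {
	orders   []string
	messages map[int]*Message
	nextID   int
}

// InsertTx writes the business row and its message row together
// (trivially atomic here because everything is in memory).
func (s *Store) InsertTx(order string) int {
	s.nextID++
	s.orders = append(s.orders, order)
	s.messages[s.nextID] = &Message{Payload: order, Status: "try"}
	return s.nextID
}

// Broker is a fake MQ whose Publish fails while down is true.
type Broker struct {
	down bool
	sent []string
}

func (b *Broker) Publish(msg string) error {
	if b.down {
		return errors.New("broker unavailable")
	}
	b.sent = append(b.sent, msg)
	return nil
}

func createOrder(s *Store, mq *Broker, order string) {
	id := s.InsertTx(order)
	if err := mq.Publish(order); err != nil {
		s.messages[id].Status = "confirm" // leave it for the worker
		return
	}
	delete(s.messages, id) // delivered: the row is no longer needed
}

// retryPending republishes leftover rows and deletes those that succeed.
// A real worker would skip "try" rows younger than a timeout to avoid
// racing with an in-flight createOrder.
func retryPending(s *Store, mq *Broker) {
	ids := make([]int, 0, len(s.messages))
	for id := range s.messages {
		ids = append(ids, id)
	}
	sort.Ints(ids) // deterministic order
	for _, id := range ids {
		if mq.Publish(s.messages[id].Payload) == nil {
			delete(s.messages, id)
		}
	}
}

func main() {
	s := &Store{messages: map[int]*Message{}}
	mq := &Broker{down: true}
	createOrder(s, mq, "order-42")
	fmt.Println(len(s.messages), len(mq.sent)) // prints "1 0"
	mq.down = false
	retryPending(s, mq)
	fmt.Println(len(s.messages), mq.sent) // prints "0 [order-42]"
}
```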

Independent message service

To avoid coupling the message table with each business database, the table can be extracted into a dedicated service. The workflow mirrors the local‑message‑table pattern, but the message is stored via a remote API call (status prepare) before the business logic runs. If the business steps succeed, the business service calls Confirm on the message service; otherwise it calls Cancel.

Because message creation and its Confirm or Cancel are separate network calls, a crash or network failure in between can leave an orphan “prepare” record. The message service, typically through a background task, must therefore treat stale prepare records as candidates for cleanup or retry.
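A minimal sketch of the message service side, assuming a hypothetical check callback that asks the business service whether its local work committed; this callback is an assumption about how orphans are resolved, not something the pattern prescribes. Time is passed in as plain seconds to keep the example deterministic.

```go
package main

import "fmt"

// MessageService is a hypothetical standalone service that owns message state.
type MessageService struct {
	prepared  map[string]int64 // msgID -> time (seconds) when Prepare was called
	published []string         // messages handed to the MQ after Confirm
}

func NewMessageService() *MessageService {
	return &MessageService{prepared: map[string]int64{}}
}

// Prepare stores the message before the business logic runs.
func (m *MessageService) Prepare(id string, now int64) { m.prepared[id] = now }

// Confirm releases a prepared message for delivery; unknown IDs are ignored.
func (m *MessageService) Confirm(id string) {
	if _, ok := m.prepared[id]; ok {
		delete(m.prepared, id)
		m.published = append(m.published, id)
	}
}

// Cancel discards a prepared message.
func (m *MessageService) Cancel(id string) { delete(m.prepared, id) }

// Sweep resolves prepare records older than ttl seconds by asking the
// business side (via the hypothetical check callback) whether it committed.
func (m *MessageService) Sweep(now, ttl int64, check func(id string) bool) {
	for id, at := range m.prepared {
		if now-at < ttl {
			continue // may still be in flight
		}
		if check(id) {
			m.Confirm(id)
		} else {
			m.Cancel(id)
		}
	}
}

func main() {
	svc := NewMessageService()
	svc.Prepare("msg-1", 0) // business step committed, but the Confirm call was lost
	svc.Prepare("msg-2", 0) // business step failed, and the Cancel call was lost
	committed := map[string]bool{"msg-1": true}
	svc.Sweep(60, 30, func(id string) bool { return committed[id] })
	fmt.Println(len(svc.prepared), svc.published) // prints "0 [msg-1]"
}
```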

Transactional message queues

Some MQ implementations (e.g., RocketMQ) support transactional messages that follow the same prepare‑confirm‑cancel lifecycle. The producer sends a prepare (half) message, executes its local business logic, and then issues a commit (confirm) or rollback (cancel) to the broker. The broker delivers the message to consumers only after a successful commit. If the producer crashes before sending either, RocketMQ periodically checks back with the producer to learn the outcome of the local transaction.
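The lifecycle can be modelled with a small in-memory broker. This is not the RocketMQ client API: Broker, SendHalf, Commit, Rollback, CheckBack, and sendInTx are hypothetical names, and CheckBack only mimics the idea of the broker querying the producer about unresolved half messages.

```go
package main

import (
	"errors"
	"fmt"
)

var errLocalFailed = errors.New("local transaction failed")

// Broker is a hypothetical in-memory model of transactional messages.
type Broker struct {
	half    map[string]string // msgID -> body; invisible to consumers
	visible []string          // committed bodies that consumers can read
}

func NewBroker() *Broker { return &Broker{half: map[string]string{}} }

func (b *Broker) SendHalf(id, body string) { b.half[id] = body }

func (b *Broker) Commit(id string) {
	if body, ok := b.half[id]; ok {
		delete(b.half, id)
		b.visible = append(b.visible, body)
	}
}

func (b *Broker) Rollback(id string) { delete(b.half, id) }

// CheckBack resolves half messages the producer never committed or rolled
// back by asking the producer for the outcome of its local transaction.
func (b *Broker) CheckBack(committed func(id string) bool) {
	for id := range b.half {
		if committed(id) {
			b.Commit(id)
		} else {
			b.Rollback(id)
		}
	}
}

// sendInTx is the producer side: half message, local work, then commit or rollback.
func sendInTx(b *Broker, id, body string, local func() error) error {
	b.SendHalf(id, body)
	if err := local(); err != nil {
		b.Rollback(id)
		return err
	}
	b.Commit(id)
	return nil
}

func main() {
	b := NewBroker()
	_ = sendInTx(b, "m1", "order-1 created", func() error { return nil })
	_ = sendInTx(b, "m2", "order-2 created", func() error { return errLocalFailed })
	b.SendHalf("m3", "order-3 created") // producer crashes before Commit/Rollback
	b.CheckBack(func(id string) bool { return id == "m3" })
	fmt.Println(b.visible) // prints "[order-1 created order-3 created]"
}
```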

Summary

All mechanisms aim to achieve eventual consistency when a single distributed transaction cannot be completed atomically.

TCC provides fine‑grained control at the service layer and does not depend on a particular database, but it requires every participant to implement three APIs and to handle complex failure scenarios. Mature frameworks such as Alibaba Seata (formerly Fescar) can reduce the implementation burden.

Local message table is simple to adopt, needs no extra services, and works well with both direct service calls and MQs. Its downside is the tight coupling of the message table with the business database.

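Independent message service and transactional MQ both keep transaction state out of the business database, avoiding per‑service message tables. The trade‑offs: the independent message service is one more component to deploy and adds a network round trip per operation, while the transactional approach only works with an MQ that supports transactional messages.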

Reference implementation: https://github.com/mushroomsir/tcc

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
