Ensuring Microservice Data Consistency: Retry, Queues, TCC & Message Tables
The article examines common strategies for handling service call failures and maintaining data consistency in microservice architectures, comparing blocking retries, asynchronous queues, TCC compensation transactions, local message tables, and MQ transactions, highlighting their trade‑offs, implementation details, and practical considerations.
Introduction
In distributed microservice systems, inter‑service call failures are common. Simple retries are insufficient for guaranteeing data consistency, so additional mechanisms are required. This summary covers five approaches: blocking retry, asynchronous queue, TCC (Try‑Confirm‑Cancel) compensation transaction, local message table, and MQ transaction.
Blocking Retry
A service repeatedly calls a downstream API until it succeeds or a retry limit is reached.
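m := db.Insert(sql)
err := requestWithRetry(BService, m)
// requestWithRetry POSTs body to url, retrying up to 3 times,
// and returns the last error if every attempt fails.
func requestWithRetry(url string, body interface{}) error {
	var err error
	for i := 0; i < 3; i++ {
		if err = request.POST(url, body); err == nil {
			return nil
		}
		log.Printf("POST %s failed (attempt %d): %v", url, i+1, err)
	}
	return err
}
Problems: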
Duplicate processing if the downstream service actually succeeded but the caller perceives a failure.
Partial data remains: the local insert has already committed, so if the downstream service stays unavailable the record is left without its downstream counterpart (dirty data).
Increased latency and load on the downstream service.
Mitigations include making the downstream API idempotent, using background reconciliation jobs, or accepting the latency trade‑off when consistency requirements are low.
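The first mitigation can be sketched as follows, assuming the caller sends a unique request ID with every logical call (here the order ID `order-42`) and the downstream service remembers which IDs it has already applied. `PointsService`, `AddPoints`, and `callWithRetry` are hypothetical names, and an in-memory map stands in for persistent storage. A retry after a lost response then becomes harmless instead of crediting points twice.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errTimeout = errors.New("timeout waiting for response")

// PointsService is a hypothetical downstream service whose AddPoints
// operation is idempotent: each request ID is applied at most once.
type PointsService struct {
	mu      sync.Mutex
	applied map[string]bool // request IDs already processed
	balance int
}

func NewPointsService() *PointsService {
	return &PointsService{applied: make(map[string]bool)}
}

// AddPoints credits points once per requestID; repeats are no-ops.
func (s *PointsService) AddPoints(requestID string, points int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.applied[requestID] {
		return // duplicate delivery caused by a caller retry
	}
	s.applied[requestID] = true
	s.balance += points
}

// callWithRetry invokes call up to attempts times and returns the last error.
func callWithRetry(attempts int, call func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	svc := NewPointsService()
	calls := 0
	// The first call is applied downstream, but its response is lost,
	// so the caller sees a timeout and retries with the same request ID.
	err := callWithRetry(3, func() error {
		calls++
		svc.AddPoints("order-42", 10)
		if calls == 1 {
			return errTimeout
		}
		return nil
	})
	fmt.Println(err, calls, svc.balance) // <nil> 2 10
}
```

In a real service the set of applied IDs would typically be a table with a unique key on the request ID, written in the same transaction as the balance update, so that a crash between the two writes cannot reintroduce the duplicate.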
Asynchronous Queue
After persisting business data, the service publishes a message to a message queue for an independent consumer to process.
m := db.Insert(sql)
err := mq.Publish("B-Service-topic", m)
If the publish fails (e.g., network issue), the database write succeeds while the message is lost, reproducing the inconsistency problem of blocking retries.
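The failure window can be reproduced with in-memory stand-ins. In this sketch `flakyQueue` is a hypothetical MQ client (not a real library) and a string slice plays the orders table: the insert commits, the publish fails, and nothing remains that would ever deliver the message.

```go
package main

import (
	"errors"
	"fmt"
)

var errNetwork = errors.New("network error")

// flakyQueue is a hypothetical MQ client whose Publish can be made to fail.
type flakyQueue struct {
	down     bool
	messages []string
}

func (q *flakyQueue) Publish(topic, msg string) error {
	if q.down {
		return errNetwork
	}
	q.messages = append(q.messages, topic+":"+msg)
	return nil
}

// placeOrder writes the order, then publishes it: two independent steps
// with no transaction spanning both.
func placeOrder(db *[]string, q *flakyQueue, order string) error {
	*db = append(*db, order)                   // step 1: committed immediately
	return q.Publish("B-Service-topic", order) // step 2: may fail on its own
}

func main() {
	var orderDB []string
	q := &flakyQueue{down: true}
	err := placeOrder(&orderDB, q, "order-1")
	// The order exists locally, but B-Service will never hear about it.
	fmt.Println(err, len(orderDB), len(q.messages)) // network error 1 0
}
```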
TCC Compensation Transaction
TCC splits each remote call into three phases: Try (resource reservation), Confirm (final commit), and Cancel (rollback).
Try: Validate and reserve resources.
Confirm: Finalize the reservation.
Cancel: Release reserved resources if any preceding step fails.
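For an inventory service, the three phases might look like the sketch below. It assumes stock is split into an available count and a frozen (reserved) count; `Inventory` and its methods are hypothetical. Try moves units from available to frozen, Confirm consumes the frozen units, and Cancel returns them to available stock.

```go
package main

import (
	"errors"
	"fmt"
)

var errInsufficientStock = errors.New("insufficient stock")

// Inventory is a hypothetical TCC participant for one SKU.
type Inventory struct {
	Available int // units that can still be reserved
	Frozen    int // units reserved by Try but not yet confirmed
}

// Try validates the request and reserves qty units.
func (inv *Inventory) Try(qty int) error {
	if inv.Available < qty {
		return errInsufficientStock
	}
	inv.Available -= qty
	inv.Frozen += qty
	return nil
}

// Confirm finalizes the reservation: the frozen units are sold.
func (inv *Inventory) Confirm(qty int) {
	inv.Frozen -= qty
}

// Cancel releases the reservation back to available stock.
func (inv *Inventory) Cancel(qty int) {
	inv.Frozen -= qty
	inv.Available += qty
}

func main() {
	inv := &Inventory{Available: 5}
	_ = inv.Try(3)
	inv.Cancel(3)
	fmt.Println(inv.Available, inv.Frozen) // 5 0
	_ = inv.Try(3)
	inv.Confirm(3)
	fmt.Println(inv.Available, inv.Frozen) // 2 0
}
```

This version trusts its caller completely; it does not yet guard against the empty-release and ordering problems discussed in the following subsections.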
Example with inventory, payment, and points services:
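m := db.Insert(sql)
// Try phase: reserve resources in inventory (A), payment (B) and points (C).
_, aErr := A.Try(m)
_, bErr := B.Try(m)
_, cErr := C.Try(m)
if aErr != nil || bErr != nil || cErr != nil {
	// Any Try failed: cancel all three. A Cancel for a Try that never
	// took effect must be a no-op (see "Empty Release" below).
	A.Cancel()
	B.Cancel()
	C.Cancel()
} else {
	// Every reservation succeeded: commit them.
	A.Confirm()
	B.Confirm()
	C.Confirm()
}
Empty Release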
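If a Try call fails before taking effect (for example, the request times out before it reaches the service), the coordinator still sends Cancel, asking the service to release a resource it never locked. Services should recognize such cancellations, for instance by finding no Try record for the transaction ID, and treat them as no‑ops.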
Ordering Issues
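Network delays can cause a Cancel request to arrive before the corresponding Try. The Cancel is then an empty release, and when the delayed Try finally arrives it reserves a resource that nothing will ever release. Recording each transaction ID together with its state lets the service remember that the transaction was already cancelled and reject the late Try.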
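Both defenses can be sketched with a per-transaction state record, assuming each participant keeps one entry per transaction ID (an in-memory map here; a table keyed by transaction ID in practice). A Cancel with no Try record is treated as an empty release but still recorded as cancelled, so a Try arriving later for the same ID is refused. `Participant`, `txState`, and the method names are hypothetical, and Confirm is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type txState int

const (
	none txState = iota // zero value: no record for this transaction ID
	tried
	confirmed
	cancelled
)

var errAlreadyCancelled = errors.New("transaction already cancelled")

// Participant is a hypothetical TCC service guarding one kind of resource.
type Participant struct {
	mu       sync.Mutex
	states   map[string]txState // persisted per transaction ID in practice
	reserved int                // units reserved by Try and not yet released
}

func NewParticipant() *Participant {
	return &Participant{states: make(map[string]txState)}
}

// Try reserves one unit unless this transaction was already cancelled.
func (p *Participant) Try(txID string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	switch p.states[txID] {
	case cancelled:
		return errAlreadyCancelled // Try arrived after Cancel: refuse it
	case tried, confirmed:
		return nil // duplicate Try: the unit is already reserved
	}
	p.states[txID] = tried
	p.reserved++
	return nil
}

// Cancel releases the unit if Try reserved it; otherwise it is an empty
// release that only records the cancellation.
func (p *Participant) Cancel(txID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	switch p.states[txID] {
	case tried:
		p.reserved-- // release a real reservation
	case confirmed, cancelled:
		return // nothing to undo; a repeated Cancel is a no-op
	}
	p.states[txID] = cancelled // also blocks any delayed Try for txID
}

func main() {
	p := NewParticipant()
	p.Cancel("tx-1")             // Cancel overtakes Try: empty release
	err := p.Try("tx-1")         // the delayed Try arrives afterwards
	fmt.Println(err, p.reserved) // transaction already cancelled 0
}
```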
Call Failures
Both Cancel and Confirm can fail due to network problems. Common mitigation strategies:
Blocking retry (with the same drawbacks as before).
Log the failure and push a compensating message to a queue for asynchronous or manual handling.
Because the two‑phase code is not atomic, intermediate states may persist and require explicit handling.
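The two mitigations can be combined: retry Confirm (or Cancel) a bounded number of times, and if it still fails, record a compensation task for an asynchronous worker or an operator. In this sketch the `compensations` slice stands in for a message queue, and `phaseFunc`, `compensationTask`, and `runPhase` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

var errTransient = errors.New("connection refused")

// phaseFunc is one participant's Confirm or Cancel call.
type phaseFunc func(txID string) error

// compensationTask is what an asynchronous worker or operator picks up.
type compensationTask struct {
	TxID  string
	Phase string // "confirm" or "cancel"
	Err   error
}

// runPhase calls fn up to attempts times. If every attempt fails, it
// records a compensation task instead of blocking the caller further.
func runPhase(txID, phase string, fn phaseFunc, attempts int, queue *[]compensationTask) bool {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(txID); err == nil {
			return true
		}
	}
	*queue = append(*queue, compensationTask{TxID: txID, Phase: phase, Err: err})
	return false
}

func main() {
	var compensations []compensationTask
	unreachable := func(string) error { return errTransient }
	ok := runPhase("tx-7", "confirm", unreachable, 3, &compensations)
	fmt.Println(ok, len(compensations), compensations[0].Phase) // false 1 confirm
}
```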
Local Message Table
The local message table, originally proposed by eBay, stores a message record in the same database transaction as the business data, so the two are committed or rolled back together. Each record carries a status (try or confirm), and a background OnMessage handler picks up pending records and retries the publishes that failed.
With MQ
messageTx := tcc.NewTransaction("order")
messageTxSql := messageTx.TryPlan("content")
// Write the business record and the message record in one local transaction.
m, err := db.InsertTx(sql, messageTxSql)
if err != nil {
	return err
}
aErr := mq.Publish("B-Service-topic", m)
if aErr != nil { // publish failed
	messageTx.Confirm() // set status to confirm for later retry
} else {
	messageTx.Cancel() // delete the message
}
// OnMessage is invoked asynchronously for messages left in confirm status.
func OnMessage(task *tcc.Task) {
	err := mq.Publish("B-Service-topic", task.Value())
	if err == nil {
		task.Cancel() // delivered: delete the message
	}
}
SQL for the message table (simplified):
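INSERT INTO tcc_async_task (uid, name, value, status) VALUES (?, ?, ?, ?);
The status field holds try or confirm. Because the message record commits atomically with the business data, the asynchronous listener keeps retrying until the message reaches the MQ, even across process crashes or network failures. Delivery is therefore at-least-once: a crash after a successful publish but before the record is deleted causes the message to be published again, so consumers should be idempotent.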
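The whole flow can be simulated with in-memory stand-ins. This is not the API of any real library: the `store` type, its mutex-guarded `insertTx` (standing in for a single database transaction), and the injected `publish` function are hypothetical. Status handling follows the article: a message is written as `try`, deleted after a successful publish, or switched to `confirm` so that the background relay retries it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errMQDown = errors.New("mq unavailable")

// message is one row of the hypothetical message table.
type message struct {
	Value  string
	Status string // "try" or "confirm"
}

// store simulates one database holding both the orders table and the
// message table; holding mu models a single local transaction.
type store struct {
	mu       sync.Mutex
	orders   []string
	messages map[int]*message
	nextID   int
}

// insertTx writes the order and its message atomically, with status "try".
func (s *store) insertTx(order string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.orders = append(s.orders, order)
	s.nextID++
	s.messages[s.nextID] = &message{Value: order, Status: "try"}
	return s.nextID
}

func (s *store) markConfirm(id int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.messages[id].Status = "confirm"
}

func (s *store) remove(id int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.messages, id)
}

// placeOrder mirrors the article's flow: publish, then delete the message
// on success or mark it "confirm" so the relay retries it.
func placeOrder(s *store, publish func(string) error, order string) {
	id := s.insertTx(order)
	if err := publish(order); err != nil {
		s.markConfirm(id)
	} else {
		s.remove(id)
	}
}

// relay plays the OnMessage role: retry every "confirm" message once.
func relay(s *store, publish func(string) error) {
	s.mu.Lock()
	pending := map[int]string{}
	for id, m := range s.messages {
		if m.Status == "confirm" {
			pending[id] = m.Value
		}
	}
	s.mu.Unlock()
	for id, v := range pending {
		if publish(v) == nil {
			s.remove(id)
		}
	}
}

func main() {
	s := &store{messages: map[int]*message{}}
	var queue []string
	mqUp := false
	publish := func(v string) error {
		if !mqUp {
			return errMQDown
		}
		queue = append(queue, v)
		return nil
	}
	placeOrder(s, publish, "order-1")        // MQ down: record kept as "confirm"
	fmt.Println(len(s.messages), len(queue)) // 1 0
	mqUp = true
	relay(s, publish)                        // the retry succeeds and removes the record
	fmt.Println(len(s.messages), len(queue)) // 0 1
}
```

`relay` snapshots the pending IDs under the lock and publishes without holding it, so a slow MQ does not block new orders from being written.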
With Service Call (No MQ)
messageTx := tcc.NewTransaction("order")
messageTxSql := messageTx.TryPlan("content")
body, err := db.InsertTx(sql, messageTxSql)
if err != nil {
return err
}
aErr := request.POST("B-Service", body)
if aErr != nil { // B-Service call failed
messageTx.Confirm() // mark for later retry
} else {
messageTx.Cancel() // delete the message
}
func OnMessage(task *tcc.Task) {
	// Retry the B-Service call here; call task.Cancel() once it succeeds.
}
Even when the operation involves no business DB write, a message record can still be inserted on its own so that OnMessage handles retries of the B-Service call.
Message Expiration
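func OnConfirmMessage(task *tcc.Task) {
	if time.Since(task.CreatedAt) > time.Hour {
		// Expired: stop further retries and alert (e.g., email, SMS)
		// for manual intervention.
		if err := task.Cancel(); err != nil {
			log.Printf("cancel expired task: %v", err)
		}
		return
	}
	// Not expired yet: retry delivery as usual.
}
Handlers should also skip try messages that are only a few seconds old: the request that wrote them may still be publishing, so retrying them immediately would produce duplicates.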
Independent Message Service
An independent message service extracts the message table into a separate service. The workflow adds a message before any business operation; if the operation succeeds, the message is deleted, otherwise it remains for later processing. Because the message insertion cannot be wrapped in the same local transaction, a failed business operation after a successful message insert can leave an orphaned message.
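// 1. Register the message with Message-Service before the business call.
err := request.POST("Message-Service", body)
if err != nil {
	return err
}
// 2. Call B-Service. If this fails, the message stays in Message-Service
//    for later retry; if it succeeds, the message should be deleted.
aErr := request.POST("B-Service", body)
if aErr != nil {
	return aErr
}
This pattern introduces an additional prepare state that the consumer must handle: before acting on a message, it has to verify that the business operation behind it actually happened.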
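One way to handle the prepare state is sketched below, assuming the message service stores each message as `prepared` until the producer confirms it, and later asks the producer (through a hypothetical `checkBack` callback) whether the business operation behind a stale prepared message succeeded. Orphaned messages are discarded and the rest are confirmed for delivery. `MessageService` and its methods are hypothetical names.

```go
package main

import "fmt"

type msgState string

const (
	prepared  msgState = "prepared"
	confirmed msgState = "confirmed" // visible to consumers
	discarded msgState = "discarded"
)

// MessageService is a hypothetical standalone message service.
type MessageService struct {
	msgs map[string]msgState // keyed by business ID
}

func (s *MessageService) Prepare(bizID string) { s.msgs[bizID] = prepared }
func (s *MessageService) Confirm(bizID string) { s.msgs[bizID] = confirmed }

// Resolve handles prepared messages whose producer never confirmed them,
// e.g. because it crashed between the business write and Confirm.
func (s *MessageService) Resolve(checkBack func(bizID string) bool) {
	for id, st := range s.msgs {
		if st != prepared {
			continue
		}
		if checkBack(id) {
			s.msgs[id] = confirmed // business succeeded: deliver
		} else {
			s.msgs[id] = discarded // orphan: business never happened
		}
	}
}

func main() {
	svc := &MessageService{msgs: map[string]msgState{}}
	orders := map[string]bool{} // the producer's business table

	svc.Prepare("order-1") // happy path: write succeeds, then Confirm
	orders["order-1"] = true
	svc.Confirm("order-1")

	svc.Prepare("order-2") // business write failed after Prepare: orphan

	svc.Prepare("order-3") // write succeeded, producer crashed before Confirm
	orders["order-3"] = true

	svc.Resolve(func(id string) bool { return orders[id] })
	fmt.Println(svc.msgs["order-1"], svc.msgs["order-2"], svc.msgs["order-3"])
	// confirmed discarded confirmed
}
```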
MQ Transaction
Some MQ implementations (e.g., RocketMQ) support transactional messages. The pattern mirrors the independent message service: a message is first sent in a prepare (half) state that consumers cannot yet see, then a Confirm (commit) delivers it or a Cancel (rollback) discards it. If the producer never sends either, the broker checks back with the producer to ask whether the local business operation succeeded.
Summary
Ensuring data consistency in microservice environments inevitably requires mechanisms beyond simple retries.
TCC provides a framework‑agnostic, service‑layer solution that does not depend on a specific database. It requires three APIs per service and careful handling of network‑induced failures such as empty releases, ordering issues, and retry of confirm/cancel steps.
Local Message Table offers a simple, transaction‑bound approach that works with both direct service calls and MQ. It couples messaging logic to the business database but guarantees that a message is persisted atomically with the business record.
Independent Message Service and MQ Transaction decouple messaging from business data, reducing coupling but introducing an extra prepare state and potential orphan messages. Few MQ products support transactional messages, and the extra round‑trip adds latency.
Open‑source implementations, such as the library at https://github.com/mushroomsir/tcc, illustrate these patterns.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.