Business Compensation Mechanisms: Rollback, Retry, and Consistency in Distributed Systems
This article explains how distributed applications handle failures through business compensation mechanisms: rollback and retry strategies, consistency models, and practical considerations for maintaining eventual consistency across multiple services.
In distributed applications, a single business process often involves multiple services, and any failure in the communication chain (DNS, network devices, load balancers) can cause inconsistencies.
To maintain consistency, business compensation mechanisms are defined to eliminate the inconsistent state when an operation fails.
1. What is Business Compensation
When a step in a business process fails, business compensation either retries until every step succeeds or rolls the process back to a previous consistent state.
2. Implementation Approaches
Rollback (transaction compensation): a reverse operation that undoes completed work and abandons the current failed step.
Retry: a forward operation that attempts to complete the process.
Compensation typically requires a workflow engine that orchestrates services and ensures eventual consistency.
Note: Compensation is an extra process; timeliness is less critical than correctness.
3. Rollback
Rollback restores the system to the state before the failed service call, either explicitly via reverse APIs or implicitly when downstream services handle failures.
Explicit rollback: call reverse interfaces or cancel unfinished operations, requiring resource locks.
Implicit rollback: downstream services automatically handle failures.
Key steps include identifying the failed step and providing sufficient data for the rollback operation.
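The explicit rollback flow described above can be sketched as a small orchestrator: it runs steps in order, remembers which ones completed, and on failure calls their reverse operations in reverse order. This is only a minimal sketch; CompensationSketch, Step, and runAll are hypothetical names chosen for illustration and do not come from any particular workflow engine.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: all names are illustrative, not from a real workflow engine.
final class CompensationSketch {

    /** A forward action paired with the reverse action that undoes it. */
    interface Step {
        String name();
        void execute() throws Exception;   // forward operation
        void compensate();                 // reverse operation (explicit rollback)
    }

    /**
     * Runs the steps in order. If one fails, the steps that already
     * completed are compensated in reverse order.
     * Returns the name of the failed step, or null if all steps succeeded.
     */
    static String runAll(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                completed.push(step);      // most recently completed step is on top
            } catch (Exception e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return step.name();        // identifies where the process failed
            }
        }
        return null;
    }
}
```

In a real system each Step would also carry the data its reverse operation needs (order ID, amount, and so on), and the list of completed steps would be persisted so that compensation can resume after the orchestrator itself crashes.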
4. Retry
Retry assumes the failure is temporary and attempts the operation again, avoiding the need for reverse interfaces.
Retry Scenarios
Retry is applicable when downstream services return timeouts, rate-limit responses, or other transient errors; it is not suitable for permanent business errors, which will fail again no matter how often the call is repeated.
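One way to encode this distinction is a small decision function. The sketch below assumes the downstream service reports failures as HTTP status codes, which the article does not state; the set of codes treated as transient and the names RetryDecision and shouldRetry are likewise assumptions made for illustration.

```java
import java.util.Set;

// Hypothetical sketch: assumes failures arrive as HTTP status codes.
final class RetryDecision {

    // Codes treated as transient here: 408 Request Timeout, 429 Too Many Requests,
    // 503 Service Unavailable, 504 Gateway Timeout. The exact set is an assumption.
    private static final Set<Integer> TRANSIENT = Set.of(408, 429, 503, 504);

    /** Returns true when the failure looks temporary and retry budget remains. */
    static boolean shouldRetry(int statusCode, int attemptsMade, int maxAttempts) {
        if (attemptsMade >= maxAttempts) {
            return false;                       // retry budget exhausted
        }
        return TRANSIENT.contains(statusCode);  // e.g. 400 or 404 is not retried
    }
}
```

Capping the number of attempts matters as much as the classification: without a budget, a persistent outage turns every caller into a source of extra load.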
Retry Strategies
Immediate retry
Fixed interval
Incremental interval
Exponential interval
Full jitter
Equal jitter
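The strategies listed above differ only in how the delay before the next attempt is computed. The following is a minimal sketch under stated assumptions: delays are in milliseconds, baseMillis and capMillis are illustrative parameters that the article does not define, and the equal-jitter variant follows the article's own formula (base plus a random value up to base) rather than any other published definition.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: delays in milliseconds; parameter names are illustrative.
final class BackoffSketch {

    /** Immediate retry: no delay at all. */
    static long immediate() {
        return 0L;
    }

    /** Fixed interval: the same delay before every retry. */
    static long fixed(long intervalMillis) {
        return intervalMillis;
    }

    /** Incremental interval: grows linearly; retryCount starts at 1. */
    static long incremental(int retryCount, long incrementMillis) {
        return (retryCount - 1) * incrementMillis;
    }

    /** Exponential interval: baseMillis * 2^retryCount, capped at capMillis. */
    static long exponential(int retryCount, long baseMillis, long capMillis) {
        int shift = Math.max(0, Math.min(retryCount, 30)); // bound the shift
        return Math.min(capMillis, baseMillis * (1L << shift));
    }

    /** Full jitter: uniformly random in [0, exponential delay]. */
    static long fullJitter(int retryCount, long baseMillis, long capMillis) {
        long exp = exponential(retryCount, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }

    /** Equal jitter as given in the article: exponential delay plus random in [0, delay]. */
    static long equalJitter(int retryCount, long baseMillis, long capMillis) {
        long exp = exponential(retryCount, baseMillis, capMillis);
        return exp + ThreadLocalRandom.current().nextLong(exp + 1);
    }
}
```

Jitter spreads retries from many clients over time, so that callers that failed together do not all come back at the same instant.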
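Example calculations, where ^ denotes exponentiation (in Java, ^ is bitwise XOR, so real code would use a shift or Math.pow instead):
Incremental interval: return (retryCount - 1) * incrementInterval;
Exponential interval: return 2 ^ retryCount;
Full jitter: return random(0, 2 ^ retryCount);
Equal jitter: int baseNum = 2 ^ retryCount; return baseNum + random(0, baseNum);
Retry Considerations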
Idempotency is essential: each request should carry a unique identifier, and the receiving service should check that identifier before re-executing the operation.
Retry works well with rate limiting and circuit breaking mechanisms.
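The idempotency check can be sketched as a lookup keyed by the request identifier. This is a minimal sketch that assumes an in-memory map in place of a durable store such as a database table; IdempotentExecutor and its execute method are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: an in-memory map stands in for a durable store.
final class IdempotentExecutor<R> {

    private final ConcurrentMap<String, R> results = new ConcurrentHashMap<>();

    /**
     * Runs the operation at most once per requestId within this process.
     * A retried request with the same id receives the stored result
     * instead of triggering the operation again. If the operation throws,
     * nothing is stored, so a later retry runs it again. The operation is
     * expected to return a non-null result; a null result is not stored.
     */
    R execute(String requestId, Supplier<R> operation) {
        return results.computeIfAbsent(requestId, id -> operation.get());
    }
}
```

For example, calling execute("req-1", ...) twice with a payment operation would charge once and return the same result both times, which is what makes a retry safe for the caller.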
5. Precautions for Business Compensation
Prefer BASE consistency over ACID for scalability.
Ensure all services involved support idempotency and have retry mechanisms.
Centralize state monitoring in a highly available workflow engine.
Design forward and compensating processes together.
Provide short‑term resource reservation (e.g., inventory hold) to enable rollback.
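The short-term reservation in the last point can be illustrated with an inventory hold that is either confirmed (forward path) or released (rollback path), and that expires on its own if neither happens. The sketch below is an assumption-laden illustration: it models a single item in memory, takes the current time as a parameter, and uses hypothetical names (InventoryHold, reserve, confirm, release).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: single-item, in-memory inventory; expiry is checked lazily.
final class InventoryHold {
    private int available;
    // orderId -> {quantity, expiresAtMillis}
    private final Map<String, long[]> holds = new HashMap<>();

    InventoryHold(int initialStock) {
        this.available = initialStock;
    }

    /** Sets stock aside for an order; fails if stock is short or the order already holds some. */
    synchronized boolean reserve(String orderId, int qty, long nowMillis, long ttlMillis) {
        releaseExpired(nowMillis);
        if (qty <= 0 || qty > available || holds.containsKey(orderId)) {
            return false;
        }
        available -= qty;
        holds.put(orderId, new long[] {qty, nowMillis + ttlMillis});
        return true;
    }

    /** Forward path: the hold becomes a permanent deduction. */
    synchronized boolean confirm(String orderId, long nowMillis) {
        releaseExpired(nowMillis);
        return holds.remove(orderId) != null;
    }

    /** Rollback path: the held stock is returned. */
    synchronized boolean release(String orderId) {
        long[] hold = holds.remove(orderId);
        if (hold == null) {
            return false;
        }
        available += (int) hold[0];
        return true;
    }

    synchronized int available(long nowMillis) {
        releaseExpired(nowMillis);
        return available;
    }

    // Returns stock from holds whose deadline has passed.
    private void releaseExpired(long nowMillis) {
        holds.entrySet().removeIf(e -> {
            if (e.getValue()[1] <= nowMillis) {
                available += (int) e.getValue()[0];
                return true;
            }
            return false;
        });
    }
}
```

The expiry is what keeps the reservation short-term: if the orchestrator never confirms or releases, the stock returns to the pool without manual intervention.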
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.