Which Retry Strategy Guarantees Reliable Message Delivery in Microservices?

The article examines five retry mechanisms for reliable message delivery between payment and billing microservices, weighs their pros and cons, and recommends the third solution (exponential backoff plus local persistence) as a cost-effective, highly reliable approach that reaches roughly 99.99% consistency.

Java Interview Crash Guide

Background

In a microservice scenario, the payment and billing services need to synchronize information so that the payment management console can search payment records by billing status. The two services do not need to agree at every instant, but they must become consistent eventually.

Consumer

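The consumer must guarantee that every message in the queue is eventually processed. It does this with manual commit: a message is acknowledged only after it has been processed successfully. It also uses a trigger-query pattern, in which the message carries only an ID and the consumer fetches the full, current data via RPC, so a redelivered message simply refreshes the same record.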
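The consumer behavior described above can be sketched as follows. This is a minimal illustration, not the article's implementation: `TriggerQueryConsumer`, the in-memory `Deque` standing in for the message queue, and the `paymentRpc` function are all hypothetical names introduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical consumer: the queue carries only a payment ID, never the full record. */
final class TriggerQueryConsumer {
    // In-memory stand-in for the real message queue (an assumption for this sketch).
    private final Deque<String> queue = new ArrayDeque<>();
    // Stand-in for the RPC that returns the current payment state for an ID.
    private final Function<String, String> paymentRpc;
    // Billing-side view keyed by payment ID, so repeated writes are idempotent.
    private final Map<String, String> billingView = new HashMap<>();

    TriggerQueryConsumer(Function<String, String> paymentRpc) {
        this.paymentRpc = paymentRpc;
    }

    void publish(String paymentId) {
        queue.addLast(paymentId);
    }

    /** Handles the head message; returns true only if it was processed and committed. */
    boolean pollOnce() {
        String paymentId = queue.peekFirst();   // read without removing: nothing is committed yet
        if (paymentId == null) {
            return false;
        }
        try {
            String latest = paymentRpc.apply(paymentId);  // trigger-query: fetch full data by ID
            billingView.put(paymentId, latest);           // overwrite, so duplicates are harmless
            queue.removeFirst();                          // manual commit, only after success
            return true;
        } catch (RuntimeException e) {
            return false;                                 // message stays queued and is seen again
        }
    }

    int pending() {
        return queue.size();
    }

    String view(String paymentId) {
        return billingView.get(paymentId);
    }
}
```

Because the message is removed only after the RPC and the local write succeed, a failure at either step leaves it in place for the next poll.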

Producer

The producer focuses on the reliability of sending messages to the queue.

Solution 1: Simple Retry

Retry up to five times with increasing intervals. If every attempt fails, log the failure, raise an alert, and hand the message over for manual intervention. Pros: Simple, handles brief network glitches. Cons: Low reliability; cannot recover from a 30-minute queue outage.
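A minimal sketch of this bounded retry is shown below. The source does not say how the intervals grow, so the linear increase (`baseDelayMillis * attempt`) is an assumption, and `SimpleRetry` and `sendWithRetry` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for Solution 1: a fixed number of attempts with growing pauses. */
final class SimpleRetry {
    /**
     * Calls send until it returns true or maxAttempts is reached.
     * Waits baseDelayMillis * attempt between failed attempts (assumed schedule).
     * Returns false when all attempts fail, so the caller can log, alert, and escalate.
     */
    static boolean sendWithRetry(BooleanSupplier send, int maxAttempts, long baseDelayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (send.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(baseDelayMillis * attempt);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

With five attempts and a one-second base delay, the whole sequence gives up after about ten seconds, which is why this approach cannot ride out a long outage.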

Solution 2: Exponential Backoff

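Keep failed messages in an in-memory retry queue. The retry interval doubles from 5 seconds up to 320 seconds (5, 10, 20, 40, 80, 160, 320), which covers about ten and a half minutes in total; after that, retry every 5 minutes until the send succeeds, logging and alerting along the way. Pros: Much higher reliability, can recover from a 30-minute outage, can be packaged for easy integration. Cons: Pending retries exist only in memory, so messages are lost if the service restarts or the machine crashes.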
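The schedule above can be sketched as a small in-memory queue. This is an illustration under stated assumptions: `BackoffRetryQueue`, `delaySeconds`, and `tick` are hypothetical names, and the clock is passed in explicitly (in seconds) rather than read from the system, which keeps the sketch easy to test.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Predicate;

/** Hypothetical in-memory retry queue for Solution 2. */
final class BackoffRetryQueue {
    /** Delay before the next retry: 5, 10, 20, ..., 320 seconds, then a flat 300 seconds. */
    static long delaySeconds(int retriesSoFar) {
        if (retriesSoFar < 0) {
            throw new IllegalArgumentException("retriesSoFar must be >= 0");
        }
        return retriesSoFar < 7 ? 5L << retriesSoFar : 300L;
    }

    private static final class Entry {
        final String message;
        final int retries;   // retries already attempted for this message
        final long dueAt;    // time (seconds) of the next attempt

        Entry(String message, int retries, long dueAt) {
            this.message = message;
            this.retries = retries;
            this.dueAt = dueAt;
        }
    }

    private final PriorityQueue<Entry> pending =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.dueAt));
    private final Predicate<String> sender;  // returns true when the queue accepted the message

    BackoffRetryQueue(Predicate<String> sender) {
        this.sender = sender;
    }

    /** Registers a message whose initial send failed at nowSeconds. */
    void enqueue(String message, long nowSeconds) {
        pending.add(new Entry(message, 0, nowSeconds + delaySeconds(0)));
    }

    /** Retries every entry due at nowSeconds; entries that fail again are rescheduled. */
    void tick(long nowSeconds) {
        while (!pending.isEmpty() && pending.peek().dueAt <= nowSeconds) {
            Entry e = pending.poll();
            if (!sender.test(e.message)) {
                int next = e.retries + 1;
                pending.add(new Entry(e.message, next, nowSeconds + delaySeconds(next)));
            }
        }
    }

    int size() {
        return pending.size();
    }
}
```

Everything here lives in the `PriorityQueue`, so a process restart discards all pending entries, which is exactly the weakness listed under Cons.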

Solution 3: Exponential Backoff Plus Persistence

Use the same in-memory retry logic as Solution 2, but also append each retry entry to a local file, remove the entry once its retry succeeds, and reload the remaining entries from disk when the service restarts. Pros: Very high reliability, generic, suitable for many scenarios. Cons: Introduces file I/O complexity.
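One way to add the persistence layer is an append-only journal, sketched below. The source only says entries are appended to a file and deleted after success; recording the deletion as a `DONE` line instead of rewriting the file is a design choice of this sketch, and `RetryJournal` and its tab-separated record format are hypothetical. The payload is assumed to contain no newline characters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical on-disk journal of pending retries for Solution 3. */
final class RetryJournal {
    private final Path file;

    RetryJournal(Path file) {
        this.file = file;
    }

    /** Records that a message is waiting for retry. */
    void recordPending(String id, String payload) {
        append("ADD\t" + id + "\t" + payload);
    }

    /** Records that a retry succeeded; the entry is skipped on the next load. */
    void recordDone(String id) {
        append("DONE\t" + id);
    }

    /** Replays the journal and returns the entries still pending, in insertion order. */
    Map<String, String> loadPending() {
        Map<String, String> pending = new LinkedHashMap<>();
        if (!Files.exists(file)) {
            return pending;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 3);
                if (parts.length == 3 && parts[0].equals("ADD")) {
                    pending.put(parts[1], parts[2]);
                } else if (parts.length == 2 && parts[0].equals("DONE")) {
                    pending.remove(parts[1]);
                }
                // Any other line (for example a torn write) is ignored.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pending;
    }

    private void append(String line) {
        try {
            Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On startup the service would call `loadPending()` and re-enqueue each entry into the in-memory retry queue. The sketch does not force writes to disk or compact the file, both of which a production version would likely need.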

Solution 4: Dedicated Retry Service

Implement a task‑retry microservice that maintains a task table and retries until success, acting as a reliable middleware when the message queue fails. Pros: High reliability, generic. Cons: Higher cost, adds an extra service that itself may fail.

Solution 5: Preemptive Task Registration

Before making a business change, write a task to a synchronization service that records the intended message, the execution window, and the target. The task starts inactive. After the business logic completes, the producer activates the task via RPC. If no activation arrives within the maximum time, the service activates the task on its own. Pros: Fully reliable (eventual consistency). Cons: High development cost, increased message traffic, adds another node that may raise transaction failure risk.
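The task lifecycle described above can be sketched as a small registry. This is only an illustration: `SyncTaskRegistry`, `register`, `activate`, and `drainDue` are hypothetical names, tasks are kept in memory rather than in a durable store, and time is passed in explicitly in seconds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical synchronization-service core for Solution 5. */
final class SyncTaskRegistry {
    private static final class Task {
        final String message;
        final long autoActivateAt;  // time (seconds) after which the task activates itself
        boolean active;             // false until the producer confirms the change

        Task(String message, long autoActivateAt) {
            this.message = message;
            this.autoActivateAt = autoActivateAt;
        }
    }

    private final Map<String, Task> tasks = new LinkedHashMap<>();

    /** Step 1: called before the business change; the task starts inactive. */
    void register(String taskId, String message, long nowSeconds, long maxWaitSeconds) {
        tasks.put(taskId, new Task(message, nowSeconds + maxWaitSeconds));
    }

    /** Step 2: called via RPC after the business logic completes. */
    boolean activate(String taskId) {
        Task task = tasks.get(taskId);
        if (task == null) {
            return false;
        }
        task.active = true;
        return true;
    }

    /** Removes and returns messages for tasks that are active or past their deadline. */
    List<String> drainDue(long nowSeconds) {
        List<String> due = new ArrayList<>();
        Iterator<Task> it = tasks.values().iterator();
        while (it.hasNext()) {
            Task task = it.next();
            if (task.active || nowSeconds >= task.autoActivateAt) {
                due.add(task.message);
                it.remove();
            }
        }
        return due;
    }
}
```

An auto-activated task may correspond to a change that never actually happened. With the trigger-query consumer described earlier this is harmless, since the consumer fetches the current state by ID rather than trusting the message.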

Conclusion

Recommendation: Solution 3 – it offers strong reliability (≈99.99% consistency) with relatively low cost and without the complexity of additional services.

