How to Turn Synchronous RPC into Asynchronous Queues for Reliable Microservices

The article examines the reliability challenges of microservice architectures that rely heavily on synchronous RPC calls, and proposes a comprehensive solution that converts failing RPCs to asynchronous message‑queue workflows, introduces a write‑ahead‑queue for transactional consistency between databases and queues, and outlines offset management to ensure end‑to‑end fault tolerance.

Didi Tech

Problem Statement

Microservice architectures solve many problems but introduce new reliability risks, especially when services depend on a large number of synchronous RPC calls. When a downstream dependency fails, the whole request may either fail or be reported as successful even though part of its work is missing, leading to instability and data inconsistency.

Key questions include:

How to guarantee reliability when many synchronous RPC dependencies exist?

How to repair data after a degraded RPC call?

How to ensure a message queue used as an RPC side‑path remains reliable?

How to keep the queue and database transactions consistent?

Synchronous‑to‑Asynchronous Conversion

During normal operation, calls are synchronous. If a call fails, it is automatically degraded to an asynchronous workflow: the request is placed into a message queue and retried later. Downstream handling can be classified into three scenarios:

Strong dependency: the downstream service must be available; the request cannot proceed if it is down.

Degradable dependency: the response normally includes the result of downstream processing, but the call can be degraded; the degraded response simply omits that data.

Fully asynchronous: the downstream service only consumes messages from the queue, with no direct RPC interaction.
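
The three scenarios above can be sketched as a small wrapper around the RPC call. This is a minimal illustration, not the article's implementation: the names `Dependency`, `DownstreamError`, and `call_with_degradation` are hypothetical, and an in-process `queue.Queue` stands in for the real message queue.

```python
import queue
from enum import Enum
from typing import Any, Callable, Optional


class Dependency(Enum):
    STRONG = "strong"          # request fails if the downstream is down
    DEGRADABLE = "degradable"  # request continues without the downstream data


class DownstreamError(Exception):
    """Raised by the (hypothetical) RPC stub when a call fails or times out."""


def call_with_degradation(
    rpc: Callable[[dict], Any],
    request: dict,
    kind: Dependency,
    retry_queue: "queue.Queue[dict]",
) -> Optional[Any]:
    """Try the synchronous call first; on failure, degrade according to `kind`."""
    try:
        return rpc(request)
    except DownstreamError:
        if kind is Dependency.STRONG:
            raise  # strong dependency: the request cannot proceed
        # Degradable: park the request for a later asynchronous retry
        # and return no data for this part of the response.
        retry_queue.put(request)
        return None


def publish_async(request: dict, event_queue: "queue.Queue[dict]") -> None:
    """Fully asynchronous: no RPC at all, the downstream only consumes the queue."""
    event_queue.put(request)
```

A caller would treat a `None` result as "this part of the response is missing" and rely on a separate consumer of `retry_queue` to repair the data later.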

Integrating the Queue into the Main Flow

When critical business logic is placed after the queue, the write to the queue must be treated as part of the main transaction. If the queue write fails or times out, the entire request should return an error instead of continuing.
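
As a rough sketch of this rule, the handler below waits for the queue write with a deadline and turns any failure or timeout into an error for the caller. The names `handle_request`, `QueueWriteError`, and the `enqueue` callable are assumptions made for illustration; the article does not specify an API.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Shared worker pool so a slow queue write can be abandoned after a deadline.
_executor = ThreadPoolExecutor(max_workers=4)


class QueueWriteError(Exception):
    """The queue write failed or timed out, so the whole request must fail."""


def handle_request(
    request: dict,
    enqueue: Callable[[dict], None],
    timeout_s: float = 0.2,
) -> dict:
    """Treat the queue write as part of the main flow of the request."""
    future = _executor.submit(enqueue, request)
    try:
        future.result(timeout=timeout_s)
    except FutureTimeout as exc:
        # The write may still land later, so consumers should tolerate
        # duplicates if the caller retries (an assumption of this sketch).
        raise QueueWriteError("queue write timed out") from exc
    except Exception as exc:
        raise QueueWriteError("queue write failed") from exc
    return {"status": "accepted"}
```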

Because Kafka’s latency and stability may not meet online service requirements, a local file‑based queue can be used as a buffer. The local queue forwards messages to a remote Kafka cluster via an agent, providing higher availability and lower latency.
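
One way to picture the local buffer is an append-only file plus an agent step that forwards unsent lines. The sketch below is an assumption-laden illustration: `LocalFileQueue`, its `.pos` side file, and the `send` callable (a stand-in for a real Kafka producer) are all hypothetical, and delivery is at-least-once because a crash between `send` and the position update would resend a message.

```python
import json
import os
from typing import Callable


class LocalFileQueue:
    """Append-only, newline-delimited JSON log on local disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.pos_path = path + ".pos"  # byte position already forwarded

    def append(self, message: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the request is acknowledged

    def _read_pos(self) -> int:
        try:
            with open(self.pos_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def _write_pos(self, pos: int) -> None:
        tmp = self.pos_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(str(pos))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.pos_path)  # atomic swap of the position file

    def forward(self, send: Callable[[dict], None]) -> int:
        """Agent step: push unsent messages to `send`; return how many were sent."""
        if not os.path.exists(self.path):
            return 0
        sent = 0
        with open(self.path, "rb") as f:
            f.seek(self._read_pos())
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a line that is still being written
                send(json.loads(line))  # if this raises, the position stays put
                self._write_pos(f.tell())
                sent += 1
        return sent
```

In this arrangement the request path only waits for `append`, while `forward` would run in a separate agent loop and simply try again after a failed `send`.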

Ensuring Database‑Queue Transaction Consistency

The goal is: if the database transaction succeeds, the message must be persisted in the queue; if the transaction fails, no message should appear in the queue. The proposed solution uses a dedicated write‑ahead‑queue (WAQ) topic.

All requests first write a record to the WAQ topic. If this write fails, an error is returned immediately.

The database transaction is executed.

If the transaction fails, the WAQ offset is moved to mark the request as processed, and no further action is taken.

If the transaction succeeds, the request is written to a business‑event‑queue topic.

When the business‑event‑queue write succeeds, the WAQ offset is moved, marking the request as fully completed.

Failure handling:

If writing to the business‑event‑queue fails, the system still returns success to the caller and retries the queue write asynchronously.

The WAQ offset remains unchanged until the business‑event‑queue write eventually succeeds, ensuring the whole process can be resumed from the WAQ.

This mechanism guarantees that a successful database commit always leads to a persisted message, without requiring a traditional two‑phase commit.
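
The steps and failure handling above can be sketched with in-memory stand-ins. Everything named here (`InMemoryTopic`, `WriteAheadFlow`, `db_txn`, `retry_pending`) is hypothetical and only mirrors the described order of operations; how a restarted process decides whether a replayed WAQ entry's transaction had committed is not described in the source and is left out.

```python
from typing import Callable, Dict, List, Set


class InMemoryTopic:
    """Stand-in for a single-partition Kafka topic (illustrative only)."""

    def __init__(self) -> None:
        self.records: List[dict] = []
        self.fail_writes = False  # hook to simulate a broker outage

    def append(self, record: dict) -> int:
        if self.fail_writes:
            raise IOError("queue write failed")
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record


class WriteAheadFlow:
    def __init__(self, waq: InMemoryTopic, events: InMemoryTopic,
                 db_txn: Callable[[dict], None]) -> None:
        self.waq = waq
        self.events = events
        self.db_txn = db_txn
        self.done: Set[int] = set()         # WAQ offsets fully processed
        self.pending: Dict[int, dict] = {}  # WAQ offset -> unpublished request

    def handle(self, request: dict) -> bool:
        """Return True if the database transaction committed."""
        offset = self.waq.append(request)   # step 1: a failure here propagates
        try:
            self.db_txn(request)            # step 2
        except Exception:
            self.done.add(offset)           # step 3: nothing to publish
            return False
        try:
            self.events.append(request)     # step 4
            self.done.add(offset)           # step 5
        except IOError:
            # Keep the WAQ entry open; the caller still sees success.
            self.pending[offset] = request
        return True

    def retry_pending(self) -> int:
        """Asynchronous retry: publish events whose queue write failed earlier."""
        published = 0
        for offset in sorted(self.pending):
            try:
                self.events.append(self.pending[offset])
            except IOError:
                break
            self.done.add(offset)
            del self.pending[offset]
            published += 1
        return published
```

Because a retried publish can repeat a message that actually reached the broker, consumers of the business-event topic in a design like this would normally need to be idempotent.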

Write‑Ahead‑Queue Offset Management

Each RPC thread writes an entry to the WAQ (labelled WAL1, WAL2, and WAL3 below) and receives a unique offset for it. The committed offset advances only when all preceding entries have been fully processed, similar to a sliding window in TCP. Example flow:

Thread 1 writes WAL1 (offset 1).

Thread 2 writes WAL2 (offset 2).

Thread 3 writes WAL3 (offset 3).

Thread 3 finishes first but cannot move the offset because offsets 1 and 2 are pending.

Thread 1 finishes next; with no earlier pending entries, the offset moves to 1.

Thread 2 then finishes; since offset 3 is already completed, the offset jumps to 3.

This approach allows multiple concurrent RPC threads to share a single Kafka partition while preserving correct ordering and completion semantics.
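
The sliding behaviour in the example flow can be captured in a few lines. `OffsetTracker` is a hypothetical helper written for this illustration; it follows the article's numbering, where the committed offset is the last fully processed entry, which differs from Kafka's own convention of committing the next offset to read.

```python
import threading
from typing import Set


class OffsetTracker:
    """Tracks the highest offset at or below which every entry is complete."""

    def __init__(self, start: int = 0) -> None:
        self._committed = start          # every offset <= this value is done
        self._finished: Set[int] = set() # completed offsets above the committed one
        self._lock = threading.Lock()

    def complete(self, offset: int) -> int:
        """Mark `offset` as done and return the new committed offset."""
        with self._lock:
            if offset > self._committed:
                self._finished.add(offset)
            # Slide forward across any contiguous run of finished offsets.
            while self._committed + 1 in self._finished:
                self._committed += 1
                self._finished.remove(self._committed)
            return self._committed


tracker = OffsetTracker()
print(tracker.complete(3))  # 0: offsets 1 and 2 are still pending
print(tracker.complete(1))  # 1: nothing earlier is pending
print(tracker.complete(2))  # 3: offset 3 had already finished
```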

Other Approaches

Two alternative methods for achieving DB‑queue consistency are mentioned:

Using a database as a queue with multi‑table transactions (e.g., Qunar’s design).

Employing a two‑phase‑commit capable message broker (e.g., Taobao’s Notify system).

Both rely on MySQL as the message store, increasing operational complexity.

Summary

Three independent techniques are combined to improve reliability and consistency in RPC‑heavy microservice clusters:

Convert synchronous RPC calls to asynchronous queue‑based retries when failures occur.

Introduce a local file queue as a buffer for Kafka to boost availability and reduce latency.

Use a two‑level queue architecture with a write‑ahead‑queue so that a business event is published if and only if the database transaction succeeds.

By integrating these solutions, developers can safely place critical business logic behind a reliable message queue, achieving higher availability, data integrity, and fault‑tolerant execution across the entire microservice ecosystem.
