How Hedged Requests Cut Tail Latency in Go Microservices

This article explains the hedged request pattern used by Google to combat microservice tail latency, shows how to implement it in Go with context and goroutines, discusses its impact on latency and load, and explores additional techniques such as SingleFlight and service‑class prioritization to further reduce tail delays.


The hedged request pattern, described in the paper "The Tail at Scale", is Google's solution to microservice tail latency and one of the two retry modes in gRPC.

In a hedged request, the client sends the same request to multiple nodes and cancels the remaining in‑flight requests as soon as the first response arrives, providing predictable latency.
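In gRPC, hedging is configured declaratively through the service config rather than in application code. The fragment below is an illustration only; the field names follow the gRPC hedging design (gRFC A6), and the service and method names are placeholders, so verify the exact schema against your gRPC version:

```json
{
  "methodConfig": [{
    "name": [{ "service": "example.SearchService", "method": "Search" }],
    "hedgingPolicy": {
      "maxAttempts": 3,
      "hedgingDelay": "0.1s",
      "nonFatalStatusCodes": ["UNAVAILABLE"]
    }
  }]
}
```

With this policy the client sends the first request immediately, sends up to two hedged copies at 100 ms intervals if no response has arrived, and takes the first successful reply.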

Assume a call chain of 20 nodes, each with a P99 latency of 1 s. The probability that every node responds within 1 s is 0.99^20 ≈ 81.8%, so about 18.2% of requests exceed 1 s. By always taking the fastest replica's response, hedging eliminates this unpredictable tail latency (service failures aside).

In Go, the pattern can be implemented with context and goroutines. The example below launches five concurrent requests to the same backend and returns the first successful response, cancelling the others.

func hedgedRequest() string {
    // Buffered so late responders can still send after the winner is
    // picked; with an unbuffered channel they would block forever and leak.
    ch := make(chan string, 5)
    ctx, cancel := context.WithCancel(context.Background())

    for i := 0; i < 5; i++ {
        // Contexts are passed by value, never as *context.Context.
        go func(ctx context.Context, ch chan<- string, i int) {
            log.Println("in goroutine: ", i)
            if request(ctx, "http://localhost:8090", i) {
                ch <- fmt.Sprintf("finish [from %v]", i)
                log.Println("completed goroutine: ", i)
            }
        }(ctx, ch, i)
    }

    select {
    case s := <-ch:
        cancel()
        log.Println("cancelled all inflight requests")
        return s
    case <-time.After(5 * time.Second):
        cancel()
        return "all requests timeout after 5 secs"
    }
}

The full code is available at https://go.dev/play/p/fY9Lj_M7ZYE. While hedging reduces tail latency, it multiplies load and must be designed carefully.

Why Does Tail Latency Occur?

Many factors contribute to tail latency, such as:

Mixed deployments causing resource contention on a single physical machine.

Garbage collection pauses (e.g., Go’s STW) that amplify tail latency.

Queueing delays in message queues, networks, etc.

How can we mitigate the request amplification caused by hedging?

One approach is to use SingleFlight to merge identical requests, as described in "Go High‑Performance Programming EP7". Another technique, recommended in "The Tail at Scale", sends a single request and, only if the P95 deadline passes without a response, issues a second one. This caps duplicate traffic at about 5% while still trimming the tail.

Additional methods from the literature include:

Differentiating service classes and higher‑level queuing: prioritize interactive requests and keep low‑priority queues short.

Reducing head‑of‑line blocking: break large requests into smaller ones.

Micro‑partitioning: use fine‑grained load distribution to lower imbalance‑induced latency.

Applying circuit breakers to poorly performing machines.

Do you have other effective ways to handle tail requests?

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: microservices, Go, gRPC, backend performance, tail latency, hedged requests
Written by

Radish, Keep Going!
