How Request Hedging Cuts Go HTTP Client P99 Latency by 74%

The article explains why tail latency hurts microservices, why simple retries fail, and how implementing Google’s request‑hedging technique in Go’s http.Client can slash P99 latency by about 64% while incurring only a modest P50 increase.

TonyBai

Tail Latency

In large-scale microservice systems, most requests meet their P50 or P90 targets, but a small fraction (P99 or P99.9) suffer severe delays. Dean and Barroso’s 2013 paper The Tail at Scale attributes these occasional long pauses to factors such as shared hardware resources, OS scheduling, and garbage collection, which in a Go service includes the Go GC. When a service fans out to many downstream calls, the probability of hitting a slow call compounds: with 100 parallel calls that each have a 1 % chance of taking longer than 1 s, there is roughly a 63 % chance that the overall request exceeds 1 s.

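With n = 100 independent calls, each with a 1 % probability of taking longer than 1 s, the probability that every call finishes within 1 s is 0.99^100 ≈ 36.6 %, so the probability that at least one call is slow is ≈ 63.4 %.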
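The arithmetic can be checked with a few lines of Go. This is a minimal sketch that assumes the downstream calls are independent; the function name tailProbability is illustrative and does not come from the original article.

```go
package main

import (
	"fmt"
	"math"
)

// tailProbability returns the chance that at least one of n independent
// calls is slow, given that each call is slow with probability p.
func tailProbability(n int, p float64) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("fan-out %3d: %.1f%% chance of at least one slow call\n",
			n, 100*tailProbability(n, 0.01))
	}
}
```

For a fan-out of 100 this prints 63.4%, matching the figure above. Real backends often have correlated slowness, so the independence assumption makes this an approximation.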

Retry vs Hedging

Retry: waits for a failure or timeout before issuing a second request, so every retried request pays at least the timeout duration, and retries can amplify load during spikes.

Hedging: treats slowness itself as the problem. After a configurable hedging delay it launches a duplicate request without waiting for the first to fail, and uses whichever response arrives first.

Hedging Mechanics

The core idea is simple: send the original request, start a timer (the hedging delay), and if the timer fires before a response arrives, fire a backup request to another replica. The first successful response wins; the remaining in‑flight requests are cancelled via a shared context.WithCancel.

Go Implementation

A custom http.RoundTripper named HedgedTransport injects the hedging logic into the standard http.Client. Key fields:

Transport: underlying transport (defaults to http.DefaultTransport).

MaxAttempts: total concurrent attempts, including the initial request.

HedgeDelay: delay before spawning a backup request.

The RoundTrip method creates a cancellable context, launches the first request in a goroutine, and uses a result channel to capture the first successful *http.Response. A timer controls when additional attempts are started, and the method returns as soon as a non‑error response is received, cancelling the rest.

type HedgedTransport struct {
    Transport   http.RoundTripper
    MaxAttempts int
    HedgeDelay  time.Duration
}

func (ht *HedgedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
    // implementation follows the description above
    // (omitted for brevity)
    return nil, nil
}
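The stub above omits the body of RoundTrip. One way it might be filled in is sketched below; this is not the original article's code. It makes three assumptions: requests with a body are passed through unhedged, each attempt's response body is buffered in memory so that cancelling the shared context cannot cut it short (which makes it unsuitable for large or streaming responses), and an error is returned only when no other attempt is still in flight. slowThenFast and runDemo are hypothetical helpers that exercise the transport without a network.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"sync/atomic"
	"time"
)

type HedgedTransport struct {
	Transport   http.RoundTripper
	MaxAttempts int
	HedgeDelay  time.Duration
}

// attempt is the outcome of one copy of the request.
type attempt struct {
	resp *http.Response
	err  error
}

func (ht *HedgedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	base := ht.Transport
	if base == nil {
		base = http.DefaultTransport
	}
	// Assumption: requests with a body pass through unhedged (no safe replay).
	if ht.MaxAttempts <= 1 || (req.Body != nil && req.Body != http.NoBody) {
		return base.RoundTrip(req)
	}
	ctx, cancel := context.WithCancel(req.Context())
	defer cancel() // aborts every attempt still in flight once we return
	results := make(chan attempt, ht.MaxAttempts) // buffered: senders never block
	launch := func() {
		clone := req.Clone(ctx)
		go func() {
			resp, err := base.RoundTrip(clone)
			if err == nil {
				// Assumption: buffer the body so it survives cancel().
				body, readErr := io.ReadAll(resp.Body)
				resp.Body.Close()
				resp.Body, err = io.NopCloser(bytes.NewReader(body)), readErr
			}
			results <- attempt{resp, err}
		}()
	}
	launch()
	launched, pending := 1, 1
	timer := time.NewTimer(ht.HedgeDelay)
	defer timer.Stop()
	for {
		select {
		case <-timer.C: // still waiting: send a backup request
			if launched < ht.MaxAttempts {
				launch()
				launched, pending = launched+1, pending+1
				timer.Reset(ht.HedgeDelay)
			}
		case r := <-results:
			pending--
			if r.err == nil {
				return r.resp, nil // first success wins
			}
			if pending == 0 {
				return nil, r.err // nothing left in flight
			}
		}
	}
}

// slowThenFast is a hypothetical stand-in: the first call hangs until cancelled.
type slowThenFast struct{ calls atomic.Int32 }

func (s *slowThenFast) RoundTrip(r *http.Request) (*http.Response, error) {
	if s.calls.Add(1) == 1 {
		<-r.Context().Done()
		return nil, r.Context().Err()
	}
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
}

func runDemo() (status int, attempts int32, elapsed time.Duration, err error) {
	fake := &slowThenFast{}
	ht := &HedgedTransport{Transport: fake, MaxAttempts: 3, HedgeDelay: 50 * time.Millisecond}
	start := time.Now()
	resp, err := (&http.Client{Transport: ht, Timeout: time.Second}).Get("http://example.invalid/data")
	if err == nil {
		status = resp.StatusCode
		resp.Body.Close()
	}
	return status, fake.calls.Load(), time.Since(start), err
}

func main() { fmt.Println(runDemo()) }
```

In the demo the first attempt stalls, the backup fires after the 50 ms hedge delay and wins, and the stalled attempt is cancelled when RoundTrip returns. A production version would more likely wrap the winning body so that the context is released on Close instead of buffering the whole response.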

Demo Project Layout

hedge.go: core hedging logic.

server.go: mock HTTP server that returns fast responses 90 % of the time and a long tail (500 ms–1 s) 10 % of the time.

main.go: benchmark driver that runs 1 000 concurrent requests against both a normal client and a hedged client.

Mock Server (server.go)

// /data simulates a backend with a long tail: roughly 10 % of requests are slow.
http.HandleFunc("/data", func(w http.ResponseWriter, r *http.Request) {
    if rand.Float32() < 0.1 {
        delay := 500 + rand.Intn(500) // slow path: 500-999 ms
        time.Sleep(time.Duration(delay) * time.Millisecond)
    } else {
        delay := 10 + rand.Intn(40) // fast path: 10-49 ms
        time.Sleep(time.Duration(delay) * time.Millisecond)
    }
    fmt.Fprintln(w, "OK")
})

Benchmark Driver (main.go)

const RequestCount = 1000

func main() {
    startServer()
    normalClient := &http.Client{Timeout: 2 * time.Second}
    normalLatencies := runBenchmark(normalClient)

    hedgedClient := &http.Client{
        Timeout: 2 * time.Second,
        Transport: &HedgedTransport{
            Transport:   http.DefaultTransport,
            MaxAttempts: 3,
            HedgeDelay:  80 * time.Millisecond,
        },
    }
    hedgedLatencies := runBenchmark(hedgedClient)

    printStats("Normal Client", normalLatencies)
    printStats("Hedged Client", hedgedLatencies)
}
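main.go calls runBenchmark and printStats, which are not shown in this excerpt. The sketch below shows one way the measurement side could work, assuming nearest-rank percentiles computed over successful calls only. The measure helper and the printStats body are hypothetical, and the stand-in workload sleeps instead of issuing HTTP requests; a real driver would call client.Get inside the callback.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
	"time"
)

// measure runs do n times concurrently and returns the latencies of the
// calls that succeeded, sorted in ascending order.
func measure(n int, do func() error) []time.Duration {
	var (
		mu        sync.Mutex
		wg        sync.WaitGroup
		latencies []time.Duration
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			start := time.Now()
			if err := do(); err != nil {
				return // failed calls are not counted
			}
			elapsed := time.Since(start)
			mu.Lock()
			latencies = append(latencies, elapsed)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	return latencies
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100)
// of an ascending slice, or 0 for an empty slice.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// printStats prints the request count and the P50, P95, and P99 latencies.
func printStats(name string, sorted []time.Duration) {
	fmt.Printf("=== %s Statistics ===\n", name)
	fmt.Printf("Request count: %d\n", len(sorted))
	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("P%.0f latency: %v\n", p, percentile(sorted, p))
	}
}

func main() {
	// Stand-in workload; a real driver would issue an HTTP request here.
	latencies := measure(100, func() error {
		time.Sleep(time.Millisecond)
		return nil
	})
	printStats("Demo", latencies)
}
```

The same percentile helper can be pointed at production latency samples when picking a hedge delay.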

Results

=== Normal Client Statistics ===
Request count: 1000
P50 latency: 115.2ms
P95 latency: 850.8ms
P99 latency: 1.045s

=== Hedged Client Statistics ===
Request count: 1000
P50 latency: 138.9ms  // ~23ms increase
P95 latency: 360.6ms  // huge improvement
P99 latency: 376.9ms  // ~64% reduction

Key observations:

P99 improvement: hedging reduced the 99th-percentile latency from 1.045 s to 376.9 ms, a drop of about 64 %.

P50 cost: the median latency rose by about 23 ms (115.2 ms to 138.9 ms), attributed to extra request cloning and context handling.

Production‑Ready Guidelines

Idempotency: Hedging sends duplicate requests, so it is safe only for idempotent operations (e.g., GET) or for writes guarded by a global transaction ID.

Choosing Hedge Delay: Set the delay to the service’s historical P95 latency (measured via Prometheus or similar), so that only about 5 % of requests trigger a backup. Too short a delay overloads backends; too long a delay defeats the purpose.

Throttling & Circuit Breaking: Combine hedging with a token bucket or circuit breaker so that when downstream services are down, hedged requests are suppressed rather than amplifying the failure.
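The first and third guidelines can be sketched as two small guards that a hedging transport could consult before launching a backup request. This is an illustration under stated assumptions, not part of the original demo: hedgeableMethod and hedgeBudget are hypothetical names, only read-only methods are treated as hedgeable, and the budget is a request-driven variant of a token bucket (credit is earned per primary request rather than per unit of time).

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// hedgeableMethod reports whether a request with this method may be
// duplicated. Assumption: only read-only methods are hedged.
func hedgeableMethod(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return true
	}
	return false
}

// hedgeBudget caps hedges to a fraction of primary traffic. Every primary
// request earns one credit, and one hedge spends cost credits.
type hedgeBudget struct {
	mu      sync.Mutex
	balance int // credit currently available
	cost    int // 10 means at most about one hedge per ten primaries
	max     int // cap so that idle periods cannot bank unlimited hedges
}

// recordPrimary is called once for every primary request sent.
func (b *hedgeBudget) recordPrimary() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.balance < b.max {
		b.balance++
	}
}

// allowHedge spends credit for one hedge, or reports false if the
// budget is exhausted.
func (b *hedgeBudget) allowHedge() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.balance < b.cost {
		return false
	}
	b.balance -= b.cost
	return true
}

func main() {
	budget := &hedgeBudget{cost: 10, max: 100}
	hedged := 0
	for i := 0; i < 1000; i++ {
		budget.recordPrimary()
		// Worst case: pretend every request is slow enough to want a hedge.
		if hedgeableMethod(http.MethodGet) && budget.allowHedge() {
			hedged++
		}
	}
	fmt.Printf("hedged %d of 1000 requests\n", hedged)
}
```

Even when every request asks for a hedge, the budget lets through only 100 of 1000, so a degraded backend sees at most about 10 % extra load. The cost and max values here are arbitrary and would need tuning per service.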

Conclusion

Request hedging offers a high-impact trade-off: a modest increase in median latency buys a dramatic reduction in tail latency, making services more predictable under load. Embedding the technique directly into Go’s http.Client applies the approach described in The Tail at Scale without any hardware changes.

References

https://www.reddit.com/r/golang/comments/1s4mb10/reduced_p99_latency_by_74_in_go_learned_something/

https://grpc.io/docs/guides/request-hedging/

https://research.google/pubs/the-tail-at-scale/

https://github.com/bigwhite/experiments/tree/master/go-hedging-demo
