How Go 1.25 Flight Recorder Lets You Debug Production Slowness After the Fact

Go 1.25 introduces Flight Recorder, a lightweight in‑memory trace buffer that captures recent execution data and can be snapshotted on demand, enabling developers to retroactively investigate latency spikes in long‑running services without the overhead of continuous tracing.


When a production service suddenly slows down, the problem often disappears before a trace can be collected, making post‑mortem debugging difficult.

Background

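Go has long provided execution tracing through the runtime/trace package, which records runtime events such as goroutine scheduling, blocking, and garbage collection for as long as tracing is enabled. This works well for tests and short‑lived programs, but leaving it on in a long‑running web service produces far more data than anyone can store or read, most of it unrelated to the problem at hand.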
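For comparison, a conventional tracing session looks roughly like the sketch below. The traceWork helper and the busy loop are hypothetical and only illustrate that everything between trace.Start and trace.Stop ends up in the output, so the amount of data grows with how long tracing stays on.

```go
package main

import (
    "bytes"
    "log"
    "runtime/trace"
)

// traceWork (hypothetical helper) records a full execution trace of work
// into memory and returns the raw trace bytes.
func traceWork(work func()) ([]byte, error) {
    var buf bytes.Buffer
    // Start fails if tracing is already enabled elsewhere in the process.
    if err := trace.Start(&buf); err != nil {
        return nil, err
    }
    work()
    // Stop flushes all pending trace data to buf before returning.
    trace.Stop()
    return buf.Bytes(), nil
}

func main() {
    data, err := traceWork(func() {
        sum := 0
        for i := 0; i < 1_000_000; i++ {
            sum += i
        }
        _ = sum
    })
    if err != nil {
        log.Fatalf("failed to start tracing: %v", err)
    }
    log.Printf("collected %d bytes of trace data", len(data))
}
```

In a real service the writer would usually be a file, and the trace would cover the whole time window rather than a single function call.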

Developers typically notice a timeout or a failed health check and only then call trace.Start(), but by that point the incident is already over.

What Flight Recorder Is

Flight Recorder solves this by continuously buffering the last few seconds of trace data in memory instead of writing it to a file or socket. When the program detects a problem, it can snapshot the buffer and obtain a precise view of the events that occurred just before the issue.

Practical Example

The following example implements a simple HTTP "guess‑number" game and a goroutine that sends a report every minute.

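package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "math/rand"
    "net/http"
    "strconv"
    "sync"
    "time"
)

// bucket counts how many times one particular number has been guessed.
type bucket struct {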
    mu      sync.Mutex
    guesses int
}

func main() {
    buckets := make([]bucket, 100)

    // Periodic report
    go func() {
        for range time.Tick(1 * time.Minute) {
            sendReport(buckets)
        }
    }()

    answer := rand.Intn(len(buckets))
    http.HandleFunc("/guess-number", func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        guess, err := strconv.Atoi(r.URL.Query().Get("guess"))
        if err != nil || guess < 0 || guess >= len(buckets) {
            http.Error(w, "invalid 'guess' value", http.StatusBadRequest)
            return
        }
        b := &buckets[guess]
        b.mu.Lock()
        b.guesses++
        b.mu.Unlock()
        fmt.Fprintf(w, "guess: %d, correct: %t", guess, guess == answer)
        log.Printf("HTTP request: endpoint=/guess-number guess=%d duration=%s", guess, time.Since(start))
    })

    log.Fatal(http.ListenAndServe(":8090", nil))
}

func sendReport(buckets []bucket) {
    counts := make([]int, len(buckets))
    for index := range buckets {
        b := &buckets[index]
        b.mu.Lock()
        defer b.mu.Unlock()
        counts[index] = b.guesses
    }
    b, err := json.Marshal(counts)
    if err != nil {
        log.Printf("failed to marshal report data: error=%s", err)
        return
    }
    url := "http://localhost:8091/guess-number-report"
    resp, err := http.Post(url, "application/json", bytes.NewReader(b))
    if err != nil {
        log.Printf("failed to send report: %s", err)
        return
    }
    // Close the body so the underlying connection can be reused.
    resp.Body.Close()
}

After deployment, some requests take more than 100 ms, while most complete in about a microsecond or less:

2025/09/19 16:52:02 HTTP request: endpoint=/guess-number guess=69 duration=625ns
2025/09/19 16:52:02 HTTP request: endpoint=/guess-number guess=42 duration=1.417µs
2025/09/19 16:52:02 HTTP request: endpoint=/guess-number guess=86 duration=115.186167ms
2025/09/19 16:52:02 HTTP request: endpoint=/guess-number guess=0 duration=127.993375ms

Using Flight Recorder to Diagnose

First, configure and start the recorder in main:

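// Configure Flight Recorder
fr := trace.NewFlightRecorder(trace.FlightRecorderConfig{
    MinAge:   200 * time.Millisecond, // about twice the 100 ms problem window
    MaxBytes: 1 << 20,                // cap the buffer at roughly 1 MiB
})
// Start returns an error, for example if a recorder is already running.
if err := fr.Start(); err != nil {
    log.Fatalf("failed to start flight recorder: %v", err)
}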

At the end of the handler, capture a snapshot when the request took longer than 100 ms:

if fr.Enabled() && time.Since(start) > 100*time.Millisecond {
    go captureSnapshot(fr)
}

The captureSnapshot helper writes the buffered trace to snapshot.trace using the recorder's WriteTo method. The file is then inspected with the built‑in tool:

go tool trace snapshot.trace

The tool opens a local web UI. In the timeline view a gap of roughly 100 ms appears in which almost nothing runs, and many handler goroutines are blocked waiting on a single goroutine that is executing sendReport. Examining the stacks shows why: sendReport still holds every bucket lock while it performs the HTTP POST, because each defer b.mu.Unlock() runs only when the function returns, not at the end of the loop iteration that registered it.
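The defer behavior behind this bug can be reproduced in isolation. The sketch below is not taken from the original program; heldAfterLoop is a hypothetical function that locks several mutexes with a deferred unlock inside the loop and then counts how many are still locked once the loop has finished.

```go
package main

import (
    "fmt"
    "sync"
)

// heldAfterLoop locks n mutexes, deferring each unlock inside the loop, and
// reports how many are still locked after the loop has finished.
func heldAfterLoop(n int) int {
    mus := make([]sync.Mutex, n)
    for i := range mus {
        mus[i].Lock()
        // Deferred calls run when heldAfterLoop returns, not per iteration.
        defer mus[i].Unlock()
    }

    held := 0
    for i := range mus {
        if mus[i].TryLock() {
            // Would only happen if the lock had already been released.
            mus[i].Unlock()
        } else {
            held++
        }
    }
    return held
}

func main() {
    fmt.Printf("%d of 3 mutexes still held after the loop\n", heldAfterLoop(3))
    // prints "3 of 3 mutexes still held after the loop"
}
```

In sendReport the same thing happens with the bucket locks, except that the code following the loop is a network call, so every handler that needs a bucket waits for the POST to complete.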


Fix the bug by unlocking inside the loop:

for index := range buckets {
    b := &buckets[index]
    b.mu.Lock()
    counts[index] = b.guesses
    b.mu.Unlock()
}
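An alternative that keeps defer, shown here only as a sketch, is to move the locked section into its own function so that the deferred unlock runs once per bucket. The count method and snapshotCounts function are hypothetical names, not part of the original program.

```go
package main

import (
    "fmt"
    "sync"
)

type bucket struct {
    mu      sync.Mutex
    guesses int
}

// count (hypothetical) reads the counter under the lock. The deferred unlock
// runs when count returns, which is once per bucket.
func (b *bucket) count() int {
    b.mu.Lock()
    defer b.mu.Unlock()
    return b.guesses
}

// snapshotCounts (hypothetical) copies every counter while holding at most
// one bucket lock at a time.
func snapshotCounts(buckets []bucket) []int {
    counts := make([]int, len(buckets))
    for i := range buckets {
        counts[i] = buckets[i].count()
    }
    return counts
}

func main() {
    buckets := make([]bucket, 3)
    buckets[1].guesses = 2
    fmt.Println(snapshotCounts(buckets)) // prints "[0 2 0]"
}
```

Either form releases each lock before the report is marshaled and sent, so request handlers no longer wait on the network call.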

Conclusion

Flight Recorder acts like a black box for Go programs: it continuously records recent execution, and when a problem is detected you can retrieve a detailed trace of the preceding moments without streaming trace data out of the process the whole time. This makes it well suited to production performance debugging, and it builds on the tracing improvements introduced in Go 1.21 and Go 1.22.
