Master Go Performance Debugging with pprof and trace: Heap, Goroutine & CPU Profiling

This guide explains why and how to use Go's pprof and trace tools to diagnose memory leaks, goroutine stalls, and CPU bottlenecks, providing step‑by‑step commands, example code, a cheat‑sheet of core commands, and a practical workflow for microservice tracing.

Code Wrench

Why use pprof and trace

pprof collects CPU, memory, and goroutine profiling data, allowing you to locate hot functions and resource leaks. trace records runtime events such as goroutine scheduling, system calls, garbage collection, and blocking (it does not record individual function calls), which is especially useful for seeing where a request waits inside each service of a micro‑service architecture. Using both tools together helps you find performance bottlenecks quickly in development and production.

Typical pprof analysis scenarios

1. Heap memory profiling

Scenario : The program allocates objects continuously, causing memory usage to grow and suggesting a leak.

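package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
    "time"
)

// Data holds a 1 MiB buffer so the growth is easy to spot in the profile.
type Data struct{ buf [1024 * 1024]byte }

// data keeps every allocation reachable, so the garbage collector can never free it.
var data []*Data

// leak appends 1 MiB to the global slice every 100 ms.
func leak() {
    for {
        data = append(data, &Data{})
        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    go leak()
    // ListenAndServe only returns on failure, for example when the port is in use.
    log.Fatal(http.ListenAndServe(":6060", nil))
}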

Analysis steps (run while the program is listening on port 6060):

go tool pprof http://localhost:6060/debug/pprof/heap
(pprof) top        # functions holding the most in-use memory (inuse_space is the default sample type)
(pprof) list leak  # annotate the source of functions matching "leak"
(pprof) web        # open the allocation call graph as an SVG in a browser (requires Graphviz)
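
When the process has no HTTP endpoint, or when you want a snapshot to keep for later comparison, the heap profile can also be written to a file from inside the program. The following is a minimal sketch using runtime/pprof; the helper names writeHeapSnapshot and snapshotToTemp and the file name are assumptions made for illustration, not part of the original example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

// writeHeapSnapshot (hypothetical helper) writes the current heap profile
// to path and returns the size of the resulting file in bytes.
func writeHeapSnapshot(path string) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	// Run a collection first so the profile reflects up-to-date statistics
	// about live objects.
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		return 0, err
	}
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

// snapshotToTemp (hypothetical helper) stores the snapshot under the
// system temporary directory using the given file name.
func snapshotToTemp(name string) (int64, error) {
	return writeHeapSnapshot(filepath.Join(os.TempDir(), name))
}

func main() {
	size, err := snapshotToTemp("heap-before.pprof")
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap snapshot failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes\n", size)
}
```

Open the resulting file with go tool pprof followed by the file path. Taking one snapshot early and another after the suspected leak has had time to grow gives you two files to compare.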

2. Goroutine stack profiling

Scenario : A large number of goroutines accumulate, indicating possible leaks or blocking.

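package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
)

// blockForever parks the goroutine permanently: an empty select never proceeds.
func blockForever() { select {} }

func main() {
    // Start 100 goroutines that can never exit.
    for i := 0; i < 100; i++ {
        go blockForever()
    }
    // ListenAndServe only returns on failure, for example when the port is in use.
    log.Fatal(http.ListenAndServe(":6060", nil))
}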

Analysis steps :

go tool pprof http://localhost:6060/debug/pprof/goroutine
(pprof) top            # functions with the most goroutines on their stacks
(pprof) list blockForever # annotate the source of the blocking function
(pprof) web            # open the call graph of where goroutines are parked (requires Graphviz)
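
The same goroutine profile can be read from inside the process, which is handy in tests or in a signal handler. The sketch below is an illustration under assumptions: spawnBlocked, goroutineDump, and dumpMentions are hypothetical helper names, and the goroutines block on a channel instead of an empty select so that they can be released afterwards.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"sync"
)

// spawnBlocked starts n goroutines that park until stop is closed and
// returns only after all of them have started running.
func spawnBlocked(n int, stop <-chan struct{}) {
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-stop // parked here, similar to blockForever in the example above
		}()
	}
	started.Wait()
}

// goroutineDump returns the goroutine profile as text. With debug=1,
// identical stacks are grouped and each group is prefixed with its count.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// dumpMentions reports whether any stack in the current goroutine
// profile contains the given function name.
func dumpMentions(name string) bool {
	dump, err := goroutineDump()
	return err == nil && strings.Contains(dump, name)
}

func main() {
	stop := make(chan struct{})
	spawnBlocked(100, stop)
	defer close(stop) // releases the parked goroutines

	fmt.Println("live goroutines:", pprof.Lookup("goroutine").Count())
	fmt.Println("profile mentions spawnBlocked:", dumpMentions("spawnBlocked"))
}
```

Logging the goroutine count periodically is a cheap way to notice a pile‑up before opening the full profile.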

3. CPU profiling

Scenario : The application responds slowly and appears to consume excessive CPU.

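package main

import (
    "log"
    "math"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
)

// heavyCompute sums the square roots of 0..n-1 to keep one core busy.
func heavyCompute(n int) float64 {
    sum := 0.0
    for i := 0; i < n; i++ {
        sum += math.Sqrt(float64(i))
    }
    return sum
}

func main() {
    // Burn CPU in the background so the profile has something to show.
    go func() {
        for {
            heavyCompute(10000000)
        }
    }()
    // ListenAndServe only returns on failure, for example when the port is in use.
    log.Fatal(http.ListenAndServe(":6060", nil))
}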

Analysis steps (collect a 30‑second CPU profile):

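go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"   # quote the URL so the shell does not treat ? as a wildcard
(pprof) top            # functions that consume the most CPU time
(pprof) list heavyCompute # drill into the hot function
(pprof) web            # open the call graph as an SVG (requires Graphviz); for a flame graph, use the web UI started with go tool pprof -http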
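
For batch jobs and command‑line programs that have no HTTP server, a CPU profile can be recorded directly with runtime/pprof. This is a minimal sketch, assuming hypothetical helpers profileCPU and profileToTemp and an arbitrary output file name; heavyCompute is copied from the example above.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"path/filepath"
	"runtime/pprof"
)

// heavyCompute is the CPU-bound function from the example above.
func heavyCompute(n int) float64 {
	sum := 0.0
	for i := 0; i < n; i++ {
		sum += math.Sqrt(float64(i))
	}
	return sum
}

// profileCPU (hypothetical helper) records a CPU profile to path while
// work runs and returns the size of the profile file in bytes.
func profileCPU(path string, work func()) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return 0, err
	}
	work()
	pprof.StopCPUProfile() // flushes the collected samples to f

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

// profileToTemp (hypothetical helper) stores the profile under the
// system temporary directory using the given file name.
func profileToTemp(name string, work func()) (int64, error) {
	return profileCPU(filepath.Join(os.TempDir(), name), work)
}

func main() {
	var result float64
	size, err := profileToTemp("cpu.pprof", func() {
		result = heavyCompute(50_000_000)
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "CPU profile failed:", err)
		os.Exit(1)
	}
	fmt.Printf("sum=%.0f, profile size=%d bytes\n", result, size)
}
```

The resulting file is analysed the same way as the HTTP profile: pass its path to go tool pprof and use top, list, and web.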

pprof cheat sheet

Heap analysis : go tool pprof http://.../heap, then top, list, web to identify memory hotspots.

Goroutine analysis : go tool pprof http://.../goroutine, then top, list, web to find blocked or leaked goroutines.

CPU analysis : go tool pprof "http://.../profile?seconds=30", then top, list, web for CPU‑time hot paths.

Trace analysis for micro‑services

Scenario : Distributed requests are slow and the failure point is not obvious.

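package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
    "os"
    "runtime/trace"
    "time"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Start returns an error if tracing is already enabled.
    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    defer trace.Stop() // runs before f.Close, so the trace is flushed first

    go func() { log.Println(http.ListenAndServe(":6060", nil)) }()

    // Simulate a few requests.
    for i := 0; i < 10; i++ {
        time.Sleep(200 * time.Millisecond)
    }
}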

Analysis command :

go tool trace trace.out

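The trace viewer displays goroutine execution, network and system‑call blocking, garbage collection, and scheduler delays. A trace covers a single process, so correlate it with logs (for example by request ID) to pinpoint cross‑service bottlenecks.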
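
To make individual requests visible in the trace, the runtime/trace package lets you annotate code with tasks, regions, and log messages. The sketch below is an illustration under assumptions: handleRequest, runTraced, and the region names loadData and render are hypothetical, and the sleeps stand in for real work.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"runtime/trace"
	"time"
)

// handleRequest is a hypothetical stand-in for a request handler. Each
// call becomes one task in the trace, split into two named regions.
func handleRequest(ctx context.Context, id int) {
	ctx, task := trace.NewTask(ctx, "handleRequest")
	defer task.End()

	// Attach the request ID so the trace can be matched against logs.
	trace.Log(ctx, "requestID", fmt.Sprint(id))

	trace.WithRegion(ctx, "loadData", func() {
		time.Sleep(2 * time.Millisecond) // simulated I/O
	})
	trace.WithRegion(ctx, "render", func() {
		time.Sleep(time.Millisecond) // simulated CPU work
	})
}

// runTraced traces n simulated requests and returns the encoded trace.
func runTraced(n int) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	for i := 0; i < n; i++ {
		handleRequest(context.Background(), i)
	}
	trace.Stop() // returns only after the trace has been fully written
	return buf.Bytes(), nil
}

func main() {
	data, err := runTraced(10)
	if err != nil {
		fmt.Fprintln(os.Stderr, "trace failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to trace.out\n", len(data))
}
```

After opening the file with go tool trace trace.out, the annotated requests should appear under the user‑defined tasks and regions views, where slow regions can be compared across requests.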

Performance investigation workflow

Identify symptoms : high CPU, high memory, slow response, goroutine pile‑up.

Collect data : enable the /debug/pprof HTTP endpoints or generate a trace file.

Analyze hotspots : use top, list, web, or flame‑graph visualisations.

Locate the problem : drill down to the offending function, goroutine, or distributed trace segment.

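Validate optimisations : modify code or configuration, re‑collect data, and compare the new profile with the baseline, for example with go tool pprof -base, which takes the baseline profile and reports only the difference.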

Practical tips

Run pprof locally during development; in production, expose /debug/pprof only on an internal or otherwise protected address, or collect profile files for offline analysis.

Use trace to follow a request through each service in a micro‑service environment, and combine it with structured logging to link the per‑service traces end to end.

Flame‑graph and web visualisations dramatically speed up root‑cause identification.

Keep the cheat‑sheet and workflow diagram handy for quick reference.
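
One way to expose /debug/pprof in production without putting it on the public listener is to register the handlers on a separate mux and serve that mux on a loopback or internal address. The sketch below is an illustration under assumptions: newDebugMux, serveDebug, and statusFor are hypothetical helpers, and 127.0.0.1:6060 is an arbitrary address.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
	"time"
)

// newDebugMux registers the pprof handlers on a private mux. Note that
// importing net/http/pprof also registers them on http.DefaultServeMux,
// so the public server should use its own mux as well.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, and so on
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// serveDebug blocks while serving the debug mux on addr. No write
// timeout is set because CPU profiles stream for many seconds.
func serveDebug(addr string) error {
	srv := &http.Server{
		Addr:              addr,
		Handler:           newDebugMux(),
		ReadHeaderTimeout: 5 * time.Second,
	}
	return srv.ListenAndServe()
}

// statusFor sends an in-memory GET request to h and returns the status
// code. It exists only to check the routing without opening a socket.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println("GET /debug/pprof/ ->", statusFor(mux, "/debug/pprof/"))
	fmt.Println("GET / ->", statusFor(mux, "/"))

	// In a real service, start the debug listener next to the public one:
	//   go func() { log.Println(serveDebug("127.0.0.1:6060")) }()
}
```

Keeping the debug listener on its own address means access can be controlled at the network level, independently of the public API.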

Performance investigation diagram