Master Go Performance Profiling with pprof: From Basics to Real‑World Fixes

This comprehensive guide explains how to use Go's built‑in pprof tool to profile CPU, memory, and goroutine behavior, covering integration methods, command‑line analysis, visualization techniques, and step‑by‑step solutions for common performance bottlenecks in production Go services.

360 Zhihui Cloud Developer

In Go development, performance problems such as high CPU usage, memory leaks, or runaway goroutines can cripple services. The official pprof tool acts like a precise scalpel, helping developers locate hidden bottlenecks.

1. Understanding pprof: the foundation of performance analysis

pprof originates from Google and works by sampling a running program and generating profile files that can be interpreted to find the root cause of performance issues.

Performance metrics pprof can analyze

pprof supports multiple profile types, each suited to different scenarios:

| Type | Core meaning | Typical scenario |
| --- | --- | --- |
| cpu | Samples CPU time at 100 Hz | Diagnose high CPU usage or compute-intensive bottlenecks |
| heap | Samples heap allocations | Locate memory leaks or excessive allocations |
| goroutine | Records stack traces of all goroutines | Detect goroutine leaks or blocking |
| block | Tracks blocking operations (locks, channel waits) | Analyze synchronization-induced slowdown |
| mutex | Records mutex contention | Find hot lock contention |
| threadcreate | Records stacks that created new OS threads | Resolve excessive thread creation |

Knowing these types lets you quickly pick the right analysis mode.
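
Note that block and mutex profiles are off by default; the runtime only records them after you set a sampling rate. A minimal sketch of enabling both (rate 1 records every event, which is the most expensive setting; production services usually pick a larger value):

import "runtime"

func init() {
    runtime.SetBlockProfileRate(1)     // record every blocking event
    runtime.SetMutexProfileFraction(1) // report every mutex contention event
}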

2. Integrating pprof: two options

Option 1: Expose via HTTP (recommended for long‑running services)

Import the side‑effect package net/http/pprof, start an HTTP server (commonly on a separate port), and the package automatically registers its routes under /debug/pprof on the default mux. Example:

import (
    "net/http"
    _ "net/http/pprof" // registers handlers automatically
)

func main() {
    go func() { _ = http.ListenAndServe("localhost:6060", nil) }()
    // your business logic here
    select {}
}

Verify by visiting http://localhost:6060/debug/pprof/ in a browser.
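
The blank import registers its handlers on http.DefaultServeMux. If your service uses its own mux, here is a sketch of wiring the exported handlers from net/http/pprof manually (newMux is an illustrative helper):

import (
    "net/http"
    "net/http/pprof"
)

func newMux() *http.ServeMux {
    mux := http.NewServeMux()
    mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves named profiles such as heap and goroutine
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
    return mux
}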

Option 2: Manually generate profile files (for short‑lived programs)

Import runtime/pprof, create files, and write profiles at appropriate moments:

package main
import (
    "os"
    "runtime/pprof"
)

func main() {
    cpuFile, _ := os.Create("cpu.pprof")
    defer cpuFile.Close()
    pprof.StartCPUProfile(cpuFile) // begin CPU sampling
    defer pprof.StopCPUProfile()   // flush the CPU profile on exit

    heapFile, _ := os.Create("heap.pprof")
    defer heapFile.Close()
    defer pprof.WriteHeapProfile(heapFile) // snapshot the heap at exit, before the file closes

    heavyTask()
}

func heavyTask() {
    sum := 0
    var buf [][]byte
    for i := 0; i < 10_000_000; i++ { // CPU-bound loop for the CPU profile
        sum += i
        if i%100_000 == 0 {
            buf = append(buf, make([]byte, 1024)) // some allocations for the heap profile
        }
    }
    _, _ = sum, buf
}
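
Run the program to completion, then analyze the generated files offline:

go tool pprof cpu.pprof
go tool pprof heap.pprof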

3. Analysis tool: go tool pprof details

After collecting a profile, use go tool pprof to analyze it. The command format is:

go tool pprof [options] <profile source>

The source can be a local file (e.g., cpu.pprof) or an HTTP endpoint (e.g., http://localhost:6060/debug/pprof/profile?seconds=30 for CPU data).

Common options:

-inuse_space : show currently in-use heap memory (heap profiles)

-alloc_space : show cumulative allocated memory (heap profiles)

-seconds N : set the collection duration (HTTP sources only)

Typical interactive commands:

top N : list the top N functions by cost

list <func> : show source lines with per‑line cost

web : generate a Graphviz call‑graph (requires graphviz)

peek <func> : view callers and callees of a function

traces : display full stack traces

quit/exit : leave interactive mode
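
The same reports are available without entering interactive mode by passing the command as a flag, which is convenient in scripts, for example:

go tool pprof -top cpu.pprof
go tool pprof -list=heavyTask cpu.pprof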

4. Hands‑on: solving three typical performance problems

Scenario 1 – High CPU usage

Problem: A program runs with sustained high CPU and slow response.

package main
import (
    "net/http"
    _ "net/http/pprof"
    "time"
)

func main() {
    go func() { http.ListenAndServe("localhost:6060", nil) }()
    for {
        slowFunction()
        time.Sleep(100 * time.Millisecond)
    }
}

func slowFunction() {
    sum := 0
    for i := 0; i < 1e7; i++ { // heavy loop
        sum += i
    }
}

Collect CPU data:

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

In the interactive session, run top 5 and then list slowFunction to pinpoint the hot loop. Optimisation: do less work per call, e.g., replace the loop with a cheaper computation, as in the sketch below.
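
If the loop exists only to compute the arithmetic series, the hot path can be removed entirely. A sketch of one such fix, assuming the sum is all the function needs:

func slowFunction() {
    n := 10_000_000
    sum := n * (n - 1) / 2 // closed form for 0+1+...+(n-1); no loop, no CPU hotspot
    _ = sum
}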

Scenario 2 – Memory leak

Problem: Memory usage continuously grows.

package main
import (
    "net/http"
    _ "net/http/pprof"
    "time"
)

var globalSlice []int // accumulates data

func main() {
    go func() { http.ListenAndServe("localhost:6060", nil) }()
    for {
        leakMemory()
        time.Sleep(100 * time.Millisecond)
    }
}

func leakMemory() {
    data := make([]int, 1024*2) // 2048 ints, ~16 KB on a 64-bit platform
    globalSlice = append(globalSlice, data...)
}

Collect heap profile:

go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap

Use top 5 and list leakMemory to see the allocation site. Optimisation: avoid unbounded growth of global structures, or use a bounded container such as the ring buffer sketched below.
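
A sketch of one bounded alternative, assuming only recent values matter — a fixed-size ring buffer whose memory footprint never grows (maxEntries and record are illustrative names):

const maxEntries = 1 << 20 // retain at most ~1M ints; tune for your workload

var (
    ring = make([]int, maxEntries)
    next int
)

func record(vals []int) {
    for _, v := range vals {
        ring[next] = v
        next = (next + 1) % maxEntries // overwrite the oldest entry instead of growing
    }
}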

Scenario 3 – Goroutine leak

Problem: Goroutine count keeps rising until resources are exhausted.

package main
import (
    "net/http"
    _ "net/http/pprof"
    "time"
)

func main() {
    go func() { http.ListenAndServe("localhost:6060", nil) }()
    for {
        leakGoroutine()
        time.Sleep(100 * time.Millisecond)
    }
}

func leakGoroutine() {
    ch := make(chan int) // unbuffered
    go func() { <-ch }() // blocks forever
}

Collect goroutine profile:

go tool pprof http://localhost:6060/debug/pprof/goroutine

(go tool pprof needs the binary profile format; the human-readable dump at /debug/pprof/goroutine?debug=1 is meant to be read directly in a browser.)

Run traces to see many goroutines stuck on <-ch. Optimisation: provide cancellation via context.Context, as sketched below, or ensure channels are closed.
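
A sketch of the context-based fix, in which the worker goroutine always has an exit path even if no value ever arrives (startWorker is an illustrative replacement for leakGoroutine):

import "context"

func startWorker(ctx context.Context) chan<- int {
    ch := make(chan int)
    go func() {
        select {
        case <-ch: // normal path: a value arrived
        case <-ctx.Done(): // caller cancelled; the goroutine exits instead of leaking
        }
    }()
    return ch
}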

5. Visualization tools: making analysis intuitive

Install Graphviz to enable the web command:

Ubuntu/Debian: sudo apt-get install graphviz

macOS: brew install graphviz

Windows: download the installer and add its bin directory to PATH

After installation, run web inside go tool pprof to open an interactive call‑graph.

Flame graphs provide another visual view. The go-torch tool (since archived by Uber) can generate them. Install it with go install github.com/uber/go-torch@latest, then produce a CPU flame graph:

go-torch -u http://localhost:6060 -t 30
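
Since Go 1.11, a flame-graph view is also built into pprof's own web UI, so no extra tool is required:

go tool pprof -http=localhost:8080 http://localhost:6060/debug/pprof/profile?seconds=30

Choose "Flame Graph" from the View menu in the browser.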

6. Summary and outlook

pprof is a powerful Go performance-analysis tool that helps you pinpoint bottlenecks across CPU, memory, and concurrency. Mastering its integration, data collection, and interactive commands enables systematic diagnosis and optimisation, turning performance challenges into manageable tasks.

Typical workflow:

Select integration method based on program type (HTTP vs. file).

Collect the appropriate profile (CPU, heap, goroutine, etc.).

Use go tool pprof with commands like top, list, web to analyse.

Apply code changes and verify improvements.

Adopt pprof early in development and testing to catch performance issues before they reach production.
