Master Go Performance Profiling with pprof: From Basics to Real‑World Fixes
This comprehensive guide explains how to use Go's built‑in pprof tool to profile CPU, memory, and goroutine behavior, covering integration methods, command‑line analysis, visualization techniques, and step‑by‑step solutions for common performance bottlenecks in production Go services.
In Go development, performance problems such as high CPU usage, memory leaks, or runaway goroutines can cripple services. The official pprof tool acts like a precise scalpel, helping developers locate hidden bottlenecks.
1. Understanding pprof: the foundation of performance analysis
pprof originates from Google and works by sampling a running program, generating profile files that can be interpreted to find the root cause of performance issues.
Performance metrics pprof can analyze
pprof supports multiple dimensions, each suited to different scenarios:
| Type | Core meaning | Typical scenario |
| --- | --- | --- |
| cpu | Samples CPU time at 100 Hz | Diagnose high CPU usage or compute‑intensive bottlenecks |
| heap | Samples heap allocations | Locate memory leaks or excessive allocations |
| goroutine | Records stack traces of all goroutines | Detect goroutine leaks or blocking |
| block | Tracks blocking operations (locks, channel waits) | Analyze synchronization‑induced slowdown |
| mutex | Records mutex contention | Find hot lock contention |
| threadcreate | Records OS thread creation | Resolve excessive thread creation |
Knowing these types lets you quickly pick the right analysis mode.
2. Integrating pprof: two options
Option 1: Expose via HTTP (recommended for long‑running services)
Import the side‑effect package net/http/pprof and start an HTTP server (commonly on a separate port); the import automatically registers handlers under /debug/pprof on http.DefaultServeMux. Example:
import (
"net/http"
_ "net/http/pprof" // registers handlers automatically
)
func main() {
go func() { _ = http.ListenAndServe("localhost:6060", nil) }()
// your business logic here
select {}
}
Verify by visiting http://localhost:6060/debug/pprof in a browser.
Option 2: Manually generate profile files (for short‑lived programs)
Import runtime/pprof, create files, and write profiles at appropriate moments:
package main
import (
	"log"
	"os"
	"runtime/pprof"
)
func main() {
	cpuFile, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer cpuFile.Close()
	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()
	heavyTask()
	// write the heap profile after the workload so it reflects live allocations
	heapFile, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer heapFile.Close()
	if err := pprof.WriteHeapProfile(heapFile); err != nil {
		log.Fatal(err)
	}
}
func heavyTask() { /* simulate work */ }
3. Analysis tool: go tool pprof details
After collecting a profile, use go tool pprof to analyze it. The command format is:
go tool pprof [options] <profile source>
The source can be a local file (e.g., cpu.pprof) or an HTTP endpoint (e.g., http://localhost:6060/debug/pprof/profile?seconds=30 for a 30‑second CPU profile).
Common options:
-inuse_space : show currently used heap memory (the heap default)
-alloc_space : show cumulative allocated memory
-seconds N : set collection duration (HTTP sources only)
Typical interactive commands:
top N : list the top N functions by cost
list <func> : show source lines with per‑line cost
web : generate a Graphviz call‑graph (requires graphviz)
peek <func> : view callers and callees of a function
traces : display full stack traces
quit/exit : leave interactive mode
4. Hands‑on: solving three typical performance problems
Scenario 1 – High CPU usage
Problem: A program runs with sustained high CPU and slow response.
package main
import (
"net/http"
_ "net/http/pprof"
"time"
)
func main() {
go func() { http.ListenAndServe("localhost:6060", nil) }()
for {
slowFunction()
time.Sleep(100 * time.Millisecond)
}
}
func slowFunction() {
sum := 0
for i := 0; i < 1e7; i++ { // heavy loop
sum += i
}
}
Collect CPU data:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
In the interactive session run top 5 and then list slowFunction to pinpoint the loop. Optimisation: reduce the loop count (e.g., from 1e7 to 1e6).
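For this toy workload the fix can go further than shrinking the loop: the arithmetic series has a closed form, so the hot loop that list flagged disappears entirely. A sketch (sumTo is an illustrative stand‑in for slowFunction):

```go
package main

import "fmt"

// sumTo returns 0+1+...+(n-1) in O(1) instead of the O(n) loop
// that the profile attributed to slowFunction.
func sumTo(n int) int {
	return n * (n - 1) / 2
}

func main() {
	fmt.Println(sumTo(10)) // 0+1+...+9 = 45
}
```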
Scenario 2 – Memory leak
Problem: Memory usage continuously grows.
package main
import (
"net/http"
_ "net/http/pprof"
"time"
)
var globalSlice []int // accumulates data
func main() {
go func() { http.ListenAndServe("localhost:6060", nil) }()
for {
leakMemory()
time.Sleep(100 * time.Millisecond)
}
}
func leakMemory() {
data := make([]int, 1024*2) // ~16KB on 64-bit (2048 × 8-byte ints)
globalSlice = append(globalSlice, data...)
}
Collect heap profile:
go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap
Use top 5 and list leakMemory to see the allocation site. Optimisation: avoid unbounded growth of global structures or use bounded containers.
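One way to bound growth, sketched below, is a fixed-size ring buffer that overwrites old entries instead of appending to a global slice forever (the ring type is illustrative, not from the article):

```go
package main

import "fmt"

// ring keeps at most len(buf) recent values, so retained heap
// stays bounded no matter how many samples arrive.
type ring struct {
	buf  []int
	next int
	full bool
}

func newRing(n int) *ring { return &ring{buf: make([]int, n)} }

func (r *ring) add(v int) {
	r.buf[r.next] = v // overwrite the oldest slot once full
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

func (r *ring) len() int {
	if r.full {
		return len(r.buf)
	}
	return r.next
}

func main() {
	r := newRing(4)
	for i := 0; i < 100; i++ {
		r.add(i)
	}
	fmt.Println(r.len()) // capped at 4 despite 100 inserts
}
```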
Scenario 3 – Goroutine leak
Problem: Goroutine count keeps rising until resources are exhausted.
package main
import (
"net/http"
_ "net/http/pprof"
"time"
)
func main() {
go func() { http.ListenAndServe("localhost:6060", nil) }()
for {
leakGoroutine()
time.Sleep(100 * time.Millisecond)
}
}
func leakGoroutine() {
ch := make(chan int) // unbuffered
go func() { <-ch }() // blocks forever
}
Collect goroutine profile:
go tool pprof http://localhost:6060/debug/pprof/goroutine
(Or open http://localhost:6060/debug/pprof/goroutine?debug=1 in a browser for a plain‑text dump.) Run traces to see many goroutines stuck on <-ch. Optimisation: provide cancellation via context.Context or ensure channels are closed.
5. Visualization tools: making analysis intuitive
Install Graphviz to enable the web command:
Ubuntu/Debian: sudo apt-get install graphviz
macOS: brew install graphviz
Windows: download the installer and add its bin directory to PATH
After installation, run web inside go tool pprof to open an interactive call‑graph.
Flame graphs provide another visual view. Recent versions of go tool pprof include a web UI with a built-in flame graph, so no extra tool is needed:
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30
(The older uber/go-torch tool served the same purpose but is archived and predates Go modules; prefer the built-in UI.)
6. Summary and outlook
pprof is a powerful Go performance‑analysis tool that helps you pinpoint bottlenecks across CPU, memory, and concurrency. Mastering its integration, data collection, and interactive commands enables systematic diagnosis and optimisation, turning performance challenges into manageable tasks.
Typical workflow:
Select integration method based on program type (HTTP vs. file).
Collect the appropriate profile (CPU, heap, goroutine, etc.).
Use go tool pprof with commands like top, list, web to analyse.
Apply code changes and verify improvements.
Adopt pprof early in development and testing to catch performance issues before they reach production.
360 Zhihui Cloud Developer