Diagnosing and Optimizing High CPU Usage of a Go Service Migrated to Kubernetes Using Flame Graphs
This article describes how a Go message‑push service, after migrating to a Kubernetes + Docker container platform, exhibited unexpectedly high CPU usage. It walks through step‑by‑step profiling with go tool pprof and flame graphs, an analysis of the Go GMP scheduler, and the resolution: configuring GOMAXPROCS to match the cores allocated to the container.
Background
As part of the company's move to cloud‑native infrastructure, a Go message‑push service was migrated from a physical machine to a Kubernetes + Docker platform. After the migration, the service's CPU usage at peak load rose from the expected 20 % to about 70 %.
Problem Phenomenon
The container showed CPU consumption three times higher than anticipated, and pidstat revealed a high thread‑switch rate (thousands of switches per second).
Investigation Process
1. Sampling – After confirming that request volume on the container node was comparable to the physical node, go tool pprof was used for profiling. The service was instrumented with import _ "net/http/pprof", and a 30‑second CPU profile was collected via go tool pprof http://ip:port/debug/pprof/profile?seconds=30 .
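The instrumentation above can be sketched as follows; the port 6060 and the pprofStatus probe helper are illustrative assumptions, not details from the original service:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	_ "net/http/pprof" // side-effect import: registers the /debug/pprof/* handlers
)

// pprofStatus serves the default mux (which now carries the pprof
// handlers) on addr and probes the pprof index page once. In a real
// service the listener simply runs for the lifetime of the process.
func pprofStatus(addr string) (int, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	go http.Serve(ln, nil) // nil handler = http.DefaultServeMux

	resp, err := http.Get("http://" + addr + "/debug/pprof/")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// With the endpoint live, a CPU profile can be taken with:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	code, err := pprofStatus("localhost:6060")
	fmt.Println("pprof index status:", code, err)
}
```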
2. Flame Graph Generation – The pprof data were visualized with go‑torch, producing flame graphs that highlighted the hot paths.
3. Analysis – The flame graph showed that runtime code (especially runtime.gcBgMarkWorker, runtime.schedule, and runtime.findrunnable) consumed ~60 % of CPU, while business logic used only ~40 %.
4. Root Cause Exploration – Go’s scheduler uses the G‑M‑P model. By default it sets the number of logical processors (P) to the host’s CPU count, not the container’s CPU limit. On a 32‑core host with a container limited to 8 cores, 32 Ps were created, leading to excessive GC background workers and scheduler overhead.
5. Verification – A small Go program demonstrated that runtime.NumCPU() and runtime.GOMAXPROCS(0) both report the host’s core count rather than the container’s quota:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	cpu := runtime.NumCPU()        // logical CPUs visible to the process: the host's count
	procs := runtime.GOMAXPROCS(0) // passing 0 reads the current setting without changing it
	fmt.Println("cpu num:", cpu, " GOMAXPROCS:", procs)
}
// output -> cpu num: 32  GOMAXPROCS: 32

6. Solution – Setting the environment variable GOMAXPROCS to the number of cores allocated to the container (e.g., 8) reduced the CPU peak from 69 % to 19 % in gray‑scale testing, and the improvement persisted after the full rollout.
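The same cap can also be applied in code rather than through the environment variable. A minimal sketch, assuming the container is allocated 8 cores; the capProcs helper is hypothetical, not part of the original service:

```go
package main

import (
	"fmt"
	"runtime"
)

// capProcs lowers GOMAXPROCS to limit, but never raises it above what
// the runtime already uses; it returns the value now in effect.
func capProcs(limit int) int {
	if cur := runtime.GOMAXPROCS(0); cur < limit {
		return cur // already below the cap, leave it alone
	}
	runtime.GOMAXPROCS(limit)
	return limit
}

func main() {
	// 8 stands in for the container's CPU quota.
	fmt.Println("GOMAXPROCS now:", capProcs(8))
}
```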
Principle Analysis – The article also explains the Go GMP scheduler, work‑stealing, and how excessive P values cause unnecessary runtime work, especially GC background workers.
Extended Discussion – Similar container‑CPU‑recognition issues exist in Java (fixed in JDK 8u191) and can be mitigated with tools like Uber’s automaxprocs library.
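automaxprocs works by reading the container's cgroup CPU quota and period and deriving GOMAXPROCS from their ratio. A simplified sketch of that calculation, using a hypothetical quotaToProcs helper; the real library also handles cgroup v2 and a configurable minimum:

```go
package main

import "fmt"

// quotaToProcs derives a GOMAXPROCS value from cgroup v1 CFS values
// (cpu.cfs_quota_us / cpu.cfs_period_us), flooring the ratio and
// enforcing a minimum of 1. ok is false when no quota is set
// (quota <= 0), meaning the runtime default should be kept.
func quotaToProcs(quotaUs, periodUs int64) (procs int, ok bool) {
	if quotaUs <= 0 || periodUs <= 0 {
		return 0, false // unlimited: keep the runtime default
	}
	procs = int(quotaUs / periodUs) // integer division floors the ratio
	if procs < 1 {
		procs = 1 // fractional quotas still get one P
	}
	return procs, true
}

func main() {
	// Typical values for a Kubernetes limit of 8 CPUs:
	// quota 800000us over a 100000us period.
	if p, ok := quotaToProcs(800000, 100000); ok {
		fmt.Println("GOMAXPROCS should be", p)
	}
}
```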
Conclusion – By aligning GOMAXPROCS with the container’s CPU quota, the abnormal CPU usage was eliminated, and the article provides a deeper understanding of Go’s scheduler and container CPU accounting.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.