Mastering Kubernetes Component Troubleshooting with pprof and Log Analysis
Learn a systematic approach to diagnosing Kubernetes core component issues by identifying faulty nodes, analyzing logs via systemd or static pods, and leveraging Go's pprof tool for performance profiling, including step‑by‑step commands and UI visualizations for components like kube‑apiserver, scheduler, controller‑manager, and kubelet.
Kubernetes' core components are like a house's foundation: their importance is obvious. As a cluster maintainer, you will often run into component issues. This article outlines a concise troubleshooting workflow:
1. Identify faulty nodes or components via cluster status.
2. Analyze component logs.
3. Use pprof for performance analysis.
Define Scope
The set of core components is small, and their deployment is simple. For example, if <code>kubectl get nodes</code> shows a node as <code>NotReady</code>, the fault is most likely either the kubelet on that node or the network, which narrows the initial direction for elimination.
We adopt a “hypothesize, then verify” method: list the possible causes, then check them one by one until the issue is resolved.
Log Analysis
Log inspection is the most direct way to troubleshoot. Component logs can be viewed in two ways:
For services started by systemd:
<code>journalctl -l -u <service></code>
For static-pod services:
<code>kubectl logs -n kube-system $PODNAME --tail 100</code>
Additionally, monitor surrounding infrastructure metrics such as CPU, memory, and I/O.
Performance Analysis
Performance profiling is placed last because it takes time and requires familiarity with the metrics. Kubernetes releases frequently, and bugs or performance regressions may appear. Go's pprof tool can generate flame graphs for deeper insight (the older go-torch tool, now archived, served the same purpose).
All components expose pprof endpoints under <code>host:port/debug/pprof/</code>.
Common pprof Commands
Interactive
Inspect heap allocations:
<code>go tool pprof http://localhost:8001/debug/pprof/heap</code>
Collect a 30-second CPU profile:
<code>go tool pprof http://localhost:8001/debug/pprof/profile?seconds=30</code>
Show goroutine blocking:
<code>go tool pprof http://localhost:8001/debug/pprof/block</code>
Collect a 5-second execution trace (note: traces are viewed with <code>go tool trace</code>, not pprof):
<code>curl -o trace.out http://localhost:8001/debug/pprof/trace?seconds=5</code>
<code>go tool trace trace.out</code>
Show holders of contended mutexes:
<code>go tool pprof http://localhost:8001/debug/pprof/mutex</code>

UI Interface
Export a profile file, then serve it with go tool pprof for graphical analysis.
Example for kube-scheduler:
<code>curl -s http://localhost:10251/debug/pprof/heap > heap.out</code>
<code>go tool pprof -http=0.0.0.0:8989 heap.out</code>
The UI provides menus such as VIEW (Top, Graph, Flame Graph, Peek, Source, Disassemble), SAMPLE (alloc_objects, alloc_space, inuse_objects, inuse_space), and REFINE for filtering.
Note: some Kubernetes versions disable pprof by default; enable it with the component's <code>--profiling=true</code> flag (or the equivalent field in its configuration file).
Analyzing Specific Components
kube‑apiserver
<code>kubectl proxy</code>
<code>curl -s http://localhost:8001/debug/pprof/profile > apiserver-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 apiserver-cpu.out</code>

kube‑scheduler
<code>curl -s http://localhost:10251/debug/pprof/profile > scheduler-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 scheduler-cpu.out</code>

kube‑controller‑manager
<code>curl -s http://localhost:10252/debug/pprof/profile > controller-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 controller-cpu.out</code>

kubelet
<code>kubectl proxy</code>
<code>curl -s http://127.0.0.1:8001/api/v1/nodes/k8s-node04-138/proxy/debug/pprof/profile > kubelet-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 kubelet-cpu.out</code>

Capturing performance data is only the first step; the subsequent analysis is what pinpoints the root cause.
References
https://github.com/google/pprof
https://github.com/uber-archive/go-torch
http://www.graphviz.org/download/#linux
https://kubernetes.io/zh/docs/reference/command-line-tools-reference/kube-apiserver/
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.