
Mastering Kubernetes Component Troubleshooting with pprof and Log Analysis

Learn a systematic approach to diagnosing Kubernetes core component issues by identifying faulty nodes, analyzing logs via systemd or static pods, and leveraging Go's pprof tool for performance profiling, including step‑by‑step commands and UI visualizations for components like kube‑apiserver, scheduler, controller‑manager, and kubelet.

Ops Development Stories

The core components of Kubernetes are like the foundation of a house: if they fail, everything built on top of them is affected. As a cluster maintainer, you will regularly run into component issues. This article outlines a concise troubleshooting workflow:

Identify faulty nodes or components via cluster status.

Analyze component logs.

Use pprof for performance analysis.

Define Scope

Kubernetes has only a handful of core components, and they are simple to deploy, so the first step is to narrow down which one is misbehaving. For example, if kubectl get nodes shows a node as NotReady, the most likely causes are a kubelet problem or a network issue, which tells you where to start eliminating possibilities.

We adopt a “hypothesize then verify” method, listing possible factors and checking them one by one until the issue is resolved.

Log Analysis

Log inspection is the most direct way to troubleshoot. Component logs can be viewed in two ways:

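For components run as systemd services (such as the kubelet):

journalctl -l -u <service>

For components run as static pods (such as the control-plane components in kubeadm clusters):

kubectl logs -n kube-system $PODNAME --tail 100

Beyond the component logs, also check the host's infrastructure metrics, such as CPU, memory, and disk I/O, because resource exhaustion often shows up as component errors.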

Performance Analysis

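Performance profiling comes last because it takes more time and requires familiarity with the metrics involved. Kubernetes releases frequently, and a new release can introduce bugs or performance regressions that only profiling reveals. Go's pprof tool can render flame graphs for deeper insight. go-torch used to fill that role, but it has been archived because the pprof web UI has included a flame graph view since Go 1.11.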

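All core components are written in Go and, when profiling is enabled, expose pprof endpoints under /debug/pprof/ on their serving address, e.g., host:port/debug/pprof/.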

Common pprof Commands

Interactive

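The commands below target the kube-apiserver through kubectl proxy, which listens on localhost:8001 by default.

Heap (memory) profile: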

go tool pprof http://localhost:8001/debug/pprof/heap

Collect 30‑second CPU profile:

go tool pprof "http://localhost:8001/debug/pprof/profile?seconds=30"

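Goroutine blocking profile (where goroutines wait on synchronization primitives; it stays empty unless contention profiling is enabled, e.g., with --contention-profiling):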

go tool pprof http://localhost:8001/debug/pprof/block

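Collect a 5-second execution trace (execution traces are opened with go tool trace, not go tool pprof):

curl -s -o trace.out "http://localhost:8001/debug/pprof/trace?seconds=5"
go tool trace trace.out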

Stack traces of holders of contended mutexes:

go tool pprof http://localhost:8001/debug/pprof/mutex

Web UI

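Save a profile to a file, then open it in the built-in web UI of go tool pprof for graphical analysis. The graph-based views require Graphviz to be installed (see the references).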

Example for kube‑scheduler:

curl -s -o heap.out http://localhost:10251/debug/pprof/heap
go tool pprof -http=0.0.0.0:8989 heap.out

The UI provides menus such as VIEW (Top, Graph, Flame Graph, Peek, Source, Disassemble), SAMPLE (for heap profiles: alloc_objects, alloc_space, inuse_objects, inuse_space), and REFINE for filtering.

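Note: The --profiling flag of kube-apiserver, kube-scheduler, and kube-controller-manager defaults to true, but hardened clusters (for example, those following the CIS Kubernetes Benchmark) often set --profiling=false, in which case the /debug/pprof/ endpoints are not served. If the endpoints return 404, check this flag in the component's static pod manifest or service definition and set --profiling=true to re-enable it.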

Analyzing Specific Components

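kube-apiserver

kubectl proxy &
curl -s -o apiserver-cpu.out http://localhost:8001/debug/pprof/profile
go tool pprof -http=0.0.0.0:8989 apiserver-cpu.out

kubectl proxy runs in the foreground, so it is started in the background here (or run it in a separate terminal). The profile endpoint samples for 30 seconds by default before it responds.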

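kube-scheduler

curl -s -o scheduler-cpu.out http://localhost:10251/debug/pprof/profile
go tool pprof -http=0.0.0.0:8989 scheduler-cpu.out

Port 10251 is the scheduler's legacy insecure port, which newer Kubernetes releases have removed. On those releases, use the secure port 10259 over HTTPS with a bearer token that is authorized to read /debug/pprof/.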

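kube-controller-manager

curl -s -o controller-cpu.out http://localhost:10252/debug/pprof/profile
go tool pprof -http=0.0.0.0:8989 controller-cpu.out

Port 10252 is the controller manager's legacy insecure port, which newer Kubernetes releases have removed. On those releases, use the secure port 10257 over HTTPS with an authorized bearer token.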

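kubelet

kubectl proxy &
curl -s -o kubelet-cpu.out http://127.0.0.1:8001/api/v1/nodes/k8s-node04-138/proxy/debug/pprof/profile
go tool pprof -http=0.0.0.0:8989 kubelet-cpu.out

Here k8s-node04-138 is an example node name; substitute the node you want to profile. The request goes through the kube-apiserver's node proxy, so the client only needs access to the kube-apiserver, not to the kubelet's port.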
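The kubelet URL above uses the kube-apiserver's node proxy path /api/v1/nodes/<node>/proxy/ followed by the kubelet path. A small Go helper that builds this URL for any node and profile, assuming the kubectl proxy address 127.0.0.1:8001 used above; the function name kubeletPprofURL is hypothetical.

```go
// nodeproxy.go: build the node-proxy URL for a kubelet pprof endpoint.
package main

import (
	"fmt"
	"net/url"
)

// kubeletPprofURL returns
// <proxyBase>/api/v1/nodes/<node>/proxy/debug/pprof/<profile>[?seconds=N].
// profile is e.g. "heap", "profile", "goroutine"; seconds <= 0 omits the query.
func kubeletPprofURL(proxyBase, node, profile string, seconds int) string {
	u := proxyBase + "/api/v1/nodes/" + url.PathEscape(node) +
		"/proxy/debug/pprof/" + url.PathEscape(profile)
	if seconds > 0 {
		u += "?" + url.Values{"seconds": {fmt.Sprint(seconds)}}.Encode()
	}
	return u
}

func main() {
	// prints http://127.0.0.1:8001/api/v1/nodes/k8s-node04-138/proxy/debug/pprof/profile?seconds=30
	fmt.Println(kubeletPprofURL("http://127.0.0.1:8001", "k8s-node04-138", "profile", 30))
}
```

Looping this helper over the NotReady or suspect nodes makes it easy to collect the same profile from several kubelets in one pass.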

Capturing performance data is only the first step; analyzing it is what pinpoints the root cause.

References

https://github.com/google/pprof

https://github.com/uber-archive/go-torch

http://www.graphviz.org/download/#linux

https://kubernetes.io/zh/docs/reference/command-line-tools-reference/kube-apiserver/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
