
Mastering Kubernetes Component Troubleshooting with pprof and Log Analysis

Learn a systematic approach to diagnosing Kubernetes core component issues by identifying faulty nodes, analyzing logs via systemd or static pods, and leveraging Go's pprof tool for performance profiling, including step‑by‑step commands and UI visualizations for components like kube‑apiserver, scheduler, controller‑manager, and kubelet.


The core components of Kubernetes are like a house's foundation: their importance is obvious. As a cluster maintainer, you will often run into component issues. This article outlines a concise troubleshooting workflow:

1. Identify faulty nodes or components via cluster status.

2. Analyze component logs.

3. Use pprof for performance analysis.

Define Scope

The set of core components is small and simple to deploy. For example, when <code>kubectl get nodes</code> shows a node as <code>NotReady</code>, that suggests either a kubelet problem or a network issue, which sets the initial direction for elimination.

We adopt a “hypothesize then verify” method, listing possible factors and checking them one by one until the issue is resolved.
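
The "hypothesize then verify" loop can be sketched as a small shell script. The check bodies below are placeholders, not real diagnostics; swap in whatever commands fit each hypothesis (systemctl, ping, df, and so on).

```shell
#!/usr/bin/env bash
# Sketch of "hypothesize then verify": run ordered checks and stop at
# the first failing hypothesis. The check bodies are placeholders;
# replace them with real commands for your environment.
set -u

check_kubelet()   { true;  }   # e.g. systemctl is-active kubelet
check_network()   { false; }   # e.g. ping -c1 <node-ip>
check_diskspace() { true;  }   # e.g. df -h /var/lib/kubelet

for hypothesis in kubelet network diskspace; do
  if "check_${hypothesis}"; then
    echo "OK:   ${hypothesis}"
  else
    echo "FAIL: ${hypothesis} -- investigate this first"
    break
  fi
done
```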

Log Analysis

Log inspection is the most direct way to troubleshoot. Component logs can be viewed in two ways:

For services started by systemd:

journalctl -l -u <service>

For static pod services:

kubectl logs -n kube-system $PODNAME --tail 100
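
Beyond scrolling raw output, it helps to filter for error-level lines: klog-formatted components prefix each line with a severity letter (I, W, E, F). The sample log below is fabricated so the sketch is self-contained; in practice you would pipe journalctl or kubectl logs into grep.

```shell
# Filter klog-style component logs by severity. klog prefixes each line
# with I (info), W (warning), E (error), or F (fatal).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
I0101 10:00:01.000000       1 server.go:120] Serving securely on [::]:6443
E0101 10:00:02.000000       1 reflector.go:138] Failed to watch *v1.Pod: connection refused
W0101 10:00:03.000000       1 clientconn.go:200] grpc: addrConn transient failure
E0101 10:00:04.000000       1 leaderelection.go:325] error retrieving resource lock
EOF

grep -c '^E' "$sample_log"   # count the error lines
grep '^E' "$sample_log"      # show them

# Real usage (not run here):
#   journalctl -l -u kubelet --no-pager | grep '^E' | tail -n 20
rm -f "$sample_log"
```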

Additionally, monitor surrounding infrastructure metrics such as CPU, memory, and I/O.
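
On a Linux node, a rough first look at memory pressure needs nothing more than /proc. This is a stand-in sketch for quick triage, not a replacement for proper monitoring with node_exporter and Prometheus.

```shell
# Rough memory-pressure check straight from /proc (Linux only).
# For ongoing monitoring of CPU, memory, and I/O, use node_exporter/Prometheus.
awk '/^MemTotal:/     {total=$2}
     /^MemAvailable:/ {avail=$2}
     END { printf "mem available: %.1f%% (%d of %d kB)\n",
                  avail * 100 / total, avail, total }' /proc/meminfo

# One-line load average (1/5/15 min) from the same filesystem:
cat /proc/loadavg
```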

Performance Analysis

Performance profiling comes last because it takes time and requires familiarity with the metrics. Kubernetes releases frequently, and bugs or performance regressions do appear. Go's pprof tool can generate flame graphs for deeper insight (the now-archived go‑torch filled this role before Go 1.11 built flame graphs into pprof itself).

All components expose pprof endpoints at <code>host:port/debug/pprof/</code>.
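
As a small convenience, the standard net/http/pprof routes can be enumerated for any component. The helper name <code>pprof_urls</code> and the example host:port are made up for illustration.

```shell
# Print the standard net/http/pprof endpoint URLs for a component.
# Usage: pprof_urls <host:port>, e.g. pprof_urls localhost:10251
pprof_urls() {
  local base="http://$1/debug/pprof"
  local ep
  for ep in heap profile goroutine block mutex threadcreate 'trace?seconds=5'; do
    echo "${base}/${ep}"
  done
}

pprof_urls localhost:8001
```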

Common pprof Commands

Interactive

View stack traces:

<code>go tool pprof http://localhost:8001/debug/pprof/heap</code>

Collect 30‑second CPU profile:

<code>go tool pprof http://localhost:8001/debug/pprof/profile?seconds=30</code>

Show goroutine blocking (empty unless contention profiling is enabled, e.g. via <code>--contention-profiling</code>):

<code>go tool pprof http://localhost:8001/debug/pprof/block</code>

Collect a 5‑second execution trace (analyzed with <code>go tool trace</code>, not pprof):

<code>curl -o trace.out 'http://localhost:8001/debug/pprof/trace?seconds=5'</code>
<code>go tool trace trace.out</code>

Mutex holder stack traces (also gated behind contention profiling):

<code>go tool pprof http://localhost:8001/debug/pprof/mutex</code>

UI Interface

Export a profile file, then serve it with <code>go tool pprof</code> for graphical analysis.

Example for kube‑scheduler:

<code>curl -s http://localhost:10251/debug/pprof/heap > heap.out</code>
<code>go tool pprof -http=0.0.0.0:8989 heap.out</code>

The UI provides menus such as VIEW (Top, Graph, Flame Graph, Peek, Source, Disassemble), SAMPLE (alloc_objects, alloc_space, inuse_objects, inuse_space), and REFINE for filtering.

Note: Some Kubernetes versions disable pprof by default; enable it with the <code>--profiling=true</code> flag, or <code>enableProfiling: true</code> in the component's configuration file.
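
For components driven by a componentconfig file, such as kube‑scheduler, the relevant field is typically <code>enableProfiling</code>. A minimal sketch (verify the apiVersion against your Kubernetes release):

```yaml
# KubeSchedulerConfiguration sketch. enableProfiling comes from the shared
# DebuggingConfiguration; check the apiVersion for your cluster version.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
enableProfiling: true
```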

Analyzing Specific Components

kube‑apiserver

<code>kubectl proxy</code>
<code>curl -s http://localhost:8001/debug/pprof/profile > apiserver-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 apiserver-cpu.out</code>

kube‑scheduler

<code>curl -s http://localhost:10251/debug/pprof/profile > scheduler-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 scheduler-cpu.out</code>

kube‑controller‑manager

<code>curl -s http://localhost:10252/debug/pprof/profile > controller-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 controller-cpu.out</code>

kubelet

<code>kubectl proxy</code>
<code>curl -s http://127.0.0.1:8001/api/v1/nodes/k8s-node04-138/proxy/debug/pprof/profile > kubelet-cpu.out</code>
<code>go tool pprof -http=0.0.0.0:8989 kubelet-cpu.out</code>

Capturing performance data is the first step; subsequent analysis helps pinpoint the root cause.

References

https://github.com/google/pprof

https://github.com/uber-archive/go-torch

http://www.graphviz.org/download/#linux

https://kubernetes.io/zh/docs/reference/command-line-tools-reference/kube-apiserver/

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
