Master Advanced Kubernetes Troubleshooting: PVC, Events, and Dashboard Tips
This article dives into advanced Kubernetes troubleshooting techniques, covering persistent volume claim pending errors, event and audit log analysis, dashboard deployment, health probes, ephemeral containers, and node-level debugging to help DevOps engineers resolve complex cluster issues efficiently.
Top 10 Kubernetes Troubleshooting Techniques – Part 2
In the first part we explored basic troubleshooting skills for diagnosing common cluster and application problems. This second part delves deeper into advanced strategies and real‑world scenarios.
6. Kubernetes Storage Troubleshooting: Fix PVC Pending Errors
A PersistentVolumeClaim (PVC) stuck in the Pending state is a common storage issue that can prevent applications from running; typical causes are misconfigured storage classes, missing provisioners, and insufficient storage capacity.
Step 1: Check PV and PVC Status
List all PersistentVolumes (PV) and PersistentVolumeClaims (PVC) across namespaces to get an overview of their status, access modes, capacity, and binding:
kubectl get pv,pvc --all-namespaces
Step 2: Investigate Unbound PVCs
Describe the PVC to see the events at the bottom of the output, which usually reveal the root cause, such as:
No matching PersistentVolume
StorageClass typo
Insufficient capacity
Missing provisioner
kubectl describe pvc <pvc-name>
Step 3: Verify StorageClass
List all available StorageClasses and describe the specific one to ensure it exists and is spelled correctly:
kubectl get storageclass
kubectl describe storageclass <name>
If the StorageClass referenced in the PVC does not exist, the PVC cannot be provisioned.
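If the referenced class is missing entirely, one can be created. A hypothetical minimal StorageClass for an AWS EBS CSI setup follows; the name, provisioner, and binding mode are assumptions and must match what is actually installed in your cluster:

```yaml
# Hypothetical StorageClass; adjust name and provisioner to your cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com        # assumed: AWS EBS CSI driver is installed
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```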
Step 4: Common Errors and Fixes
When a PVC references a non‑existent class (e.g., fast-ssd), update the PVC to use a valid class such as gp2 or standard:
kubectl describe pvc my-data-pvc
Warning  ProvisioningFailed  3m  persistentvolume-controller  storageclass.storage.k8s.io "fast-ssd" not found
List storage classes to confirm the correct one is available.
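As a sketch, the fix is to point the claim at a class that exists; the PVC name follows the example above, and gp2 is assumed to be a valid class in this cluster:

```yaml
# Illustrative PVC; "my-data-pvc", "gp2", and the 10Gi request are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gp2   # was "fast-ssd", which does not exist
  resources:
    requests:
      storage: 10Gi
```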
7. Using Events and Audit Logs: Deep System Analysis
Kubernetes provides powerful debugging tools: events and audit logs. Events show what happened in the cluster, while audit logs capture API‑level actions, helping trace who did what.
Step 1: Understand Kubernetes Events
List all events across namespaces, sorted by creation time, to see recent activity:
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'
Filter for warnings only (field selectors support equality matches, not time ranges, so use sorting or JSON post-processing to focus on recent activity):
kubectl get events --field-selector type=Warning
Use --watch for real-time monitoring:
kubectl get events --watch
Filter events for a particular pod, deployment, or failure reason:
kubectl get events --field-selector involvedObject.name=my-pod
kubectl get events --field-selector reason=FailedScheduling
Step 2: Use Audit Logs for In-Depth Investigation
Audit logs reveal which user performed which API request, from which IP, using which HTTP verb, and on which resource. Enable audit logging by configuring an audit policy:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: ""
    resources: ["pods", "services"]
  - group: "apps"
    resources: ["deployments", "replicasets"]
- level: Request
  resources:
  - group: ""
    resources: ["configmaps", "secrets"]
After enabling, audit logs can show details such as user, source IP, verb, object reference, response status, and timestamps.
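Audit logs are written as JSON lines following the audit.k8s.io/v1 Event schema. A minimal sketch of filtering entries by user; the field names follow that schema, while the sample values and the idea of loading lines from a file are assumptions for illustration:

```python
import json

# Filter audit-log entries (one JSON object per line) for a given user.
# Field names follow the audit.k8s.io/v1 Event schema.
def entries_for_user(lines, username):
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("user", {}).get("username") == username:
            matches.append({
                "verb": event.get("verb"),
                "resource": event.get("objectRef", {}).get("resource"),
                "name": event.get("objectRef", {}).get("name"),
                "code": event.get("responseStatus", {}).get("code"),
            })
    return matches

# Illustrative entry; real logs carry many more fields (auditID, stage, ...).
sample = json.dumps({
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "verb": "delete",
    "user": {"username": "alice"},
    "sourceIPs": ["10.0.0.5"],
    "objectRef": {"resource": "pods", "name": "web-app", "namespace": "default"},
    "responseStatus": {"code": 200},
})

print(entries_for_user([sample], "alice"))
# → [{'verb': 'delete', 'resource': 'pods', 'name': 'web-app', 'code': 200}]
```

The same pattern answers "who deleted that pod?" questions quickly without grepping raw JSON by hand.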
8. Using the Kubernetes Dashboard and Visualization Tools
While kubectl provides command‑line access, the Kubernetes Dashboard offers a web UI for visualizing resources, metrics, logs, and events.
Step 1: Deploy the Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
Step 2: Create a Service Account and Bind Cluster-Admin Role
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
Step 3: Generate an Access Token
kubectl create token dashboard-admin -n kubernetes-dashboard
After deployment, the dashboard lets you monitor CPU/memory usage, visualize event timelines, explore resource relationships, and view application logs directly in the browser.
9. Health Checks and Probes
Kubernetes health checks act like regular medical exams, helping detect issues early.
Understanding the Three Probes
Liveness Probe – restarts a container if it becomes unhealthy.
Readiness Probe – removes a container from service endpoints when not ready.
Startup Probe – gives a container extra time to initialize before other probes run.
Example deployment combining all three probes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
    spec:
      containers:
      - name: web-app
        image: my-app:v1.2.3
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /health/startup
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
The probes work together: the startup probe allows the container time to initialize; once successful, the liveness probe monitors ongoing health, and the readiness probe controls traffic routing.
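For these probes to pass, the application must actually serve the three paths. A minimal sketch of such endpoints using only the Python standard library; the paths mirror the manifest above, while the readiness flag is an assumption about how an app might track its own initialization state:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tracks whether the app has finished initializing; a real application would
# set this after warming caches, connecting to databases, etc. (assumption).
app_ready = threading.Event()

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in ("/health/startup", "/health/live"):
            self.send_response(200)  # the process is up and serving requests
        elif self.path == "/health/ready":
            # Return 200 only once initialization is done, so Kubernetes
            # adds the pod to Service endpoints at the right moment.
            self.send_response(200 if app_ready.is_set() else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging in the sketch
        pass

# Port 0 picks any free port for this local demo; the manifest above uses 8080.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def probe(path):
    try:
        return urllib.request.urlopen(f"http://127.0.0.1:{port}{path}").status
    except urllib.error.HTTPError as e:
        return e.code

live_status = probe("/health/live")
ready_before = probe("/health/ready")  # 503: still initializing
app_ready.set()                        # application signals it is ready
ready_after = probe("/health/ready")   # 200: now eligible for traffic
server.shutdown()
print(live_status, ready_before, ready_after)  # 200 503 200
```

Keeping liveness cheap and readiness honest (only 200 when dependencies are usable) is what makes the manifest's thresholds behave as intended.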
10. Advanced Debugging Techniques
For complex performance bottlenecks or network issues, use ephemeral debug containers, node-level debugging, and full pod copies.
Step 1: Debug with Ephemeral Containers
Inject a debugging container into a running pod without restarting it:
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
Use richer images for more tools:
kubectl debug database-pod -it --image=ubuntu --target=postgres -- bash
Network troubleshooting example:
kubectl debug web-app-7d4b8c9f-xyz -it --image=nicolaka/netshoot --target=web-app
Inside the debug container, run commands such as:
ping database-service
nslookup database-service
telnet database-service 5432
ip addr show
ss -tuln
dig database-service.default.svc.cluster.local
tcpdump -i any port 5432
ps aux
ls -la /app/
cat /app/config.yaml
Step 2: Create a Full Debug Copy of a Pod
kubectl debug web-app-7d4b8c9f-xyz --copy-to=web-app-debug --image=ubuntu --set-image=web-app=ubuntu -- sleep 1d
kubectl exec -it web-app-debug -- bash
Step 3: Node-Level Debugging
kubectl debug node/worker-node-1 -it --image=ubuntu
Inside the node debug pod, the host filesystem is mounted at /host:
chroot /host
systemctl status kubelet
journalctl -u kubelet -f
Performance analysis containers (e.g., Go) can be used for CPU or memory profiling:
kubectl debug web-app-7d4b8c9f-xyz -it --image=golang:1.21 --target=web-app
These advanced techniques let you investigate issues in real production conditions without affecting the workload, reducing guesswork and downtime.
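If the debug image lacks networking tools like netshoot, even a stock python image can run an equivalent connectivity check. A sketch; database-service and port 5432 are the hypothetical names from the example above, and the local listener exists only to exercise the helper:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (like telnet host port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside the cluster you would test the Service DNS name, e.g.:
#   can_connect("database-service", 5432)
# Locally, verify the helper against a listener we control:
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
reachable = can_connect("127.0.0.1", port)
listener.close()
unreachable = can_connect("127.0.0.1", 1)  # nothing listens on port 1
print(reachable, unreachable)  # True False
```

A True/False answer from inside the pod's network namespace quickly separates DNS, routing, and service-down failure modes.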
Conclusion
Effective Kubernetes troubleshooting hinges on knowing when and how to apply the right debugging method. kubectl, events, and audit logs are essential for daily debugging, but combining them with dedicated observability platforms and visual tools like the Dashboard accelerates issue resolution and keeps your clusters running smoothly.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.