
Master Advanced Kubernetes Troubleshooting: PVC, Events, and Dashboard Tips

This article dives into advanced Kubernetes troubleshooting techniques, covering persistent volume claim pending errors, event and audit log analysis, dashboard deployment, health probes, ephemeral containers, and node‑level debugging to help DevOps engineers resolve complex cluster issues efficiently.


Top 10 Kubernetes Troubleshooting Techniques – Part 2

In the first part we explored basic troubleshooting skills for diagnosing common cluster and application problems. This second part delves deeper into advanced strategies and real‑world scenarios.

6. Kubernetes Storage Troubleshooting: Fix PVC Pending Errors

A PersistentVolumeClaim (PVC) stuck in the Pending state is a common storage issue that prevents dependent applications from starting. Typical causes are a misconfigured or misspelled StorageClass, a missing provisioner, or insufficient storage capacity.

Step 1: Check PV and PVC Status

List all PersistentVolumes (PV) and PersistentVolumeClaims (PVC) across namespaces to get an overview of their status, access modes, capacity, and binding:

kubectl get pv,pvc --all-namespaces
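The output resembles the following (a hypothetical example; names and classes will differ in your cluster). A PVC with STATUS=Pending and an empty VOLUME column is the symptom to look for:

```shell
kubectl get pvc --all-namespaces
# Hypothetical output -- the key signal is STATUS=Pending with no bound volume:
# NAMESPACE   NAME          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# default     my-data-pvc   Pending                                      fast-ssd       12m
# prod        logs-pvc      Bound     pv-001   50Gi       RWO            gp2            3d
```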

Step 2: Investigate Unbound PVCs

Describe the PVC and check the Events section at the bottom of the output, which usually reveals the root cause, such as:

No matching PersistentVolume

StorageClass typo

Insufficient capacity

Missing provisioner

kubectl describe pvc <pvc-name>

Step 3: Verify StorageClass

List all available StorageClasses and describe the specific one to ensure it exists and is spelled correctly:

kubectl get storageclass
kubectl describe storageclass <storageclass-name>

If the StorageClass referenced in the PVC does not exist, the PVC cannot be provisioned.
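For reference, the StorageClass is referenced from the PVC spec's storageClassName field. A minimal claim might look like this (name, class, and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2      # must match an existing StorageClass exactly
  resources:
    requests:
      storage: 10Gi
```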

Step 4: Common Errors and Fixes

For example, describing a PVC that references a non‑existent class (e.g., fast-ssd) shows a provisioning failure in its events:

kubectl describe pvc my-data-pvc
Warning  ProvisioningFailed  3m    persistentvolume-controller  storageclass.storage.k8s.io "fast-ssd" not found

Because a PVC's spec.storageClassName is immutable after creation, the fix is to recreate the claim with a valid class such as gp2 or standard.

List storage classes to confirm the correct one is available.
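A minimal sketch of the fix, assuming the hypothetical my-data-pvc from above and that gp2 is a valid class in your cluster:

```shell
# Save the existing spec, delete the stuck claim, and recreate it with a
# valid storageClassName (the field cannot be changed on a live PVC).
kubectl get pvc my-data-pvc -o yaml > my-data-pvc.yaml
# Edit my-data-pvc.yaml: set spec.storageClassName to "gp2" and strip
# server-populated fields (uid, resourceVersion, status) before re-applying.
kubectl delete pvc my-data-pvc
kubectl apply -f my-data-pvc.yaml
```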

7. Using Events and Audit Logs: Deep System Analysis

Kubernetes provides powerful debugging tools: events and audit logs. Events show what happened in the cluster, while audit logs capture API‑level actions, helping trace who did what.

Step 1: Understand Kubernetes Events

List all events across namespaces, sorted by creation time, to see recent activity:

kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'

Filter for warning events only (event field selectors support exact-match filters such as type, reason, and involvedObject fields; range operators like > are not supported):

kubectl get events --field-selector type=Warning

Use --watch for real‑time monitoring:

kubectl get events --watch

Filter events for a particular pod, deployment, or failure reason:

kubectl get events --field-selector involvedObject.name=my-pod
kubectl get events --field-selector reason=FailedScheduling
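Field selectors can also be combined with commas, which are ANDed together. For example, to see only warning events for a hypothetical pod named my-pod, sorted by recency:

```shell
# Comma-separated field selectors are combined with logical AND
kubectl get events \
  --field-selector involvedObject.name=my-pod,type=Warning \
  --sort-by='.lastTimestamp'
```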

Step 2: Use Audit Logs for In‑Depth Investigation

Audit logs reveal which user performed which API request, from which IP, using which HTTP verb, and on which resource. Enable audit logging by configuring an audit policy:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: ""
    resources: ["pods","services"]
  - group: "apps"
    resources: ["deployments","replicasets"]
- level: Request
  resources:
  - group: ""
    resources: ["configmaps","secrets"]

After enabling, audit logs can show details such as user, source IP, verb, object reference, response status, and timestamps.
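The policy file alone does nothing until the API server is told to use it. On a kubeadm‑style cluster this is typically done by adding flags to the kube-apiserver manifest (file paths here are illustrative):

```shell
# Flags added to the kube-apiserver command line (illustrative paths)
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30      # days to retain old log files
--audit-log-maxbackup=10   # number of rotated files to keep
--audit-log-maxsize=100    # megabytes before rotation
```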

8. Using the Kubernetes Dashboard and Visualization Tools

While kubectl provides command‑line access, the Kubernetes Dashboard offers a web UI for visualizing resources, metrics, logs, and events.

Step 1: Deploy the Dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

Step 2: Create a Service Account and Bind Cluster‑Admin Role

kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
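Binding cluster-admin is convenient in a lab but overly broad for shared clusters. A more cautious sketch binds the built‑in read‑only view ClusterRole instead (account name is illustrative):

```shell
# Read-only alternative: bind the built-in "view" ClusterRole
kubectl create serviceaccount dashboard-viewer -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-viewer \
  --clusterrole=view \
  --serviceaccount=kubernetes-dashboard:dashboard-viewer
```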

Step 3: Generate an Access Token

kubectl create token dashboard-admin -n kubernetes-dashboard
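The recommended manifest exposes the dashboard only inside the cluster, so reach it through kubectl proxy and paste the token from the previous step at the login screen:

```shell
# Start a local proxy to the API server (listens on 127.0.0.1:8001)
kubectl proxy
# Then open in a browser:
# http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
```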

After deployment, the dashboard lets you monitor CPU/memory usage, visualize event timelines, explore resource relationships, and view application logs directly in the browser.

9. Health Checks and Probes

Kubernetes health checks act like regular medical exams, helping detect issues early.

Understanding the Three Probes

Liveness Probe – restarts a container if it becomes unhealthy.

Readiness Probe – removes a container from service endpoints when not ready.

Startup Probe – gives a container extra time to initialize before other probes run.

Example deployment combining all three probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
    spec:
      containers:
      - name: web-app
        image: my-app:v1.2.3
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /health/startup
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

The probes work together: the startup probe allows the container time to initialize; once successful, the liveness probe monitors ongoing health, and the readiness probe controls traffic routing.
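httpGet is not the only probe handler. For services without an HTTP health endpoint, exec and tcpSocket handlers work the same way; a sketch, where the command and port are placeholders for your own service:

```yaml
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # healthy as long as the file exists
  periodSeconds: 10
readinessProbe:
  tcpSocket:
    port: 5432                         # ready once the TCP port accepts connections
  periodSeconds: 5
```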

10. Advanced Debugging Techniques

For complex performance bottlenecks or network issues, use ephemeral containers, node‑level debugging, and full pod copies.

Step 1: Debug with Ephemeral Containers

Inject a debugging container into a running pod without restarting it:

kubectl debug <pod-name> -it --image=busybox --target=<container-name>

Use richer images for more tools:

kubectl debug database-pod -it --image=ubuntu --target=postgres -- bash

Network troubleshooting example:

kubectl debug web-app-7d4b8c9f-xyz -it --image=nicolaka/netshoot --target=web-app
# Inside the ephemeral container:
ping database-service
nslookup database-service
telnet database-service 5432
ip addr show                  # inspect network interfaces
ss -tuln                      # list listening sockets
dig database-service.default.svc.cluster.local
tcpdump -i any port 5432      # capture database traffic
ps aux                        # target processes are visible via the shared PID namespace
# The target's filesystem is not mounted directly in the debug container;
# reach it through the shared PID namespace (find the PID with ps aux first):
ls -la /proc/<target-pid>/root/app/
cat /proc/<target-pid>/root/app/config.yaml

Step 2: Create a Full Debug Copy of a Pod

kubectl debug web-app-7d4b8c9f-xyz --copy-to=web-app-debug --image=ubuntu --set-image=web-app=ubuntu -- sleep 1d
kubectl exec -it web-app-debug -- bash

Step 3: Node‑Level Debugging

kubectl debug node/worker-node-1 -it --image=ubuntu
# Inside the debug pod, the node's root filesystem is mounted at /host:
chroot /host bash
systemctl status kubelet
journalctl -u kubelet -f

An image carrying a language toolchain (e.g., golang) can be attached as an ephemeral container for CPU or memory profiling:

kubectl debug web-app-7d4b8c9f-xyz -it --image=golang:1.21 --target=web-app
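Since the ephemeral container shares the pod's network namespace, profiling endpoints on localhost are reachable. A sketch assuming the target is a Go app that imports net/http/pprof and serves it on port 6060 (both are assumptions about your application):

```shell
# Run inside the golang debug container; assumes the target app exposes
# Go's net/http/pprof handlers on localhost:6060 (hypothetical setup).
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30   # 30s CPU profile
go tool pprof http://localhost:6060/debug/pprof/heap                 # heap snapshot
```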

These advanced techniques let you investigate issues in real production conditions without affecting the workload, reducing guesswork and downtime.

Conclusion

Effective Kubernetes troubleshooting hinges on knowing when and how to apply the right debugging method. kubectl, events, and audit logs are essential for daily debugging, but combining them with dedicated observability platforms and visual tools like the Dashboard accelerates issue resolution and keeps your clusters running smoothly.

Written by

Cloud Native Technology Community

The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
