19 Common Kubernetes Failures and How to Fix Them
This guide walks through nineteen common Kubernetes problems, from service access and port-mapping errors to pod init failures, PVC issues, and Helm installation glitches. For each one it gives the root cause and a concise fix, with the relevant kubectl command where one applies.
1. Service Access Failure (Certificate Issue)
Cause: The cluster cannot recognize the TLS certificate because it is custom, expired, or otherwise invalid.
Solution: Replace or renew the certificate.
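Before replacing a certificate, it can help to confirm that expiry is really the problem. The sketch below assumes openssl is installed and that the certificate is available as a PEM file. check_cert is a hypothetical helper name, and the path in the usage comment is only the usual kubeadm location, which may not match your cluster.

```shell
# check_cert: hypothetical helper that prints a certificate's expiry date
# and reports whether it has already expired.
# Assumes openssl is available and $1 is a PEM-encoded certificate.
check_cert() {
  cert="$1"
  # Show the notAfter date so the expiry is visible at a glance.
  openssl x509 -in "$cert" -noout -enddate || return 2
  # -checkend 0 exits non-zero if the certificate is expired right now.
  if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
    echo "certificate is still valid"
  else
    echo "certificate has expired"
    return 1
  fi
}

# Example (the path is an assumption, typical for kubeadm clusters):
#   check_cert /etc/kubernetes/pki/apiserver.crt
```

On kubeadm-managed clusters, recent kubeadm versions also offer kubeadm certs check-expiration, which reports the same information for all control-plane certificates.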
2. Service Access Failure (Connection Refused)
Cause: Incorrect port mapping; the service is running but the port is not exposed correctly.
Solution: Delete the existing Service and recreate it with the proper port mapping.
kubectl delete svc nginx-deployment
3. Service Exposure Failure (AlreadyExists)
Cause: A Service for this workload already exists, so creating it again returns AlreadyExists.
Solution: Delete the existing Service and recreate it with the correct configuration.
kubectl delete svc nginx-deployment
4. External Access Failure (ClusterIP)
Cause: The Service type is ClusterIP, which does not expose the service outside the cluster.
Solution: Change the Service type to NodePort so it can be accessed via any node IP.
kubectl edit svc nginx-deployment
5. Pod ErrImagePull
Cause: The container image cannot be pulled (e.g., missing, private, or corrupted).
Solution: Replace the image reference with a valid one and redeploy.
kubectl set image deployment/myapp myapp-container=new-image:tag
6. Init Container Stuck (PodInitializing)
Cause: The init container never finishes, often because of DNS resolution failures or missing resources.
Solution: Create the required Service (e.g., myservice) and ensure CoreDNS can resolve its name.
kubectl apply -f myservice.yaml
7. CrashLoopBackOff
Cause: The container image is faulty, causing repeated start‑up failures.
Solution: Replace the problematic image with a stable version.
kubectl set image deployment/myapp myapp-container=stable-image:tag
8. Pod Pending (Image Pull Issues)
Cause: The image name is incorrect or the registry cannot be reached.
Solution: Correct the image name and pull the image manually if needed.
kubectl delete pod readiness-httpget-pod
kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine
9. Pod Ready State Not Reached
Cause: The pod’s command fails, preventing the container from becoming ready.
Solution: Enter the container, fix the command or resource definition, and redeploy.
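Before entering the container, the pod's events and logs usually show which command or probe is failing. The following is a minimal sketch of that first look. diagnose_pod is a hypothetical helper name, and the default namespace is an assumption.

```shell
# diagnose_pod: hypothetical helper that gathers the usual evidence for a
# pod that never becomes Ready. $1 = pod name, $2 = namespace (default: default).
diagnose_pod() {
  pod="$1"
  ns="${2:-default}"
  # Events at the bottom of this output usually name the failing probe or command.
  kubectl describe pod "$pod" -n "$ns"
  # Logs of the current container instance.
  kubectl logs "$pod" -n "$ns" --tail=50
  # Logs of the previous instance, if the container has restarted.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 || true
}

# Example:
#   diagnose_pod readiness-httpget-pod
```

If the image ships a shell, kubectl exec -it readiness-httpget-pod -- sh then opens a session inside the container for a closer look.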
10. Pod Creation Failure (YAML Errors)
Cause: The YAML file contains duplicate fields or uses non‑ASCII characters.
Solution: Clean up the YAML (remove duplicate containers entries, fix encoding) and apply again.
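Non-ASCII characters such as a full-width colon are hard to spot by eye. One way to locate them is a byte-level grep, sketched below. find_non_ascii is a hypothetical helper name, and the manifest file name in the usage comment is a placeholder.

```shell
# find_non_ascii: hypothetical helper that prints every line (with its line
# number) containing a byte outside printable ASCII and whitespace.
# LC_ALL=C makes grep compare raw bytes instead of locale-aware characters.
find_non_ascii() {
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Example (file name is a placeholder):
#   find_non_ascii pod.yaml
```

The function exits non-zero when the file is clean. Afterwards, kubectl apply --dry-run=client -f with the same file checks that the manifest parses without creating anything, though it may not flag duplicate keys on every kubectl version.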
11. Flannel DaemonSet Pod Init Failure
Cause: Image pull failure on the node (e.g., network or registry issue).
Solution: Restart Docker on the affected node, manually pull the image, or reinstall the Flannel plugin.
kubectl create -f kube-flannel.yml
kubectl get nodes
12. Service ErrImagePull
Cause: Incorrect image name specified in the pod spec.
Solution: Delete the faulty pod and redeploy with the correct image.
kubectl delete pod test-nginx
kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine
13. Cannot Enter Specified Container
Cause: The YAML defines duplicate containers fields, so the intended container does not exist.
Solution: Remove the extra containers entry and recreate the pod.
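To confirm which containers the pod actually ended up with, the container names can be read back from the live object. This is a small sketch. list_containers is a hypothetical helper name, and the pod name in the usage comment is a placeholder.

```shell
# list_containers: hypothetical helper that prints the container names a pod
# actually has, one per line. $1 = pod name, $2 = namespace (default: default).
list_containers() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
}

# Example (pod name is a placeholder):
#   list_containers mypod
```

If the expected name is missing from the output, one containers key in the manifest has probably overridden the other. After fixing the YAML and recreating the pod, kubectl exec -it with -c and the container name should reach the intended container.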
14. PersistentVolume Creation Failure
Cause: Duplicate name fields in PV definitions.
Solution: Change the PV name to a unique value.
15. PVC Mount Failure
Cause: The PVC's accessModes or requested capacity do not match any available PV (e.g., the PVC requests ReadWriteOnce with more than 1Gi, but no PV offers that mode and size).
Solution: Adjust the accessModes and storage size of the PVC or the PV so that they match.
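Putting the claim and the available volumes side by side makes a mismatch easy to see. The sketch below prints both. compare_pvc_to_pvs is a hypothetical helper name, and the PVC name in the usage comment is a placeholder.

```shell
# compare_pvc_to_pvs: hypothetical helper that prints what a PVC asks for,
# followed by what each PV offers. $1 = PVC name, $2 = namespace (default: default).
compare_pvc_to_pvs() {
  pvc="$1"
  ns="${2:-default}"
  # What the claim requests: size, access modes, and whether it is Bound or Pending.
  kubectl get pvc "$pvc" -n "$ns" \
    -o custom-columns='NAME:.metadata.name,REQUEST:.spec.resources.requests.storage,MODES:.spec.accessModes,STATUS:.status.phase'
  # What the cluster offers: PVs are cluster-scoped, so no namespace is given.
  kubectl get pv \
    -o custom-columns='NAME:.metadata.name,CAPACITY:.spec.capacity.storage,MODES:.spec.accessModes,STATUS:.status.phase'
}

# Example (PVC name is a placeholder):
#   compare_pvc_to_pvs mypvc
```

A claim can only bind to a PV whose status is Available, whose access modes include the requested ones, and whose capacity is at least the requested size.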
16. PV Content Not Accessible
Cause: The NFS volume is empty or permissions are incorrect.
Solution: Create the required files in the NFS share and set appropriate permissions.
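On the NFS server, the fix could look roughly like the sketch below. seed_share is a hypothetical helper name, the index.html file is only a stand-in for whatever the application expects to find, and world-readable permissions are an assumption that should be tightened for real workloads.

```shell
# seed_share: hypothetical helper that makes sure an exported directory
# exists, contains a placeholder file, and is readable by other users.
# $1 = path of the exported directory on the NFS server.
seed_share() {
  share="$1"
  mkdir -p "$share" || return 1
  # Create a placeholder only if the file is not already there.
  [ -e "$share/index.html" ] || echo "hello from NFS" > "$share/index.html"
  # a+rX: read for everyone; execute only on directories and on files
  # that are already executable, so directories stay traversable.
  chmod -R a+rX "$share"
}

# Example (export path is a placeholder):
#   seed_share /data/nfs/web
```

After this, the pod that mounts the PV should see the file at its mount path.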
17. Node Status Query Failure
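Cause: The cluster has no metrics provider. Heapster, which older clusters relied on, has been retired.
Solution: Install a metrics provider such as metrics-server, or a monitoring stack such as Prometheus, to supply node metrics.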
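If the failing query is kubectl top nodes (an assumption, since the command is not named here), it depends on the Metrics API being registered in the cluster. The sketch below checks for that before querying. check_metrics is a hypothetical helper name.

```shell
# check_metrics: hypothetical helper that verifies the Metrics API is
# registered before asking for node metrics.
check_metrics() {
  # A metrics provider such as metrics-server registers this APIService.
  if kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    kubectl top nodes
  else
    echo "Metrics API not registered; install a metrics provider first"
    return 1
  fi
}

# Example:
#   check_metrics
```

Note that Prometheus on its own does not serve the Metrics API, so kubectl top may still fail unless an adapter or metrics-server is installed alongside it.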
18. Pod Stuck in Pending
Cause: No node can schedule the pod because existing pods already consume the resources it requests.
Solution: Delete existing pods or scale down workloads, then redeploy.
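Before deleting anything, it is worth confirming that resources are the reason the pod is Pending. The sketch below shows the scheduler's events for the pod and each node's allocated resources. pending_reason is a hypothetical helper name, and the pod name in the usage comment is a placeholder.

```shell
# pending_reason: hypothetical helper that shows why a pod is not scheduled.
# $1 = pod name, $2 = namespace (default: default).
pending_reason() {
  pod="$1"
  ns="${2:-default}"
  # Scheduler events (e.g. FailedScheduling) explain why no node fits.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
  # Per-node summary of requested CPU and memory against capacity.
  kubectl describe nodes | grep -A 8 "Allocated resources"
}

# Example (pod name is a placeholder):
#   pending_reason mypod
```

If the events mention insufficient CPU or memory, scaling another workload down with kubectl scale deployment and a lower --replicas value frees capacity without deleting pods by hand.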
19. Helm Install Failure
Cause: Incorrect chart file naming; Chart.yaml is missing or misnamed.
Solution: Rename the file to Chart.yaml and retry the install.
mv chart.yaml Chart.yaml
dbaplus Community
