19 Common Kubernetes Failures and How to Fix Them

This guide walks through nineteen typical Kubernetes problems, from service access and port-mapping errors to pod init failures, PVC issues, and Helm installation glitches. For each one it explains the root cause, gives concise troubleshooting steps, and shows the kubectl commands and code snippets needed to resolve it.

dbaplus Community

Mar 14, 2022

1. Service Access Failure (Certificate Issue)

Cause: The cluster does not accept the TLS certificate because it is custom (for example, self-signed), expired, or otherwise invalid.

Solution: Replace or renew the certificate.
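
Before replacing a certificate, it can help to confirm that expiry is really the problem. The following is a minimal sketch that wraps two standard openssl x509 options (-checkend and -enddate). The function names are illustrative and not from the original article, and the example path is the kubeadm default, which is an assumption about your cluster.

```shell
# Succeeds (exit 0) if the PEM certificate at $1 is still valid $2 seconds
# from now. $2 defaults to 0, meaning "valid right now".
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "${2:-0}" >/dev/null
}

# Print the subject and the expiry date (notAfter) of the certificate.
cert_summary() {
  openssl x509 -in "$1" -noout -subject -enddate
}

# Example (assumed path; 2592000 seconds = 30 days):
#   cert_summary /etc/kubernetes/pki/apiserver.crt
#   cert_valid_for /etc/kubernetes/pki/apiserver.crt 2592000 || echo "expires within 30 days"
```

If the check fails only because the certificate is self-signed rather than expired, renewing it will not help; the client has to trust the issuing CA instead.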

2. Service Access Failure (Connection Refused)

Cause: Incorrect port mapping. The application is running, but the Service forwards traffic to a port on which nothing is listening.

Solution: Delete the existing Service and recreate it with the proper port mapping.

kubectl delete svc nginx-deployment
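
The delete-and-recreate step can be sketched as a small helper. It assumes the Service was created from a Deployment of the same name with kubectl expose; the function name and the port numbers in the example are hypothetical and should be replaced with the port your container actually listens on.

```shell
# Delete a Service (if present) and recreate it from its Deployment with an
# explicit port mapping.
#   $1 = Deployment/Service name
#   $2 = port the Service listens on
#   $3 = port the container listens on (targetPort)
recreate_service() {
  name=$1
  port=$2
  target=$3
  kubectl delete svc "$name" --ignore-not-found &&
    kubectl expose deployment "$name" --port="$port" --target-port="$target"
}

# Example with assumed ports:
#   recreate_service nginx-deployment 80 80
#   kubectl get endpoints nginx-deployment   # should list pod IPs with the target port
```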

3. Service Exposure Failure (AlreadyExists)

Cause: A Service with the same name already exists, so creating it again fails with AlreadyExists.

Solution: Delete the existing Service and recreate it with the correct configuration.

kubectl delete svc nginx-deployment
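
As an alternative to deleting first, the manifest can be generated client-side and then applied, which works whether or not the Service already exists. This is a sketch of that common idiom rather than the article's own method; it assumes kubectl 1.18 or newer (for --dry-run=client), and the function name and port are hypothetical.

```shell
# Create the Service if it is missing, or update it in place if it exists.
# "kubectl expose --dry-run=client -o yaml" only prints the manifest, and
# "kubectl apply" accepts an object that already exists, so no AlreadyExists
# error is raised.
ensure_service() {
  name=$1
  port=$2
  kubectl expose deployment "$name" --port="$port" --dry-run=client -o yaml |
    kubectl apply -f -
}

# Example with an assumed port:
#   ensure_service nginx-deployment 80
```

The first apply against a Service that was originally created with kubectl expose may print a warning about a missing last-applied-configuration annotation; the update still goes through.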

4. External Access Failure (ClusterIP)

Cause: The Service type is ClusterIP, which does not expose the service outside the cluster.

Solution: Change the Service type to NodePort (the spec.type field) so the service can be reached at any node's IP on the allocated node port.

kubectl edit svc nginx-deployment
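
The same change can be made without opening an editor. The sketch below uses kubectl patch and a jsonpath query; the function names are illustrative, and the placeholder node IP in the example has to be filled in for your cluster.

```shell
# Switch an existing Service to type NodePort.
to_nodeport() {
  kubectl patch svc "$1" -p '{"spec":{"type":"NodePort"}}'
}

# Print the node port that Kubernetes allocated for the first Service port.
node_port_of() {
  kubectl get svc "$1" -o jsonpath='{.spec.ports[0].nodePort}'
}

# Example:
#   to_nodeport nginx-deployment
#   curl "http://<node-ip>:$(node_port_of nginx-deployment)/"
```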

5. Pod ErrImagePull

Cause: The container image cannot be pulled (e.g., missing, private, or corrupted).

Solution: Replace the image reference with a valid one and redeploy.

kubectl set image deployment/myapp myapp-container=new-image:tag

6. Init Container Stuck (PodInitializing)

Cause: The init container never finishes, often because of DNS resolution failures or missing resources.

Solution: Create the required Service (e.g., myservice) and ensure CoreDNS can resolve its name.

kubectl apply -f myservice.yaml
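
The article does not show the contents of myservice.yaml. A minimal sketch of what such a file could contain is given below; the port numbers are assumptions, and the helper name is hypothetical. A Service without a selector is enough for the name to resolve in cluster DNS, which is what a waiting init container typically checks.

```shell
# Write a minimal Service manifest named "myservice" to the path given in $1.
write_myservice_manifest() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
  - protocol: TCP
    port: 80          # assumed Service port
    targetPort: 9376  # assumed container port
EOF
}

# Example:
#   write_myservice_manifest myservice.yaml
#   kubectl apply -f myservice.yaml
#   kubectl get pod <pod-name>      # STATUS should move past the Init phase
```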

7. CrashLoopBackOff

Cause: The container starts and then exits again and again, in this case because the image is faulty.

Solution: Replace the problematic image with a stable version.

kubectl set image deployment/myapp myapp-container=stable-image:tag
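
Before swapping the image, it is usually worth confirming why the container exits. The helpers below are a sketch built on kubectl logs --previous and a jsonpath query on the pod status; the function names and the pod name in the example are hypothetical.

```shell
# Show the tail of the logs from the previous (crashed) container instance.
#   $1 = pod name, $2 = number of lines (default 50)
crash_logs() {
  kubectl logs "$1" --previous --tail="${2:-50}"
}

# Print the exit code of the last terminated instance of the first container.
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Example:
#   crash_logs myapp-pod
#   last_exit_code myapp-pod
```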

8. Pod Pending (Image Pull Issues)

Cause: The image name is incorrect or the registry cannot be reached.

Solution: Correct the image name and pull the image manually if needed.

kubectl delete pod readiness-httpget-pod
kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine

9. Pod Ready State Not Reached

Cause: The pod’s command fails, preventing the container from becoming ready.

Solution: Enter the container, fix the command or resource definition, and redeploy.

10. Pod Creation Failure (YAML Errors)

Cause: The YAML file contains duplicate fields or uses non‑ASCII characters.

Solution: Clean up the YAML (remove duplicate containers entries, fix encoding) and apply again.
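
Both problems can be spotted before running kubectl. The following is a rough sketch using grep and awk, not a full YAML parser: the duplicate check is a heuristic that only looks for a second containers key at the same indentation within one YAML document, and the character check also flags tabs. The function names are illustrative.

```shell
# Print lines that contain bytes outside printable ASCII (this also flags
# tabs and full-width punctuation). Exit status 0 means something was found.
find_non_ascii() {
  LC_ALL=C grep -n '[^ -~]' "$1"
}

# Report a repeated "containers:" key at the same indentation level.
# Exit status 0 means a duplicate was found, 1 means none.
find_duplicate_containers() {
  awk '
    /^---/ { split("", seen); next }          # new YAML document: reset
    /^ *containers:/ {
      indent = match($0, /[^ ]/) - 1          # number of leading spaces
      if (indent in seen) {
        printf "line %d: second containers key (first at line %d)\n", NR, seen[indent]
        dup = 1
      } else {
        seen[indent] = NR
      }
    }
    END { exit (dup ? 0 : 1) }
  ' "$1"
}

# Example:
#   find_non_ascii pod.yaml
#   find_duplicate_containers pod.yaml
```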

11. Flannel DaemonSet Pod Init Failure

Cause: Image pull failure on the node (e.g., network or registry issue).

Solution: Restart Docker on the affected node, manually pull the image, or reinstall the Flannel plugin.

kubectl create -f kube-flannel.yml
kubectl get nodes

12. Service ErrImagePull

Cause: Incorrect image name specified in the pod spec.

Solution: Delete the faulty pod and redeploy with the correct image.

kubectl delete pod test-nginx
kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine

13. Cannot Enter Specified Container

Cause: The YAML defines duplicate containers fields, so the intended container does not exist.

Solution: Remove the extra containers entry and recreate the pod.

14. PersistentVolume Creation Failure

Cause: Duplicate name fields in PV definitions.

Solution: Change the PV name to a unique value.

15. PVC Mount Failure

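Cause: No available PV satisfies the PVC because the accessModes or the requested capacity do not match. For example, the PVC requests ReadWriteOnce (RWO) and more than 1Gi, but the only existing PV offers a different access mode or a smaller capacity.

Solution: Adjust the accessModes and storage size of the PVC or the PV so that they match.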
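
To make the matching rules concrete, here is a sketch that prints a PV and a PVC that are compatible with each other. All names, sizes, the storage class, and the hostPath location are assumptions chosen for illustration; hostPath is only suitable for single-node testing.

```shell
# Print a PersistentVolume and a PersistentVolumeClaim whose accessModes,
# storageClassName, and sizes are compatible, so the claim can bind.
render_pv_and_pvc() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv               # hypothetical name
spec:
  capacity:
    storage: 2Gi              # must be >= the PVC request
  accessModes:
    - ReadWriteOnce           # must include the mode the PVC asks for
  storageClassName: manual    # must equal the storageClassName of the PVC
  hostPath:
    path: /mnt/data           # assumed path, single-node testing only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 1Gi
EOF
}

# Example:
#   render_pv_and_pvc | kubectl apply -f -
#   kubectl get pvc demo-pvc     # STATUS should be Bound
```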

16. PV Content Not Accessible

Cause: The NFS volume is empty or permissions are incorrect.

Solution: Create the required files in the NFS share and set appropriate permissions.
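
A sketch of that step, meant to be run on the NFS server, follows. The export path, the file name, and the choice of world-readable permissions are all assumptions; tighten the permissions to whatever your workload actually needs.

```shell
# Create a test file in the exported directory and make the tree readable
# (and its directories traversable) for every user.
seed_share() {
  dir=$1
  mkdir -p "$dir" &&
    echo "hello from nfs" > "$dir/index.html" &&
    chmod -R a+rX "$dir"
}

# Example (assumed export path):
#   seed_share /nfs/data
```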

17. Node Status Query Failure

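Cause: The cluster has no metrics service. Older clusters relied on Heapster for this, and it is missing.

Solution: Install Prometheus (or another monitoring stack) to provide node metrics. Heapster itself has since been retired, so it should not be reinstalled on current clusters.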

18. Pod Stuck in Pending

Cause: No node can schedule the pod because existing pods already consume the resources it requests.

Solution: Delete existing pods or scale down workloads, then redeploy.
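
The diagnosis and the scale-down can be sketched with three standard kubectl calls. The function names, and the pod and Deployment names in the example, are hypothetical.

```shell
# List pods that the scheduler has not placed yet.
pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending
}

# Show events for one pod; scheduling problems appear as FailedScheduling.
pod_events() {
  kubectl get events --field-selector "involvedObject.name=$1"
}

# Reduce a Deployment to $2 replicas to free resources.
scale_down() {
  kubectl scale deployment "$1" --replicas="$2"
}

# Example:
#   pending_pods
#   pod_events test-nginx
#   scale_down myapp 1
```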

19. Helm Install Failure

Cause: The chart's metadata file is missing or misnamed. Helm requires it to be named exactly Chart.yaml, with a capital C.

Solution: Rename the file to Chart.yaml and retry the install.

mv chart.yaml Chart.yaml
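
When the exact misspelling is not known in advance, the rename can be generalized. The helper below is a sketch that assumes a case-sensitive filesystem and only handles case variants of the name chart.yaml; the function name and the chart directory in the example are hypothetical.

```shell
# Rename a mis-cased chart metadata file (chart.yaml, CHART.YAML, ...) in
# directory $1 to Chart.yaml. Returns 1 if no candidate file exists.
fix_chart_filename() {
  dir=${1:-.}
  [ -f "$dir/Chart.yaml" ] && return 0      # already correct
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    lower=$(basename "$f" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      chart.yaml)
        mv "$f" "$dir/Chart.yaml"
        return
        ;;
    esac
  done
  return 1
}

# Example:
#   fix_chart_filename ./mychart && helm lint ./mychart
```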
