
Fix Stuck Kubernetes Resources, ETCD Errors, and ServiceAccount Issues

This guide walks through troubleshooting common Kubernetes issues such as deleting stuck RCs, Deployments, and Services, resetting etcd after failures, fixing apiserver start errors caused by missing ServiceAccount certificates, handling SELinux permission denials, configuring host trust, and force‑deleting problematic Pods or Namespaces.

MaGe Linux Operations

How to delete resources in inconsistent state

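When kubectl hangs and removes only part of a resource, you can force-delete what is left of the RC, Deployment, or Service. First delete the Deployment without cascading, which orphans its ReplicaSets and Pods instead of waiting for them to be removed (on kubectl 1.20 and later, --cascade=orphan replaces the deprecated --cascade=false):

kubectl delete deployment kibana-logging -n kube-system --cascade=false

Repeat the delete with --ignore-not-found so the command succeeds even if the object is already gone:

kubectl delete deployment kibana-logging -n kube-system --ignore-not-found

Then remove the ReplicationController immediately, skipping the graceful termination period:

kubectl delete rc elasticsearch-logging-v1 -n kube-system --force --grace-period=0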
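
After a forced delete it helps to confirm that the API server no longer returns the object. The helper below is a minimal sketch of that check; the function name resource_gone and the KUBECTL override variable are illustrative assumptions, not part of kubectl.

```shell
# KUBECTL is a hypothetical override so the helper can point at another binary.
KUBECTL="${KUBECTL:-kubectl}"

# resource_gone KIND NAME NAMESPACE
# Exit 0 when the object is no longer returned, 1 when it still exists,
# 2 when the kubectl call itself failed.
resource_gone() {
    out="$("$KUBECTL" get "$1" "$2" -n "$3" --ignore-not-found -o name)" || return 2
    [ -z "$out" ]
}

# Example, using the names from the commands above:
#   resource_gone deployment kibana-logging kube-system && echo "deleted"
```

If the object is still listed, inspect it with kubectl get ... -o yaml before retrying.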

Resetting etcd after deletion failures

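If deletions keep failing, a last resort is to reset etcd. This wipes all cluster state, so back up /var/lib/etcd first. Stop etcd, remove everything under /var/lib/etcd/, reboot the master node, then recreate the network configuration key (typically read by flannel on CentOS setups; etcdctl mk is an etcd v2 command):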

rm -rf /var/lib/etcd/*

etcdctl mk /atomic.io/network/config '{ "Network": "192.168.0.0/16" }'

Apiserver start failure due to missing ServiceAccount files

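The systemd message “start request repeated too quickly for kube-apiserver.service” only says that the unit kept crashing; it does not state the cause. When ServiceAccount is enabled, the underlying problem is often a missing ca.crt file. Check that /var/run/kubernetes/ca.crt exists and that the other certificate files are present.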
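
A quick way to run that check is sketched below. The function name check_cert_files is an illustrative assumption; the journalctl call in the example is where the real error behind the systemd message usually shows up.

```shell
# check_cert_files DIR FILE...
# Prints every file that is missing or empty and returns non-zero if any is.
check_cert_files() {
    dir="$1"; shift
    missing=0
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (paths follow this article; adjust them to your unit file):
#   journalctl -u kube-apiserver --no-pager -n 50
#   check_cert_files /var/run/kubernetes ca.crt
```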

Permission denied caused by SELinux

With SELinux in enforcing mode, Fluentd may fail to write /var/log/fluentd.log with a permission-denied error. To disable SELinux, set SELINUX=disabled in /etc/selinux/config and reboot.
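
The config edit can be scripted. The sketch below assumes the standard SELINUX= line in the config file; set_selinux_mode is a hypothetical helper name, and it keeps a .bak copy of the file it changes.

```shell
# set_selinux_mode MODE [CONFIG_FILE]
# Rewrites the SELINUX= line in the config file and keeps a .bak copy.
set_selinux_mode() {
    mode="$1"
    conf="${2:-/etc/selinux/config}"
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    # ^SELINUX= does not match the SELINUXTYPE= line, which stays untouched.
    sed -i.bak "s/^SELINUX=.*/SELINUX=$mode/" "$conf"
}

# Example:
#   getenforce                  # show the current mode
#   setenforce 0                # permissive until the next reboot
#   set_selinux_mode disabled   # persist the change, then reboot
```

Running setenforce 0 first lets you confirm that SELinux is the cause before making the change permanent.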

Generating ServiceAccount certificates

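Create a CA, then a server certificate signed by it. The subjectAltName is 10.254.0.1, the first address of the service cluster IP range 10.254.0.0/16, which is the address Pods use to reach the API server: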

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=k8s-master" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048
echo subjectAltName=IP:10.254.0.1 > extfile.cnf
openssl req -new -key server.key -subj "/CN=k8s-master" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out server.crt -days 10000

Start the API server with the generated files. The example expects them in /root/keys, so copy them there first or adjust the paths:

/usr/bin/kube-apiserver \
  --logtostderr=true \
  --v=0 \
  --etcd-servers=http://k8s-master:2379 \
  --address=0.0.0.0 \
  --port=8080 \
  --service-cluster-ip-range=10.254.0.0/16 \
  --admission-control=ServiceAccount \
  --client-ca-file=/root/keys/ca.crt \
  --tls-cert-file=/root/keys/server.crt \
  --tls-private-key-file=/root/keys/server.key \
  --secure-port=443

ETCD startup failures

If etcd fails with “raft save state and entries error: open …/wal/0.tmp: is a directory”, a leftover 0.tmp entry in the WAL directory is blocking the write. As the message says, it is a directory here, so remove it recursively and restart etcd.

If a node does not start after a power loss, back up its data directory, clear the member directory, stop the other etcd nodes, and then restart the nodes one at a time.
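
The backup and the check for leftover entries can be sketched as two small helpers. The function names and the timestamped backup folder layout are illustrative assumptions; pass the WAL directory of your own installation explicitly.

```shell
# backup_etcd_dir DATA_DIR BACKUP_ROOT
# Copies the whole data directory into a timestamped folder and prints its path.
backup_etcd_dir() {
    src="$1"
    dest="$2/etcd-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$2" && cp -a "$src" "$dest" && echo "$dest"
}

# list_wal_leftovers WAL_DIR
# Prints any *.tmp entries (files or directories) directly inside the WAL directory.
list_wal_leftovers() {
    find "$1" -maxdepth 1 -name '*.tmp' 2>/dev/null
}

# Example (stop etcd first so the copy is consistent):
#   backup_etcd_dir /var/lib/etcd /root/etcd-backups
```

Take the backup before deleting anything, so a mistaken cleanup can be undone.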

Host trust configuration on CentOS

Generate SSH keys with ssh-keygen -t rsa and distribute the public key using ssh-copy-id to enable password‑less login between hosts.
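
With more than a couple of hosts, the distribution step is easier as a loop. This is a minimal sketch: distribute_key and the COPY_ID override are hypothetical names, and the host names in the example are placeholders.

```shell
# COPY_ID is a hypothetical override for the copy command (default: ssh-copy-id).
COPY_ID="${COPY_ID:-ssh-copy-id}"

# distribute_key PUBKEY HOST...
# Installs the public key on each host; reports failures and returns non-zero
# if any host could not be updated.
distribute_key() {
    key="$1"; shift
    failed=0
    for host in "$@"; do
        if ! "$COPY_ID" -i "$key" "$host"; then
            echo "failed: $host" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example:
#   [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   distribute_key ~/.ssh/id_rsa.pub root@k8s-node1 root@k8s-node2
```

ssh-copy-id prompts for each host's password once; later logins use the key.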

Changing hostname on CentOS

hostnamectl set-hostname k8s-master1

Enabling copy‑paste in VirtualBox guest

Install the kernel headers and build tools, reboot if the kernel was updated so that the running kernel matches the installed headers, then run the Guest Additions installer:

yum update
yum update kernel
yum install kernel-devel kernel-headers
yum install gcc make
sh VBoxLinuxAdditions.run
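
The installer builds kernel modules, so it needs a kernel-devel tree that matches the running kernel. The check below is a sketch that assumes the CentOS layout, where kernel-devel installs under /usr/src/kernels/<release>; headers_match is a hypothetical helper name.

```shell
# headers_match [KERNEL_RELEASE] [SRC_ROOT]
# Succeeds when a kernel-devel tree exists for the given kernel release
# (default: the running kernel, as reported by uname -r).
headers_match() {
    release="${1:-$(uname -r)}"
    root="${2:-/usr/src/kernels}"
    [ -d "$root/$release" ]
}

# Example:
#   headers_match || echo "reboot into the updated kernel or install the matching kernel-devel"
```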

Force‑deleting Pods or Namespaces stuck in Terminating

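Force-delete a Pod immediately, skipping graceful termination:

kubectl delete pod NAME --grace-period=0 --force

A Namespace stuck in Terminating is usually held back by its finalizers. The script below dumps the Namespace to JSON, opens it in vi so you can remove the entries under spec.finalizers, and submits the result to the finalize endpoint. It expects kubectl proxy to be listening on 127.0.0.1:8001.

#!/bin/bash
# delete-ns.sh: clear the finalizers of a Namespace stuck in Terminating.
# Run "kubectl proxy" in another terminal first (it listens on 127.0.0.1:8001).
set -e
usage(){ echo "usage: delete-ns.sh NAMESPACE"; }
if [ $# -lt 1 ]; then usage; exit 1; fi
NAMESPACE=$1
JSONFILE=${NAMESPACE}.json
kubectl get ns "${NAMESPACE}" -o json > "${JSONFILE}"
# Remove the entries under spec.finalizers, then save and quit.
vi "${JSONFILE}"
# PUT the edited object to the finalize subresource through the proxy.
curl -H "Content-Type: application/json" -X PUT --data-binary @"${JSONFILE}" http://127.0.0.1:8001/api/v1/namespaces/"${NAMESPACE}"/finalize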
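
The manual vi step can be replaced with a filter when jq is available. This is a sketch rather than part of the original script: strip_finalizers is a hypothetical helper, jq is an extra dependency, and the example uses kubectl replace --raw instead of curl, so no proxy is needed.

```shell
# strip_finalizers
# Reads a Namespace object as JSON on stdin and writes it back
# with spec.finalizers set to an empty list.
strip_finalizers() {
    jq '.spec.finalizers = []'
}

# Example (NAMESPACE is a placeholder):
#   kubectl get ns "$NAMESPACE" -o json | strip_finalizers \
#     | kubectl replace --raw "/api/v1/namespaces/$NAMESPACE/finalize" -f -
```

Clearing finalizers skips whatever cleanup they were guarding, so resources in the Namespace may be left behind.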

Impact of containers with only resource requests

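A container that defines requests but no limits can consume resources well beyond its request. When the node comes under resource pressure, Pods whose usage exceeds their requests are among the first the kubelet evicts. Use a LimitRange to apply default limits to containers in a namespace.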
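
A LimitRange along these lines gives every container in a namespace default requests and limits. It is a minimal sketch: the object name, the namespace, and all CPU and memory values are placeholder assumptions to adapt to your workloads.

```shell
# Write a LimitRange manifest; every name and value here is a placeholder.
cat > limitrange.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
  - type: Container
    default:            # limit applied when a container sets none
      cpu: 500m
      memory: 256Mi
    defaultRequest:     # request applied when a container sets none
      cpu: 100m
      memory: 128Mi
EOF

# Apply it to the cluster:
#   kubectl apply -f limitrange.yaml
```

The defaults apply only to Pods created after the LimitRange exists; running Pods keep their current settings.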

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
