Fix Inconsistent Kubernetes rc/deployment/service Deletions and Etcd Failures

This guide walks through troubleshooting Kubernetes issues such as partially deleted resources, resetting etcd, apiserver start failures due to missing ServiceAccount certificates, SELinux permission errors, ServiceAccount key generation, etcd startup errors, host trust configuration, and resource limit pitfalls, providing concrete commands and scripts for each problem.

Open Source Linux

Feb 20, 2021

How to Delete Inconsistent rc, Deployment, Service

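Sometimes kubectl delete hangs, and a subsequent kubectl get shows the resources only partially deleted: some objects are already gone, while an rc and a deployment linger with a desired count of 0.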

[root@k8s-master ~]# kubectl get -f fluentd-elasticsearch/
NAME                     DESIRED CURRENT READY AGE
rc/elasticsearch-logging-v1 0      2       2     15h

NAME                     DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kibana-logging   0      1       1          1        15h
Error from server (NotFound): services "elasticsearch-logging" not found
Error from server (NotFound): daemonsets.extensions "fluentd-es-v1.22" not found
Error from server (NotFound): services "kibana-logging" not found

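Delete the leftover resources explicitly. --cascade=false removes the object itself and leaves its dependents orphaned instead of waiting for them (newer kubectl releases spell this --cascade=orphan), and --force --grace-period=0 skips graceful termination: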

kubectl delete deployment kibana-logging -n kube-system --cascade=false
kubectl delete deployment kibana-logging -n kube-system --ignore-not-found
kubectl delete rc elasticsearch-logging-v1 -n kube-system --force --grace-period=0
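To see which objects are still half-deleted after running these commands, the kubectl get output can be filtered. The following is a minimal sketch, not part of the original procedure: list_half_deleted is a hypothetical helper name, and it assumes the NAME DESIRED CURRENT column order shown in the listing above.

```shell
# list_half_deleted: read `kubectl get` output on stdin and print the name of
# every resource whose DESIRED count is 0 while its CURRENT count is still > 0.
# Header lines and "Error from server" lines are skipped because their second
# and third fields are not numbers.
list_half_deleted() {
  awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 == 0 && $3 > 0 { print $1 }'
}

# Usage against a live cluster:
#   kubectl get rc,deploy -n kube-system 2>/dev/null | list_half_deleted
```

An empty result suggests the controllers have finished scaling down and nothing is left in the inconsistent state.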

How to Reset Etcd When Deletion Fails

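# WARNING: this erases all cluster state stored in etcd; use it only as a last resort
rm -rf /var/lib/etcd/*

Reboot the master node, then recreate the overlay network configuration in etcd (the key below is the one flannel reads on CentOS-packaged installs):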

etcdctl mk /atomic.io/network/config '{ "Network": "192.168.0.0/16" }'
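Because the rm -rf above discards every object in the cluster, it may be worth archiving the data directory before running it. A minimal sketch, assuming tar is available; backup_etcd_dir is a hypothetical helper, and /data/bak is simply the backup location used later in this article.

```shell
# backup_etcd_dir SRC DEST_DIR
# Archive SRC into DEST_DIR as a timestamped tarball and print the archive path.
backup_etcd_dir() {
  src=$1
  dest=$2
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  archive="$dest/etcd-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$src" . && echo "$archive"
}

# Usage (stop etcd first so the files do not change while tar reads them):
#   systemctl stop etcd
#   backup_etcd_dir /var/lib/etcd /data/bak
```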

Apiserver Startup Failure

The service repeatedly fails with “start request repeated too quickly”. The real cause is missing CA files after enabling ServiceAccount.

May 21 07:56:41 k8s-master kube-apiserver: Flag --port has been deprecated, see --insecure-port instead.
May 21 07:56:41 k8s-master kube-apiserver: Validate server run options failed: unable to load client CA file: open /var/run/kubernetes/ca.crt: no such file or directory
...

Ensure the ServiceAccount CA files are present or start the API server manually:

/usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://k8s-master:2379 --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=true --service-cluster-ip-range=10.254.0.0/16 --admission-control=ServiceAccount --insecure-bind-address=0.0.0.0 --client-ca-file=/root/keys/ca.crt --tls-cert-file=/root/keys/server.crt --tls-private-key-file=/root/keys/server.key --basic-auth-file=/root/keys/basic_auth.csv --secure-port=443 >> /var/log/kubernetes/kube-apiserver.log 2>&1 &
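Since the failure comes down to files the apiserver cannot open, a quick preflight check on the paths passed to it can save a restart loop. This is a rough sketch; check_readable is a hypothetical helper, and the paths in the usage line are the ones from the command above.

```shell
# check_readable FILE...
# Print one line per file that is missing or unreadable by the current user;
# return non-zero if any file fails the check.
check_readable() {
  rc=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      rc=1
    elif [ ! -r "$f" ]; then
      echo "unreadable: $f"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   check_readable /root/keys/ca.crt /root/keys/server.crt \
#                  /root/keys/server.key /root/keys/basic_auth.csv
```

The readability test applies to whoever runs the function, so it is most informative when run as the user the apiserver service runs as rather than as root.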

Permission Denied Errors

Fluentd may fail to write logs because SELinux is enforcing.

# In /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, then reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
reboot
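Before committing to a reboot, it can help to confirm that SELinux really is the cause. The sketch below is only an illustration: selinux_configured_mode is a hypothetical helper that reads the configured mode, while getenforce and setenforce in the usage notes are the standard SELinux utilities.

```shell
# selinux_configured_mode [FILE]
# Print the SELINUX= value from an SELinux config file
# (default: /etc/selinux/config). SELINUXTYPE= and comment lines are ignored.
selinux_configured_mode() {
  sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}

# Usage on the affected host:
#   getenforce                 # mode in effect right now
#   setenforce 0               # permissive until the next reboot, handy for testing
#   selinux_configured_mode    # mode that applies after the next reboot
```

If Fluentd starts writing logs after setenforce 0, SELinux was the blocker and the permanent change is justified.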

ServiceAccount‑Based Configuration

Generate the required certificates and keys:

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=k8s-master" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048

echo subjectAltName=IP:10.254.0.1 > extfile.cnf
openssl req -new -key server.key -subj "/CN=k8s-master" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out server.crt -days 10000
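The subjectAltName of 10.254.0.1 is the first address of the 10.254.0.0/16 service range, which Kubernetes assigns to the built-in kubernetes service, so pods reaching the apiserver through that address need it in the certificate. A small sketch for confirming that the SAN made it into server.crt; has_san_ip is a hypothetical helper that inspects the text dump openssl produces.

```shell
# has_san_ip IP
# Read `openssl x509 -text` output on stdin and succeed if IP is listed as an
# "IP Address:" subjectAltName. Splitting on commas and spaces first makes the
# match exact, so 10.254.0.1 does not match 10.254.0.10.
has_san_ip() {
  tr ', ' '\n\n' | grep -Fxq "Address:$1"
}

# Usage:
#   openssl x509 -in server.crt -noout -text | has_san_ip 10.254.0.1 \
#     && echo "SAN present" || echo "SAN missing"
```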

If the apiserver cannot read the CA file, for example because the key directory is not readable by the user the service runs as, you will see:

Validate server run options failed: unable to load client CA file: open /root/keys/ca.crt: permission denied

Start the controller‑manager manually as needed:

/usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://k8s-master:8080 --root-ca-file=/root/keys/ca.crt --service-account-private-key-file=/root/keys/server.key >> /var/log/kubernetes/kube-controller-manager.log 2>&1 &

Etcd Won’t Start – Issue (1)

Log shows the raft error:

raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory

With etcd stopped, remove the stray 0.tmp directory from the WAL directory, then restart etcd.
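To check for leftover temporary entries before removing anything, the WAL directory can be listed first. A minimal sketch, assuming the default data path from the log line above; list_wal_tmp is a hypothetical helper.

```shell
# list_wal_tmp [WAL_DIR]
# Print every *.tmp entry (file or directory) directly inside the WAL directory.
list_wal_tmp() {
  find "${1:-/var/lib/etcd/default.etcd/member/wal}" -mindepth 1 -maxdepth 1 -name '*.tmp'
}

# Usage:
#   systemctl stop etcd
#   list_wal_tmp          # review the entries, then remove them by hand
#   systemctl start etcd
```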

Etcd Won’t Start – Timeout Issue (2)

After a power loss, one etcd node fails to start. The fix is:

1. Back up the data directory on the failed node: cd /var/lib/etcd/default.etcd/member && cp -r * /data/bak/
2. Remove everything in the member directory: rm -rf /var/lib/etcd/default.etcd/member/*
3. Stop etcd on the other two nodes, then restart etcd on all nodes:

# master node
systemctl stop etcd
systemctl restart etcd
# node1
systemctl stop etcd
systemctl restart etcd
# node2
systemctl stop etcd
systemctl restart etcd
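After the restarts, the members can take a little while to elect a leader, so polling is more reliable than checking once. The sketch below is a generic retry helper (retry is a hypothetical name); the usage line assumes the v2 etcdctl that the etcdctl mk command earlier implies, where cluster-health is available.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Run COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, 1 once ATTEMPTS tries have all failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Usage:
#   retry 10 3 etcdctl cluster-health
```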

Configure Host Trust on CentOS

Generate SSH keys and distribute them:

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub USER@HOST    # add -p 2222 if sshd listens on a non-default port
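With several nodes, the same key has to be copied to each one. A rough sketch of looping over hosts follows; copy_key_to_hosts and the DRY_RUN switch are hypothetical, the root login is an assumption based on the /root/.ssh path, and node1 and node2 are placeholder host names.

```shell
# copy_key_to_hosts PUBKEY PORT HOST...
# Run ssh-copy-id once per host as root. With DRY_RUN=1 the commands are only
# printed, which is useful for checking the host list first.
copy_key_to_hosts() {
  pubkey=$1
  port=$2
  shift 2
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh-copy-id -i $pubkey -p $port root@$host"
    else
      ssh-copy-id -i "$pubkey" -p "$port" "root@$host" || return 1
    fi
  done
}

# Usage:
#   DRY_RUN=1 copy_key_to_hosts /root/.ssh/id_rsa.pub 22 node1 node2
```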

Change CentOS Hostname

hostnamectl set-hostname k8s-master1

Enable Copy‑Paste in VirtualBox Guest

yum update
yum install kernel-devel kernel-headers gcc make
# reboot first if yum update installed a new kernel, so the running kernel matches kernel-devel
sh VBoxLinuxAdditions.run

Force‑Delete a Stuck Pod

kubectl delete pod NAME --grace-period=0 --force
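When more than one pod is stuck, the candidates can be pulled out of the pod listing rather than typed by hand. A minimal sketch, assuming the NAMESPACE NAME READY STATUS column order that kubectl get pods --all-namespaces prints; list_terminating is a hypothetical helper.

```shell
# list_terminating: read `kubectl get pods --all-namespaces` output on stdin
# and print "NAMESPACE NAME" for each pod whose STATUS column is Terminating.
list_terminating() {
  awk '$4 == "Terminating" { print $1, $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces | list_terminating |
#     while read -r ns pod; do
#       kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
#     done
```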

Force‑Delete a Stuck Namespace

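#!/bin/bash
# delete-ns.sh: clear the finalizers of a namespace stuck in Terminating.
# Requires `kubectl proxy` to be running on 127.0.0.1:8001.
set -e
usage(){
  echo "usage:"
  echo "  delete-ns.sh NAMESPACE"
}
if [ $# -lt 1 ]; then
  usage
  exit 1
fi
NAMESPACE=$1
JSONFILE=${NAMESPACE}.json
kubectl get ns "${NAMESPACE}" -o json > "${JSONFILE}"
# In the editor, empty the spec.finalizers array, then save and quit.
vi "${JSONFILE}"
curl -H "Content-Type: application/json" -X PUT --data-binary @"${JSONFILE}" \
    http://127.0.0.1:8001/api/v1/namespaces/"${NAMESPACE}"/finalize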

What Happens When a Container Has Requests but No Limits?

Example container entry from a pod spec:

- name: busybox-cnt02
  image: busybox
  command: ["/bin/sh"]
  args: ["-c", "while true; do echo hello from cnt02; sleep 10;done"]
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"

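Without a limits section, the pod falls into the Burstable QoS class. The container may consume far more than its requests, and when the node runs short of memory it is more likely to be evicted or OOM-killed than a Guaranteed pod (one whose limits equal its requests), potentially causing application failure. Use a LimitRange policy to apply default limits automatically.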
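As an illustration of such a policy, the sketch below writes a LimitRange manifest that gives every container in a namespace default requests and limits when it declares none. The name default-limits and all resource values are placeholders chosen for the example, and write_limitrange is a hypothetical helper.

```shell
# write_limitrange FILE
# Write a LimitRange manifest with illustrative default values to FILE.
write_limitrange() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:            # applied as limits when a container sets none
      cpu: 500m
      memory: 256Mi
    defaultRequest:     # applied as requests when a container sets none
      cpu: 100m
      memory: 100Mi
EOF
}

# Usage (NAMESPACE is a placeholder):
#   write_limitrange limitrange.yaml
#   kubectl apply -f limitrange.yaml -n NAMESPACE
```

A LimitRange is enforced at admission time, so it affects pods created after it is applied; pods that are already running keep their current settings until they are recreated.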
