
How to Resolve Stuck Kubernetes Resources, Reset etcd, and Fix API Server Errors

This guide collects fixes for common Kubernetes cluster problems: force‑deleting inconsistent rc, deployment, and service objects; resetting etcd data; starting the apiserver when it fails because ServiceAccount certificates are missing; disabling SELinux so Fluentd can write its log; generating ServiceAccount keys; recovering from etcd startup errors; configuring SSH trust between hosts; changing hostnames; enabling VirtualBox copy‑paste; force‑deleting pods and namespaces; and avoiding contention caused by containers that set resource requests but no limits.

Liangxu Linux

Jan 14, 2021

Force‑deleting inconsistent Kubernetes objects

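When a delete hangs and kubectl get still shows partially deleted resources, the commands below remove the objects without waiting for graceful termination. --cascade=false deletes the deployment but orphans its replica sets and pods (kubectl 1.20 and later spell this --cascade=orphan), and --force --grace-period=0 skips the graceful shutdown wait: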

kubectl delete deployment kibana-logging -n kube-system --cascade=false
kubectl delete deployment kibana-logging -n kube-system --ignore-not-found
kubectl delete rc elasticsearch-logging-v1 -n kube-system --force --grace-period=0
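
The same pattern can be wrapped in a small helper so the flags are not retyped for every stuck object. This is only a sketch: force_delete and the KUBECTL override are hypothetical names introduced here, not part of kubectl. The flags themselves are the ones used above.

```shell
# Hypothetical helper: force-delete one object without waiting for graceful termination.
# KUBECTL can be overridden (for example with "echo kubectl") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

force_delete() {
  kind="$1"; name="$2"; ns="$3"
  $KUBECTL delete "$kind" "$name" -n "$ns" --force --grace-period=0 --ignore-not-found
}

# Example: force_delete rc elasticsearch-logging-v1 kube-system
```

Setting KUBECTL="echo kubectl" before calling the function prints the command instead of running it, which is a cheap way to check the arguments first.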

Resetting etcd after deletion failures

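If objects still cannot be deleted, the last resort is to wipe all etcd data and start from a clean state. This destroys every object stored in the cluster, so use it only on a cluster you can rebuild: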

rm -rf /var/lib/etcd/*
reboot
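
Since the wipe cannot be undone, it may be worth archiving the data directory first. The helper below is a minimal sketch. The function name and the default archive location under /root are assumptions made for illustration.

```shell
# Hypothetical helper: archive an etcd data directory before wiping it.
# Run it with etcd stopped so the files are not changing underneath tar.
backup_etcd_data() {
  src="${1:-/var/lib/etcd}"
  dest="${2:-/root/etcd-backup-$(date +%Y%m%d%H%M%S).tar.gz}"
  # Archive the directory contents and print the archive path on success
  tar -czf "$dest" -C "$src" . && echo "$dest"
}

# Example: backup_etcd_data /var/lib/etcd /root/etcd-backup.tar.gz
```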

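After the node reboots, recreate the overlay network configuration that was stored in etcd. The /atomic.io/network prefix is the one typically read by flannel on CentOS, and etcdctl mk is an etcd v2 API command: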

etcdctl mk /atomic.io/network/config '{ "Network": "192.168.0.0/16" }'

Fixing kube‑apiserver startup failures

The error

start request repeated too quickly for kube-apiserver.service

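is reported by systemd once the service has failed several times in a row, so it hides the real cause. In this setup the underlying failure is often missing ServiceAccount certificate files. Start the API server manually with explicit certificate paths: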

/usr/bin/kube-apiserver \
  --logtostderr=true --v=0 \
  --etcd-servers=http://k8s-master:2379 \
  --address=0.0.0.0 --port=8080 \
  --service-cluster-ip-range=10.254.0.0/16 \
  --admission-control=ServiceAccount \
  --client-ca-file=/root/keys/ca.crt \
  --tls-cert-file=/root/keys/server.crt \
  --tls-private-key-file=/root/keys/server.key \
  --basic-auth-file=/root/keys/basic_auth.csv \
  --secure-port=443 >> /var/log/kubernetes/kube-apiserver.log 2>&1 &

Similarly, start the controller‑manager manually:

/usr/bin/kube-controller-manager \
  --logtostderr=true --v=0 \
  --master=http://k8s-master:8080 \
  --root-ca-file=/root/keys/ca.crt \
  --service-account-private-key-file=/root/keys/server.key >> /var/log/kubernetes/kube-controller-manager.log 2>&1 &
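
Before launching either process by hand, it can help to confirm that every referenced key file actually exists. The following is a sketch only. check_key_files is a hypothetical helper, and the paths in the example are the ones used in the commands above.

```shell
# Hypothetical helper: report which of the given files are missing or unreadable.
check_key_files() {
  missing=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one file is missing
  return "$missing"
}

# Example, using the paths from the commands above:
# check_key_files /root/keys/ca.crt /root/keys/server.crt /root/keys/server.key /root/keys/basic_auth.csv
```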

Resolving SELinux‑related permission errors for Fluentd

Fluentd may fail to create /var/log/fluentd.log with a permission error when SELinux is in enforcing mode. Disable SELinux in its config file and reboot so the change takes effect:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
reboot
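
The sed edit only changes the boot-time setting, so a quick read-back confirms it took before rebooting. The helper name selinux_configured_mode is hypothetical. On a running system, setenforce 0 switches to permissive mode until the next boot and getenforce prints the current mode.

```shell
# Hypothetical helper: print the mode configured in an SELinux config file.
selinux_configured_mode() {
  file="${1:-/etc/selinux/config}"
  # Print the value of the uncommented SELINUX= line (SELINUXTYPE= does not match)
  sed -n 's/^SELINUX=//p' "$file"
}

# Example: selinux_configured_mode /etc/selinux/config
```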

Generating ServiceAccount certificates

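Create the CA and the server certificate and key required for ServiceAccount authentication. The subjectAltName covers 10.254.0.1, the first address of the 10.254.0.0/16 service cluster range: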

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=k8s-master" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048
echo subjectAltName=IP:10.254.0.1 > extfile.cnf
openssl req -new -key server.key -subj "/CN=k8s-master" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -extfile extfile.cnf -out server.crt -days 10000
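
Once the files exist, it may be useful to inspect the result before pointing the API server at it. This sketch assumes openssl is on the PATH. show_cert_summary is a hypothetical helper name.

```shell
# Hypothetical helper: print subject, issuer, expiry and SAN entries of a certificate.
show_cert_summary() {
  cert="${1:-server.crt}"
  openssl x509 -in "$cert" -noout -subject -issuer -enddate
  # The SAN list appears on the line after its section header in the text dump
  openssl x509 -in "$cert" -noout -text | grep -A1 "Subject Alternative Name"
}

# Example: show_cert_summary server.crt
```

For the certificate generated above, the SAN line should list the IP address written to extfile.cnf.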

etcd startup failure – case 1 (wal directory)

If the log contains the following error:

raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory

remove the stray directory and restart etcd:

rm -rf /var/lib/etcd/default.etcd/member/wal/0.tmp
systemctl restart etcd

etcd startup failure – case 2 (cluster timeout after power loss)

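After an unclean shutdown the members may time out when rejoining the cluster. Synchronize the system clock on every node, then on each node in turn stop etcd, back up the existing data, clear the data directory, and start etcd again:

# Stop etcd before touching its data directory
systemctl stop etcd
# Back up existing data
mkdir -p /data/bak
cp -a /var/lib/etcd/default.etcd/member/* /data/bak/
# Remove corrupted data
rm -rf /var/lib/etcd/default.etcd/member/*
# Start etcd again, one node at a time
systemctl start etcd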

Configuring host trust (SSH key exchange)

Generate an RSA key pair on each host and copy the public key to the other hosts:

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub [-p PORT] root@HOST_IP
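
With more than a couple of nodes, the copy step can be looped. The sketch below assumes the key path shown above. copy_key_to_hosts, the SSH_COPY_ID override, and the host names in the example are hypothetical.

```shell
# Hypothetical helper: copy the local public key to every host given as an argument.
# SSH_COPY_ID can be overridden (for example with "echo ssh-copy-id") to preview.
SSH_COPY_ID="${SSH_COPY_ID:-ssh-copy-id}"

copy_key_to_hosts() {
  for host in "$@"; do
    # Stop at the first host that fails so the problem is easy to spot
    $SSH_COPY_ID -i /root/.ssh/id_rsa.pub "root@${host}" || return 1
  done
}

# Example (hypothetical host names): copy_key_to_hosts k8s-node1 k8s-node2
```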

Changing the hostname on CentOS

hostnamectl set-hostname k8s-master1
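
The new name is only useful if the other nodes can resolve it. Assuming name resolution is handled through /etc/hosts rather than DNS, a sketch for adding an entry without duplicating it could look like this. add_host_entry and the IP address in the example are hypothetical.

```shell
# Hypothetical helper: add "IP NAME" to a hosts file unless NAME is already listed.
add_host_entry() {
  ip="$1"; name="$2"; file="${3:-/etc/hosts}"
  # Match NAME as a whole word so k8s-master does not match k8s-master1
  if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Example (the address is a placeholder): add_host_entry 192.168.1.10 k8s-master1
```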

Enabling copy‑paste in VirtualBox for CentOS

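Install the kernel headers and build tools, insert the Guest Additions CD image from the VirtualBox Devices menu, then run the installer from the mounted image. Copy‑paste also requires Devices > Shared Clipboard to be set to Bidirectional: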

yum install -y kernel kernel-devel gcc make
sh VBoxLinuxAdditions.run

Force‑deleting pods and namespaces stuck in Terminating

Delete a pod immediately:

kubectl delete pod POD_NAME --grace-period=0 --force
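
To find the pods that need this treatment, the kubectl listing can be filtered on its STATUS column. This is a sketch. list_terminating is a hypothetical helper that assumes the default column layout of kubectl get pods --all-namespaces.

```shell
# Hypothetical helper: read "kubectl get pods --all-namespaces" output on stdin
# and print "NAMESPACE NAME" for every pod whose STATUS column is Terminating.
list_terminating() {
  awk 'NR > 1 && $4 == "Terminating" { print $1, $2 }'
}

# Example:
# kubectl get pods --all-namespaces | list_terminating
```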

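Delete a namespace by removing its finalizer. The script sends the edited object to the finalize endpoint through kubectl proxy, which must already be running on its default port 8001:

#!/bin/bash
# delete-ns.sh: clear the finalizers of a namespace stuck in Terminating
set -e
if [ $# -lt 1 ]; then echo "usage: $0 NAMESPACE"; exit 1; fi
NS=$1
kubectl get ns "$NS" -o json > "${NS}.json"
# Pause so the file can be edited by hand before it is sent back
read -r -p "Edit ${NS}.json to delete the finalizers field, then press Enter " _
curl -H "Content-Type: application/json" -X PUT --data-binary @"${NS}.json" \
  http://127.0.0.1:8001/api/v1/namespaces/"${NS}"/finalize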

Impact of containers with only resource requests (no limits)

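Containers that specify resources.requests but omit limits can consume node resources well beyond what they requested, which causes contention with other workloads. When the node comes under pressure, pods using more than they requested are among the first candidates for eviction, potentially causing application failure. Apply a LimitRange in each namespace so that containers without explicit limits receive default ones.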
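
As an illustration, a LimitRange manifest could be written and applied as follows. This is a sketch: the object name, the namespace in the example, and every CPU and memory value are placeholders to be sized for the actual workloads.

```shell
# Write an example LimitRange manifest; every value below is a placeholder.
cat > limitrange.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:            # limit applied when a container sets none
      cpu: 500m
      memory: 512Mi
    defaultRequest:     # request applied when a container sets none
      cpu: 100m
      memory: 128Mi
EOF

# Apply it to the namespace that should receive the defaults, for example:
# kubectl apply -f limitrange.yaml -n default
```

The defaults are injected when a pod is admitted, so pods that are already running keep their current settings until they are recreated.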
