How to Auto‑Recover Lost s3fs Mounts in a Huawei OBS CSI Plugin
This article explains why a Huawei OBS CSI plugin loses its s3fs process after a restart, causing "Transport endpoint is not connected" errors, and provides a step‑by‑step solution using client‑go to rebuild the mount and trigger kubelet remount via a liveness probe.
Problem Description
The Huawei OBS CSI plugin mounts the host /var/lib/kubelet/pods directory into business Pods. When the CSI plugin restarts, the s3fs process that connects to the S3 service is lost, causing the mounted path inside the Pod to return Transport endpoint is not connected. The usual workaround of restarting the business Pod is inelegant.
Solution Idea
To fix the error, the s3fs process must be restored. Restoring it requires the PVC name, the Pod UID, the S3 endpoint, and the access key/secret key (AK/SK). There are two ways to obtain this information: persist the metadata in S3, which risks loss or inconsistency, or retrieve it dynamically through client-go. The second approach is chosen, and it works as follows:
Retrieve all PVCs across namespaces (allPvcs).
Filter the PVCs whose annotation volume.beta.kubernetes.io/storage-provisioner (under metadata.annotations) matches the provisioner of the target StorageClass, producing targetPvcs.
Find the Pods that have mounted targetPvcs, yielding targetPods.
Obtain the UID of each Pod in targetPods (targetUid).
Construct the host mount path: /var/lib/kubelet/pods/<targetUid>/volumes/kubernetes.io~csi/<targetPvc-name>/mount
From targetPvcs, read spec.storageClassName to locate the responsible StorageClass (targetStorageclass).
Extract the secret containing the S3 AK/SK from targetStorageclass.parameters (targetSecret).
Read the AK/SK from targetSecret.
Execute the mount operation.
The goal of these steps is to discover the host mount path and the S3 access credentials.
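The discovery steps above can be sketched in Go. This is a minimal illustration, not the plugin's actual code: the PVC and Pod structs are trimmed-down stand-ins for the objects a client-go clientset would return from its PersistentVolumeClaims and Pods list calls, and the provisioner name and Pod UID are hypothetical. The path helper assumes the kubelet layout pods/<uid>/volumes/kubernetes.io~csi/<name>/mount; the article uses the PVC name for the last directory, while many clusters name it after the bound PV, so check the node before relying on it.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// provisionerAnnotation is the PVC annotation named in the steps above.
const provisionerAnnotation = "volume.beta.kubernetes.io/storage-provisioner"

// PVC is a simplified stand-in for a PersistentVolumeClaim (assumption:
// only the fields needed for the lookup are modeled).
type PVC struct {
	Namespace, Name  string
	Annotations      map[string]string
	StorageClassName string
}

// Pod is a simplified stand-in for a Pod; ClaimNames mirrors
// spec.volumes[*].persistentVolumeClaim.claimName.
type Pod struct {
	Namespace, Name, UID string
	ClaimNames           []string
}

// Target pairs a Pod UID with one of the claims it mounts.
type Target struct {
	PodUID string
	PVC    PVC
}

// filterPVCs keeps the claims provisioned by the given provisioner.
func filterPVCs(all []PVC, provisioner string) []PVC {
	var out []PVC
	for _, p := range all {
		if p.Annotations[provisionerAnnotation] == provisioner {
			out = append(out, p)
		}
	}
	return out
}

// findTargets returns one Target per (Pod, claim) pair where the Pod
// references a claim from pvcs in its own namespace.
func findTargets(pods []Pod, pvcs []PVC) []Target {
	index := make(map[string]PVC, len(pvcs))
	for _, p := range pvcs {
		index[p.Namespace+"/"+p.Name] = p
	}
	var out []Target
	for _, pod := range pods {
		for _, claim := range pod.ClaimNames {
			if pvc, ok := index[pod.Namespace+"/"+claim]; ok {
				out = append(out, Target{PodUID: pod.UID, PVC: pvc})
			}
		}
	}
	return out
}

// hostMountPath builds the per-Pod CSI mount directory on the node.
func hostMountPath(kubeletRoot, podUID, volumeDir string) string {
	return filepath.Join(kubeletRoot, "pods", podUID,
		"volumes", "kubernetes.io~csi", volumeDir, "mount")
}

func main() {
	const provisioner = "obs.csi.example.com" // hypothetical provisioner name
	allPvcs := []PVC{{
		Namespace:        "default",
		Name:             "csi-s3-pvc",
		Annotations:      map[string]string{provisionerAnnotation: provisioner},
		StorageClassName: "csi-s3", // hypothetical StorageClass name
	}}
	allPods := []Pod{{
		Namespace:  "default",
		Name:       "csi-s3-test-nginx",
		UID:        "0000-hypothetical-uid",
		ClaimNames: []string{"csi-s3-pvc"},
	}}
	for _, t := range findTargets(allPods, filterPVCs(allPvcs, provisioner)) {
		fmt.Println(hostMountPath("/var/lib/kubelet", t.PodUID, t.PVC.Name))
	}
}
```

Each Target still carries the claim's StorageClassName, which is the starting point for the remaining steps: looking up the StorageClass, following its parameters to the Secret, and reading the AK/SK.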
Implementation Process
Once this logic was implemented, the host path was mounted successfully, but the business container still could not see the files because the standard unmount/mount flow had not been triggered. The CSI plugin had restarted abnormally, so NodeUnpublishVolume was never called and the stale mount point was left in place. It must therefore be unmounted manually with umount before s3fs is mounted again.
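Remounting on the host is not enough by itself: the running container keeps its old mount because kubelet never performs the umount/mount sequence for it. To force a remount, trigger kubelet through a livenessProbe. When the probe fails, kubelet restarts the container and mounts the volume into it again. The following Pod manifest demonstrates this: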
apiVersion: v1
kind: Pod
metadata:
  name: csi-s3-test-nginx
  namespace: default
spec:
  containers:
    - name: csi-s3-test-nginx
      image: nginx
      livenessProbe:
        failureThreshold: 3
        initialDelaySeconds: 20
        periodSeconds: 5
        timeoutSeconds: 5
        exec:
          command:
            - ls
            - /var/lib/www/html
      volumeMounts:
        - mountPath: /var/lib/www/html
          name: webroot
        - mountPath: /var/lib/www/html2
          name: webroot2
  volumes:
    - name: webroot
      persistentVolumeClaim:
        claimName: csi-s3-pvc
        readOnly: false
    - name: webroot2
      persistentVolumeClaim:
        claimName: csi-s3-pvc2
        readOnly: false
Using this probe alone is insufficient; the s3fs process must also be restarted because a pod restart only triggers kubelet, not the CSI driver’s mount logic. Therefore both the s3fs process and the livenessProbe are required to recover the mount.
MaGe Linux Operations