
How to Auto‑Recover Lost s3fs Mounts in a Huawei OBS CSI Plugin

This article explains why a Huawei OBS CSI plugin loses its s3fs process after a restart, causing "Transport endpoint is not connected" errors, and provides a step‑by‑step solution using client‑go to rebuild the mount and trigger kubelet remount via a liveness probe.

MaGe Linux Operations

Problem Description

With the Huawei OBS CSI plugin, the volume a business Pod sees is an s3fs mount created on the host under /var/lib/kubelet/pods and exposed to the Pod's containers. When the CSI plugin restarts, the s3fs process that connects to the S3 service is lost, and access to the mounted path inside the Pod fails with "Transport endpoint is not connected". The usual workaround, restarting the business Pod, is inelegant.

Solution Idea

To fix the error, the s3fs process must be restored. Restoring it requires the PVC name, the Pod UID, the S3 endpoint, and the AK/SK credentials. There are two ways to obtain them: persist the metadata in S3 (which risks loss or inconsistency), or retrieve it dynamically with client-go. The second approach is used here.

1. Retrieve all PVCs across namespaces (allPvcs).

2. Filter the PVCs whose metadata.annotations["volume.beta.kubernetes.io/storage-provisioner"] matches the provisioner of the target StorageClass, producing targetPvcs.

3. Find the Pods that mount targetPvcs, yielding targetPods.

4. Obtain the UID of each Pod in targetPods (targetUid).

5. Construct the host mount path: /var/lib/kubelet/pods/<targetUid>/volumes/kubernetes.io~csi/<targetPvc-name>/mount

6. From targetPvcs, read spec.storageClassName to locate the responsible StorageClass (targetStorageclass).

7. Extract from targetStorageclass.parameters the Secret that holds the S3 AK/SK (targetSecret).

8. Read the AK/SK from targetSecret.

9. Execute the mount operation.

The goal of these steps is to discover the host mount path and the S3 access credentials.
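The filtering and path-building steps can be sketched in Go as follows. This is an illustration under stated assumptions, not the plugin's code: the pvc and pod structs are simplified stand-ins for the objects client-go would return (for example from clientset.CoreV1().PersistentVolumeClaims(metav1.NamespaceAll).List), and the provisioner name, UID, and function names are made up. The last directory component is passed in as a plain string because the article names it <targetPvc-name>; on many clusters kubelet names this directory after the bound PV, so that is worth verifying on the node.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Annotation key used to select PVCs by provisioner.
const provisionerAnnotation = "volume.beta.kubernetes.io/storage-provisioner"

// pvc and pod are simplified stand-ins (assumptions) for the fields
// that would be read from client-go PVC and Pod objects.
type pvc struct {
	Namespace, Name  string
	Annotations      map[string]string
	StorageClassName string
}

type pod struct {
	Namespace, Name, UID string
	ClaimNames           []string // spec.volumes[].persistentVolumeClaim.claimName
}

// filterPVCs keeps the PVCs whose provisioner annotation equals provisioner.
func filterPVCs(all []pvc, provisioner string) []pvc {
	var out []pvc
	for _, c := range all {
		if c.Annotations[provisionerAnnotation] == provisioner {
			out = append(out, c)
		}
	}
	return out
}

// podsUsing returns the pods that reference at least one target PVC.
// Claims are matched by namespace and name, since PVC names are only
// unique within a namespace.
func podsUsing(pods []pod, targets []pvc) []pod {
	wanted := make(map[string]bool, len(targets))
	for _, c := range targets {
		wanted[c.Namespace+"/"+c.Name] = true
	}
	var out []pod
	for _, p := range pods {
		for _, claim := range p.ClaimNames {
			if wanted[p.Namespace+"/"+claim] {
				out = append(out, p)
				break
			}
		}
	}
	return out
}

// hostMountPath builds the kubelet CSI mount path for one pod volume.
func hostMountPath(podUID, volumeDir string) string {
	return filepath.Join("/var/lib/kubelet/pods", podUID,
		"volumes", "kubernetes.io~csi", volumeDir, "mount")
}

func main() {
	// Sample data; the provisioner name and UID are hypothetical.
	allPvcs := []pvc{{
		Namespace:   "default",
		Name:        "csi-s3-pvc",
		Annotations: map[string]string{provisionerAnnotation: "example-obs-provisioner"},
	}}
	allPods := []pod{{
		Namespace:  "default",
		Name:       "csi-s3-test-nginx",
		UID:        "0000-example-uid",
		ClaimNames: []string{"csi-s3-pvc"},
	}}
	targetPvcs := filterPVCs(allPvcs, "example-obs-provisioner")
	for _, p := range podsUsing(allPods, targetPvcs) {
		for _, claim := range p.ClaimNames {
			fmt.Println(hostMountPath(p.UID, claim))
		}
	}
}
```

Keeping these helpers free of API calls makes them easy to unit test; the client-go List calls would only populate the input slices.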

Implementation Process

After this was implemented, the host path mounted successfully, but the business container still did not see the files, because the standard unmount/mount flow had not been triggered.

Since the CSI plugin restarted abnormally, NodeUnpublishVolume was never called, so the stale mount point must be unmounted manually with umount before s3fs is mounted again.

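The container still could not see the mount because kubelet never performed the required umount/mount sequence for it. To force a remount, use a livenessProbe to trigger kubelet: the probe runs ls on the mounted path, and once it has failed failureThreshold times in a row, kubelet restarts the container. The following Pod manifest demonstrates this: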

apiVersion: v1
kind: Pod
metadata:
  name: csi-s3-test-nginx
  namespace: default
spec:
  containers:
  - name: csi-s3-test-nginx
    image: nginx
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 20
      periodSeconds: 5
      timeoutSeconds: 5
      exec:
        command:
        # ls fails when the s3fs mount behind this path is stale
        - ls
        - /var/lib/www/html
    volumeMounts:
    - mountPath: /var/lib/www/html
      name: webroot
    - mountPath: /var/lib/www/html2
      name: webroot2
  volumes:
  - name: webroot
    persistentVolumeClaim:
      claimName: csi-s3-pvc
      readOnly: false
  - name: webroot2
    persistentVolumeClaim:
      claimName: csi-s3-pvc2
      readOnly: false

The probe alone is not sufficient: restarting a container only involves kubelet and does not re-run the CSI driver's mount logic, so the s3fs process must also be restored on the host. Both pieces, the restored s3fs process and the livenessProbe, are required to recover the mount.
