
Accelerate Third‑Party Storage Access on ACK Fluid with HostPath‑Based PVs

This guide shows how to use ACK Fluid to expose third-party storage, mounted on nodes as host-path directories, as Kubernetes PVs. The approach provides standard CSI integration, data isolation, and fast, low-cost access. The guide covers step-by-step commands, YAML manifests, and a performance check.

Alibaba Cloud Native

Overview

This guide shows how to use Alibaba Cloud Container Service for Kubernetes (ACK) Fluid to expose a host-path directory on cluster nodes as a native Kubernetes PersistentVolume (PV). In this example, third-party storage is mounted into that host path via sshfs, and Fluid's JindoRuntime builds a distributed cache in front of it to accelerate reads. The workflow turns a legacy host-path mount into a CSI-compatible PV, providing standardized access, data isolation, and faster reads without custom development.

Prerequisites

ACK Pro cluster (Kubernetes v1.18 or later)

Cloud Native AI Suite with the ack-fluid component installed (uninstall any open-source Fluid deployment first)

kubectl configured to access the cluster

Three Linux nodes with sshfs installed, e.g., 192.168.0.1 (storage server) and 192.168.0.2 and 192.168.0.3 (cluster nodes that mount the storage)

Host-path directories prepared on the storage server and the client nodes (see step 1)

1. Prepare Host‑Path Mount Points

Install sshfs on each node (CentOS example):

sudo yum install sshfs -y

On the storage server (192.168.0.1), create a directory and generate a 10 GiB test file:

mkdir -p /mnt/demo-remote-fs
cd /mnt/demo-remote-fs
dd if=/dev/zero of=allzero-demo count=1024 bs=10M
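As a quick sanity check on the test file size: GNU dd interprets the M suffix as 1,048,576 bytes, so count=1024 with bs=10M writes exactly 10 GiB. The sketch below computes the expected byte count and, when run on the storage server, compares it with the real file. It assumes GNU coreutils (stat -c); the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Expected size of the dd output: 1024 blocks of 10 MiB each (GNU dd: M = 1024*1024).
expected=$((1024 * 10 * 1024 * 1024))
echo "expected size: $expected bytes"

# On the storage server, compare with the real file (GNU stat syntax assumed).
file=/mnt/demo-remote-fs/allzero-demo
if [ -f "$file" ]; then
  actual=$(stat -c %s "$file")
  if [ "$actual" -eq "$expected" ]; then
    echo "size OK"
  else
    echo "size mismatch: $actual bytes" >&2
  fi
fi
```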

On the client nodes (192.168.0.2 and 192.168.0.3), mount the remote directory via sshfs (each client needs SSH access to the storage server):

mkdir -p /mnt/demo-remote-fs
sshfs 192.168.0.1:/mnt/demo-remote-fs /mnt/demo-remote-fs
ls /mnt/demo-remote-fs
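Before handing the directory to Fluid, it can help to confirm on each client node that the sshfs mount is actually active; otherwise Fluid would only see the empty local directory created by mkdir. This sketch assumes a Linux node, where sshfs mounts appear in /proc/mounts with the filesystem type fuse.sshfs. The function name is_sshfs_mounted is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only if PATH is listed in /proc/mounts as an sshfs (fuse.sshfs) mount.
# Assumes Linux; if /proc/mounts is unreadable, the check reports "not mounted".
is_sshfs_mounted() {
  awk -v p="$1" '$2 == p && $3 == "fuse.sshfs" { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null
}

if is_sshfs_mounted /mnt/demo-remote-fs; then
  echo "/mnt/demo-remote-fs is mounted via sshfs"
else
  echo "/mnt/demo-remote-fs is NOT mounted; rerun the sshfs command" >&2
fi
```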

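Label the client nodes so that Fluid schedules its Master, Worker, and Fuse pods only on nodes that have the sshfs mount (replace the node names with those shown by kubectl get nodes):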

kubectl label node 192.168.0.2 demo-remote-fs=true
kubectl label node 192.168.0.3 demo-remote-fs=true
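To confirm that the labels landed where expected, one option is to count the nodes that match the demo-remote-fs=true selector the JindoRuntime uses in step 2. This is a sketch: the function name check_labeled_nodes is hypothetical, and the default of 2 simply matches the two client nodes in this example.

```shell
#!/usr/bin/env bash
# Count nodes labeled demo-remote-fs=true and compare with the expected number.
# Usage: check_labeled_nodes [EXPECTED]   (defaults to 2, the client nodes in this example)
check_labeled_nodes() {
  local expected=${1:-2}
  local count
  # "-o name" prints one "node/<name>" line per match; grep -c . counts non-empty lines.
  count=$(kubectl get nodes -l demo-remote-fs=true -o name | grep -c .)
  if [ "$count" -eq "$expected" ]; then
    echo "OK: $count node(s) labeled demo-remote-fs=true"
  else
    echo "expected $expected labeled node(s), found $count" >&2
    return 1
  fi
}
```

Run check_labeled_nodes 2 after labeling; a non-zero exit status means a node is missing the label or an extra node carries it.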

2. Create Fluid Dataset and JindoRuntime

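Save the following as dataset.yaml. It defines a Dataset that exposes the host path through a local:// mount point and a JindoRuntime that builds the distributed cache. The runtime pins its master, worker, and FUSE pods to the labeled nodes and runs two workers that cache data in memory (/dev/shm). The Dataset and the JindoRuntime share the same name, which is how Fluid binds them.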

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hostpath-demo-dataset
spec:
  mounts:
  - mountPoint: local:///mnt/demo-remote-fs
    name: data
    path: /
  accessModes:
  - ReadOnlyMany
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: hostpath-demo-dataset
spec:
  master:
    nodeSelector:
      demo-remote-fs: "true"
  worker:
    nodeSelector:
      demo-remote-fs: "true"
  fuse:
    nodeSelector:
      demo-remote-fs: "true"
  replicas: 2
  tieredstore:
    levels:
    - mediumtype: MEM
      path: /dev/shm
      quota: 10Gi
      high: "0.99"
      low: "0.99"
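A rough way to check that the tiered store is large enough: multiply the number of workers by the per-worker quota and the high watermark, then compare the result with the data to be cached. This is a sketch under stated assumptions: the 10Gi quota applies to each of the two workers, eviction begins above the high watermark, and cached blocks spread across the workers. The function name cache_fit is hypothetical.

```shell
#!/usr/bin/env bash
# Rough cache-sizing check for the JindoRuntime manifest.
# Assumptions: quota is per worker, eviction starts above the "high" watermark,
# and cached blocks are spread across workers.
# Usage: cache_fit WORKERS QUOTA_GIB HIGH DATASET_GIB COPIES
cache_fit() {
  awk -v w="$1" -v q="$2" -v h="$3" -v d="$4" -v c="$5" 'BEGIN {
    usable = w * q * h
    needed = d * c
    printf "usable cache: %.1f GiB, needed: %.1f GiB -> %s\n", usable, needed, (needed <= usable ? "fits" : "does not fit")
  }'
}

# Values from this example: replicas: 2, quota: 10Gi, high: "0.99",
# the 10 GiB test file, and DataLoad replicas: 1.
cache_fit 2 10 0.99 10 1
```

With these values the check reports about 19.8 GiB of usable cache for 10 GiB of data. Under the same assumptions, a single worker (9.9 GiB usable) would be slightly too small.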

Create the resources and verify that the Dataset reaches the Bound phase:

kubectl create -f dataset.yaml
kubectl get dataset hostpath-demo-dataset
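Because binding can take a short while, a scripted setup may prefer to poll the status instead of rerunning kubectl get by hand. The helper below is a hypothetical sketch. It reads .status.phase through kubectl's jsonpath output and stops when the wanted phase appears or a timeout (in seconds) expires. It assumes the phase strings Bound for a Dataset and Complete for a DataLoad.

```shell
#!/usr/bin/env bash
# Poll a resource's .status.phase until it equals the wanted value or the timeout expires.
# Usage: wait_for_phase KIND NAME PHASE [TIMEOUT_SECONDS]
wait_for_phase() {
  local kind=$1 name=$2 want=$3 timeout=${4:-300}
  local waited=0 phase=""
  while [ "$waited" -lt "$timeout" ]; do
    phase=$(kubectl get "$kind" "$name" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "$want" ]; then
      echo "$kind/$name is $phase"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for $kind/$name to reach $want (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example, wait_for_phase dataset hostpath-demo-dataset Bound. The same helper could later wait for the warm-up job with wait_for_phase dataload dataset-warmup Complete 1800, where the 1800-second timeout is an arbitrary choice.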

3. Warm Up the Cache with DataLoad

Pre-populate the cache so that the first read is served from memory instead of being fetched over sshfs. Save the following as dataload.yaml:

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: dataset-warmup
spec:
  dataset:
    name: hostpath-demo-dataset
    namespace: default
  loadMetadata: true
  target:
  - path: /
    replicas: 1

Create the DataLoad object and check its status:

kubectl create -f dataload.yaml
kubectl get dataload dataset-warmup
kubectl get dataset

The cache is fully warmed when the DataLoad reaches the Complete phase and the Dataset's CACHED value equals its total size (CACHED PERCENTAGE shows 100%).
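The same check can be scripted. This sketch assumes that Fluid reports the CACHED PERCENTAGE column in the Dataset's .status.cacheStates.cachedPercentage field as a string such as 100.0%. Verify the field name against your Fluid version. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Return success if a percentage string such as "100.0%" is at least 100.
is_fully_cached() {
  awk -v p="$1" 'BEGIN { sub(/%$/, "", p); exit !(p + 0 >= 100) }'
}

# Read the Dataset's cached percentage and report whether warm-up has finished.
# Assumption: the value is stored in .status.cacheStates.cachedPercentage.
check_warmup() {
  local pct
  pct=$(kubectl get dataset hostpath-demo-dataset -o jsonpath='{.status.cacheStates.cachedPercentage}')
  if is_fully_cached "$pct"; then
    echo "cache fully warmed ($pct)"
  else
    echo "still warming: ${pct:-unknown}"
    return 1
  fi
}
```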

4. Deploy a Test Pod

When the Dataset is bound, Fluid creates a PersistentVolumeClaim with the same name (hostpath-demo-dataset). Define a pod that mounts this PVC and save it as pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    command: ["bash", "-c", "sleep inf"]
    volumeMounts:
    - mountPath: /data
      name: data-vol
  volumes:
  - name: data-vol
    persistentVolumeClaim:
      claimName: hostpath-demo-dataset

Create the pod, then measure read performance inside the container:

kubectl create -f pod.yaml
kubectl exec -it nginx -- bash
# inside the pod
time cat /data/allzero-demo > /dev/null

Typical result: real 0m8.629s with the warmed cache, compared with about 1m5.889s when reading the same file directly from the sshfs mount, which makes the cached read roughly 7.6 times faster.
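To express the two timings as throughput, the sketch below converts the real values reported by time into seconds and divides the 10 GiB (10240 MiB) file size from step 1 by each. The timings are the example figures quoted above, not new measurements, so substitute your own. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Convert a bash `time` value such as "1m5.889s" into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

# Print throughput in MiB/s for a size in MiB and an elapsed time in seconds.
throughput() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

size_mib=10240                      # 1024 x 10 MiB, from the dd command in step 1
cached=$(to_seconds 0m8.629s)       # read through the Fluid cache
direct=$(to_seconds 1m5.889s)       # read directly over sshfs

echo "Fluid cache:  $(throughput "$size_mib" "$cached") MiB/s"
echo "Direct sshfs: $(throughput "$size_mib" "$direct") MiB/s"
awk -v a="$direct" -v b="$cached" 'BEGIN { printf "Speedup: %.1fx\n", a / b }'
```

With the example figures it prints about 1186.7 MiB/s for the cached read, 155.4 MiB/s for the direct sshfs read, and a speedup of 7.6x.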

5. Cleanup

Delete the test pod when finished:

kubectl delete pod nginx
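Deleting the pod leaves the Fluid objects, node labels, and sshfs mounts in place. A fuller teardown might look like the sketch below. The resource names come from the manifests above, the node names are the example IPs, and the function name cleanup_demo is hypothetical. Unmounting sshfs has to happen on each client node itself, for example with fusermount -u /mnt/demo-remote-fs.

```shell
#!/usr/bin/env bash
# Remove the objects and labels created in this walkthrough.
# Node names are the example IPs; replace them with your own.
cleanup_demo() {
  kubectl delete pod nginx --ignore-not-found
  kubectl delete dataload dataset-warmup --ignore-not-found
  kubectl delete jindoruntime hostpath-demo-dataset --ignore-not-found
  kubectl delete dataset hostpath-demo-dataset --ignore-not-found
  # A trailing "-" after the key removes that label from the node.
  kubectl label node 192.168.0.2 demo-remote-fs-
  kubectl label node 192.168.0.3 demo-remote-fs-
}
```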

References

PV HostPath Acceleration Capability: https://help.aliyun.com/zh/ack/cloud-native-ai-suite/user-guide/accelerate-pv-storage-volume-data-access

Create ACK Pro Cluster: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/create-an-ack-managed-cluster-2#task-skz-qwk-qfb

Install Cloud Native AI Suite: https://help.aliyun.com/zh/ack/cloud-native-ai-suite/user-guide/deploy-the-cloud-native-ai-suite#task-2038811

Connect to ACK Cluster with kubectl: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/obtain-the-kubeconfig-file-of-a-cluster-and-use-kubectl-to-connect-to-the-cluster#task-ubf-lhg-vdb

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
