Accelerating Hybrid Cloud Data Access with ACK Fluid: A Step‑by‑Step Guide
This guide explains how to use ACK Fluid to accelerate third‑party storage access for Kubernetes workloads in hybrid‑cloud scenarios, covering performance challenges, configuration steps, dataset and JindoRuntime creation, cache pre‑warming, and verification of fast data reads.
Problem Statement
When cloud-based workloads read data stored on-premises, they run into limited bandwidth, high latency, costly network traffic, storage concurrency bottlenecks, and network instability. They must also meet strict data-security requirements: neither data nor metadata may be persisted to cloud disks.
Limited bandwidth & high latency cause long compute times and low resource utilization.
Redundant reads & expensive network fees arise because the native Kubernetes scheduler does not know where data is cached, so pods may pull the same data across the network again and again.
On‑premises distributed storage becomes a concurrency bottleneck under heavy AI training I/O.
Network instability may lead to data‑sync errors and application downtime.
Data security requires that neither data nor metadata be written to cloud disks.
ACK Fluid Capabilities
Zero-adaptation cost: any CSI-compatible storage can be used without code changes.
Performance boost: policy-driven caching and data pre-warming deliver cloud-level access speeds for remote data.
Elastic bandwidth: cache throughput can scale up to hundreds of Gbps and down to zero to save cost.
Cache-aware scheduling: pods are scheduled close to cached data, reducing cross-network latency.
Hot-data de-duplication: frequently accessed data is kept in the cloud cache, so it is not fetched repeatedly and network traffic drops.
Automated operations: cache warm-up, scaling, and cleanup are managed automatically.
In-memory metadata cache: metadata is kept in memory instead of being persisted to disk, which enhances security.
Prerequisites
An ACK Pro cluster running Kubernetes 1.18 or later has been created.
The Cloud Native AI Suite is installed and its ack-fluid component is deployed. If open-source Fluid is already installed in the cluster, uninstall it first.
kubectl is configured to connect to the ACK cluster.
A PersistentVolume (PV) and a PersistentVolumeClaim (PVC) for the target storage have been created. In hybrid-cloud scenarios, set their access mode to read-only (ReadOnlyMany) so that cloud workloads cannot modify the on-premises data.
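This guide assumes the PV and PVC already exist. For reference, the following Python sketch builds a statically bound, read-only PV/PVC pair like demo-pv/demo-pvc and prints it as JSON, which kubectl accepts as well as YAML. The CSI driver name and volume handle are hypothetical placeholders: the volume source section differs between storage vendors, so substitute the values your CSI driver documents.

```python
import json

# HYPOTHETICAL placeholders: use the driver name and volume handle documented
# by your storage vendor's CSI driver.
CSI_DRIVER = "csi.example-storage.com"
VOLUME_HANDLE = "demo-volume"


def readonly_pv_pvc(pv_name: str, pvc_name: str, size: str,
                    namespace: str = "default") -> list:
    """Return a statically bound PV/PVC pair restricted to ReadOnlyMany."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadOnlyMany"],
            # Retain: releasing the claim does not delete the backing storage.
            "persistentVolumeReclaimPolicy": "Retain",
            # readOnly asks the CSI driver to mount the volume read-only.
            "csi": {"driver": CSI_DRIVER, "volumeHandle": VOLUME_HANDLE,
                    "readOnly": True},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadOnlyMany"],
            "storageClassName": "",   # empty class: no dynamic provisioning
            "volumeName": pv_name,    # bind directly to the PV defined above
            "resources": {"requests": {"storage": size}},
        },
    }
    return [pv, pvc]


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so a v1 List can be piped into kubectl apply.
    items = readonly_pv_pvc("demo-pv", "demo-pvc", "30Gi")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The output can be applied with python3 make_readonly_pv.py | kubectl apply -f - (the script name is hypothetical). Setting readOnly on the CSI source in addition to ReadOnlyMany is deliberate: Kubernetes uses access modes to match PVs with PVCs, but access modes do not by themselves enforce write protection once the volume is mounted.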
Inspect Existing PV and PVC
List the PVCs and PVs:
$ kubectl get pvc,pv
The typical output shows the PVC demo-pvc bound to the PV demo-pv, which has a capacity of 30Gi and the ROX (ReadOnlyMany) access mode.
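To check these conditions from a script rather than by eye, a small Python helper can read the PVC as JSON and confirm that it is Bound and has no writable access mode. This is a sketch: the function names are illustrative, and fetch_pvc assumes kubectl is on the PATH and already configured for the cluster.

```python
import json
import subprocess

# Access modes that would let a pod write to the volume.
WRITABLE_MODES = {"ReadWriteOnce", "ReadWriteMany", "ReadWriteOncePod"}


def check_pvc(pvc: dict) -> list:
    """Return human-readable problems with a PVC object; an empty list means OK."""
    problems = []
    phase = pvc.get("status", {}).get("phase")
    if phase != "Bound":
        problems.append(f"phase is {phase!r}, expected 'Bound'")
    modes = pvc.get("spec", {}).get("accessModes", [])
    if "ReadOnlyMany" not in modes or WRITABLE_MODES & set(modes):
        problems.append(f"access modes are {modes}, expected read-only (ReadOnlyMany)")
    return problems


def fetch_pvc(name: str, namespace: str = "default") -> dict:
    """Read a PVC as JSON via kubectl (requires kubectl configured for the cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "pvc", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

# Usage (needs a reachable cluster):
#   problems = check_pvc(fetch_pvc("demo-pvc"))   # [] when demo-pvc is ready
```

Keeping the check separate from the kubectl call means the validation logic can be tested with plain dictionaries.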
Create Dataset and JindoRuntime
Save the following manifest as dataset.yaml:
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: pv-demo-dataset
spec:
  mounts:
    - mountPoint: pvc://demo-pvc
      name: data
      path: /
  accessModes:
    - ReadOnlyMany
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: pv-demo-dataset
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10Gi
        high: "0.9"
        low: "0.8"
Create the resources:
$ kubectl create -f dataset.yaml
Verify that the Dataset reaches the Bound phase, which indicates that the JindoFS cache is running:
$ kubectl get dataset pv-demo-dataset
Cache Pre‑Warm with DataLoad
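Before warming the cache, it is worth checking that the data fits. The following sketch does the arithmetic for the JindoRuntime above: 2 workers with a 10Gi MEM quota each give 20 GiB of cache, of which 18 GiB lies below the 0.9 high watermark. It assumes that the quota applies per cache worker, that eviction starts once usage passes the high ratio and frees space down to the low ratio, and that data spreads evenly across workers; this guide does not state these details, so confirm them against the Fluid documentation.

```python
# Values taken from the JindoRuntime manifest above.
REPLICAS = 2          # spec.replicas: number of cache workers
QUOTA_GIB = 10.0      # tieredstore quota (10Gi), assumed to apply per worker
HIGH, LOW = 0.9, 0.8  # high / low watermark ratios


def cache_budget(replicas: int, quota_gib: float, high: float, low: float) -> dict:
    """Total cache size and watermark thresholds in GiB, summed over all workers."""
    total = replicas * quota_gib
    return {
        "total_gib": total,
        "high_gib": total * high,  # assumed: eviction starts above this usage
        "low_gib": total * low,    # assumed: eviction frees space down to this level
    }


def fits(dataset_gib: float, cache_replicas: int, budget: dict) -> bool:
    """True if `cache_replicas` copies of the data stay at or below the high watermark.

    Assumes the data is spread evenly across workers, so per-worker limits
    can be compared in aggregate.
    """
    return dataset_gib * cache_replicas <= budget["high_gib"]


budget = cache_budget(REPLICAS, QUOTA_GIB, HIGH, LOW)
print(budget)               # total 20 GiB, high watermark 18 GiB, low 16 GiB
print(fits(11, 1, budget))  # True: one 11 GiB copy stays under 18 GiB
```

With the 11 GiB demo file used later in this guide, one cached copy fits; asking for two copies (22 GiB) would exceed the 18 GiB threshold and trigger eviction.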
Save the following as dataload.yaml to warm the cache:
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: dataset-warmup
spec:
  dataset:
    name: pv-demo-dataset
    namespace: default
  loadMetadata: true
  target:
    - path: /
      replicas: 1
Apply the DataLoad and monitor its progress:
$ kubectl create -f dataload.yaml
$ kubectl get dataload dataset-warmup
When the PHASE shows Complete, the entire dataset has been loaded into the cache: the Dataset's CACHED value equals its total size (100% cached).
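Instead of re-running kubectl get until the phase changes, the wait can be scripted. The sketch below polls status.phase through the official kubernetes Python client (CustomObjectsApi.get_namespaced_custom_object) and works for both the Dataset (Bound) and the DataLoad (Complete). It assumes the kubernetes package is installed and a kubeconfig is available; the helper names, timeout, and polling interval are illustrative, and treating "Failed" as the terminal error phase is an assumption.

```python
import time
from typing import Callable


def wait_for_phase(fetch: Callable[[], dict], expected: str, failed: str = "Failed",
                   timeout: float = 600, interval: float = 5,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch() until status.phase equals `expected`.

    Raises RuntimeError if the phase becomes `failed`, TimeoutError on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        phase = fetch().get("status", {}).get("phase", "")
        if phase == expected:
            return phase
        if phase == failed:
            raise RuntimeError(f"resource entered phase {phase!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in phase {phase!r} after {timeout}s")
        sleep(interval)


def fluid_fetcher(plural: str, name: str, namespace: str = "default") -> Callable[[], dict]:
    """Return a function that reads one Fluid custom resource as a dict."""
    # Imported lazily so wait_for_phase can be used without the kubernetes package.
    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()
    return lambda: api.get_namespaced_custom_object(
        "data.fluid.io", "v1alpha1", namespace, plural, name)

# Usage (needs a reachable cluster):
#   wait_for_phase(fluid_fetcher("datasets", "pv-demo-dataset"), "Bound")
#   wait_for_phase(fluid_fetcher("dataloads", "dataset-warmup"), "Complete")
```

Passing the fetcher in as a function keeps the polling loop independent of the Kubernetes client, so it can be exercised with a fake fetcher.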
Deploy an Application Pod to Access Cached Data
Create pod.yaml, which mounts the PVC that Fluid creates for the Dataset. This PVC has the same name as the Dataset (pv-demo-dataset):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      command: ["bash", "-c", "sleep inf"]
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: pv-demo-dataset
Deploy the pod and open a shell in it:
$ kubectl create -f pod.yaml
$ kubectl exec -it nginx -- bash
Inside the pod, list and read the cached file:
# ls -lh /data
total 11G
-rw-r----- 1 root root 11G Jul 28 2023 demofile
# time cat /data/demofile > /dev/null
real 0m11.004s
user 0m0.065s
sys 0m3.089s
Because the data is fully cached in JindoFS, the 11G file is read in about 11 seconds, or roughly 1 GiB/s, without pulling data from the remote storage.
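To measure read throughput from a program rather than with time cat, a sequential read can be timed as in the following Python sketch. It is illustrative only: the stock nginx image used in this guide may not include Python, so run it from an image that does, and keep in mind that repeated reads on the same node may be served from the operating system's page cache, which would inflate the result.

```python
import time


def read_throughput(path: str, chunk_size: int = 8 * 1024 * 1024) -> tuple:
    """Read `path` sequentially; return (bytes_read, seconds, GiB per second)."""
    total = 0
    buf = bytearray(chunk_size)
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)  # reuse one buffer to avoid per-chunk allocations
            if not n:
                break
            total += n
    elapsed = time.perf_counter() - start
    rate = (total / 2**30) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate

# Usage inside a pod that mounts the dataset at /data:
#   size, seconds, gib_per_s = read_throughput("/data/demofile")
```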
Reference URLs
Creating an ACK Pro cluster: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/create-an-ack-managed-cluster-2#task-skz-qwk-qfb
Installing Cloud Native AI Suite: https://help.aliyun.com/zh/ack/cloud-native-ai-suite/user-guide/deploy-the-cloud-native-ai-suite#task-2038811
Connecting to the cluster with kubectl: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/obtain-the-kubeconfig-file-of-a-cluster-and-use-kubectl-to-connect-to-the-cluster#task-ubf-lhg-vdb
Alibaba Cloud Native
