
Accelerating Hybrid Cloud Data Access with ACK Fluid: A Step‑by‑Step Guide

This guide explains how to use ACK Fluid to accelerate third‑party storage access for Kubernetes workloads in hybrid‑cloud scenarios, covering performance challenges, configuration steps, dataset and JindoRuntime creation, cache pre‑warming, and verification of fast data reads.

Alibaba Cloud Native

Sep 10, 2023


Problem Statement

When cloud‑based workloads read data stored on‑premises, they face limited bandwidth, high latency, costly network traffic, storage concurrency bottlenecks, network instability, and strict data‑security requirements (neither data nor metadata may be persisted to cloud disks).

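Limited bandwidth and high latency stretch job run times and leave compute resources idle while they wait for data.

Redundant reads and high network fees arise because the native Kubernetes scheduler does not know where data is cached, so pods can land on nodes that must fetch the same data over the network again.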

On‑premises distributed storage becomes a concurrency bottleneck under heavy AI training I/O.

Network instability may lead to data‑sync errors and application downtime.

Data‑security policies require that neither data nor metadata be written to cloud disks.

ACK Fluid Capabilities

Zero adaptation cost: any storage exposed through a CSI‑compatible PV/PVC can be accelerated without application code changes.

Performance boost: policy‑driven caching and data pre‑warming bring access to on‑premises data close to in‑cloud storage speeds.

Elastic bandwidth: cache throughput can scale to hundreds of Gbps, and down to zero when idle to save cost.

Cache‑aware scheduling places pods on nodes close to the cached data, reducing cross‑network latency.

Hot‑data de‑duplication lowers network traffic by keeping frequently accessed data in the cloud cache.

Automated operations: cache warm‑up, scaling, and cleanup are managed automatically.

In‑memory metadata cache avoids persisting metadata to disks, enhancing security.

Prerequisites

ACK Pro cluster (Kubernetes v1.18+) is provisioned.

Cloud Native AI Suite is installed and the ack-fluid component is deployed (uninstall any open‑source Fluid installation first).

kubectl is configured to access the ACK cluster.

A PersistentVolume (PV) and PersistentVolumeClaim (PVC) for the target storage have been created. In hybrid‑cloud scenarios, set the access mode to ReadOnlyMany so that cloud workloads cannot modify the on‑premises data.

Inspect Existing PV and PVC

List the PVCs and PVs:

$ kubectl get pvc,pv

Typical output shows a PVC named demo-pvc bound to a 30Gi PV named demo-pv with the ROX (ReadOnlyMany) access mode.
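Before wrapping the claim in a Dataset, the "Bound and read‑only" check can be scripted. The sketch below is a hypothetical helper (check_pvc_readonly is not part of Fluid or kubectl): it reads .status.phase and .spec.accessModes through kubectl's JSONPath output. The KUBECTL variable is an assumption added so the command can be swapped for a stub when testing.

```shell
# Hypothetical helper: succeed only if a PVC is Bound and includes ReadOnlyMany.
# KUBECTL defaults to the real kubectl binary but can point at a stub.
check_pvc_readonly() {
  local pvc="$1" ns="${2:-default}" out phase modes
  out=$("${KUBECTL:-kubectl}" get pvc "$pvc" -n "$ns" \
    -o jsonpath='{.status.phase} {.spec.accessModes[*]}') || return 2
  set -- $out                 # split "Bound ReadOnlyMany ..." into words
  phase="${1:-}"
  [ $# -gt 0 ] && shift       # drop the phase, keep the access modes
  modes="$*"
  if [ "$phase" != "Bound" ]; then
    echo "PVC $pvc is '${phase:-unknown}', not Bound" >&2
    return 1
  fi
  case " $modes " in
    *" ReadOnlyMany "*) echo "PVC $pvc is Bound, access modes: $modes" ;;
    *) echo "PVC $pvc access modes ($modes) do not include ReadOnlyMany" >&2
       return 1 ;;
  esac
}
```

Usage: check_pvc_readonly demo-pvc default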

Create Dataset and JindoRuntime

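Save the following manifest as dataset.yaml. The Dataset mounts demo-pvc through the pvc:// scheme, and the JindoRuntime of the same name runs two cache workers, each with a 10Gi in‑memory tier on /dev/shm whose high and low watermarks control cache eviction: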

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: pv-demo-dataset
spec:
  mounts:
    - mountPoint: pvc://demo-pvc
      name: data
      path: /
  accessModes:
    - ReadOnlyMany
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: pv-demo-dataset
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10Gi
        high: "0.9"
        low: "0.8"
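As a rough sizing check, assume the quota applies per cache worker (so the two replicas above give 2 × 10Gi = 20Gi of memory) and that eviction starts once usage crosses the high watermark. Under those assumptions, about 18 GiB can be cached before eviction begins. The 30Gi reported for demo-pv is the volume's capacity, not necessarily the amount of data on it. The two functions below are hypothetical helpers that encode this arithmetic:

```shell
# Hypothetical sizing helpers for a JindoRuntime MEM tier.
# Assumption: quota is per worker replica; eviction starts above "high".

# Print usable capacity in GiB: replicas * quota * high watermark.
usable_cache_gib() {
  awk -v r="$1" -v q="$2" -v h="$3" 'BEGIN { printf "%.1f\n", r * q * h }'
}

# Exit 0 if a dataset of SIZE GiB fits below the high watermark.
fits_in_cache() {
  awk -v s="$1" -v r="$2" -v q="$3" -v h="$4" \
    'BEGIN { exit !(s <= r * q * h) }'
}
```

For this manifest, usable_cache_gib 2 10 0.9 prints 18.0, and fits_in_cache 11 2 10 0.9 succeeds for the 11G file used later in this guide.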

Create the resources:

$ kubectl create -f dataset.yaml

Verify that the Dataset reaches the Bound phase, which indicates that the JindoRuntime cache is running and bound to it:

$ kubectl get dataset pv-demo-dataset
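Instead of re‑running kubectl get by hand, a small polling loop can wait for the phase to change. This is a sketch under assumptions: wait_for_phase, KUBECTL, and POLL_INTERVAL are hypothetical names, and treating a Failed phase as terminal is our own choice. Because it only reads .status.phase, the same function can also wait for the DataLoad used for pre‑warming (phase Complete). With kubectl 1.23 or later, kubectl wait --for=jsonpath='{.status.phase}'=Bound dataset/pv-demo-dataset is a built‑in alternative.

```shell
# Hypothetical helper: poll <kind>/<name> until .status.phase equals WANT.
# Args: kind name want [timeout_seconds]; POLL_INTERVAL (seconds, >0) defaults to 5.
wait_for_phase() {
  local kind="$1" name="$2" want="$3" timeout="${4:-300}"
  local interval="${POLL_INTERVAL:-5}" waited=0 phase=""
  while [ "$waited" -le "$timeout" ]; do
    phase=$("${KUBECTL:-kubectl}" get "$kind" "$name" \
      -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "$want" ]; then
      echo "$kind/$name is $phase"
      return 0
    fi
    if [ "$phase" = "Failed" ]; then      # stop early on a terminal failure
      echo "$kind/$name reported phase Failed" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $kind/$name to reach $want (last: ${phase:-unknown})" >&2
  return 1
}
```

Usage: wait_for_phase dataset pv-demo-dataset Bound 300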

Cache Pre‑Warm with DataLoad

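Save the following as dataload.yaml to warm the cache. It loads the metadata and the whole dataset (path /) into the cache, keeping one cached replica of the data: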

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: dataset-warmup
spec:
  dataset:
    name: pv-demo-dataset
    namespace: default
  loadMetadata: true
  target:
    - path: /
      replicas: 1

Apply and monitor the DataLoad:

$ kubectl create -f dataload.yaml
$ kubectl get dataload dataset-warmup

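When the DataLoad PHASE shows Complete, run kubectl get dataset pv-demo-dataset again: the CACHED column should equal UFS TOTAL SIZE and CACHED PERCENTAGE should read 100%, meaning the entire dataset is in the cache.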
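The "fully cached" check can also be scripted. The sketch below assumes the Dataset status exposes the percentage at .status.cacheStates.cachedPercentage (for example 100.0%); confirm the field path with kubectl get dataset pv-demo-dataset -o yaml in your Fluid version. dataset_fully_cached and KUBECTL are hypothetical names.

```shell
# Hypothetical helper: exit 0 only if the Dataset reports 100% cached.
# Assumed field: .status.cacheStates.cachedPercentage, e.g. "100.0%".
dataset_fully_cached() {
  local name="$1" pct
  pct=$("${KUBECTL:-kubectl}" get dataset "$name" \
    -o jsonpath='{.status.cacheStates.cachedPercentage}') || return 2
  pct=${pct%\%}                 # strip the trailing percent sign
  echo "dataset/$name cached: ${pct:-unknown}%"
  awk -v p="$pct" 'BEGIN { exit !(p != "" && p + 0 >= 100) }'
}
```

Usage: dataset_fully_cached pv-demo-dataset && echo "cache is warm"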

Deploy an Application Pod to Access Cached Data

Fluid automatically creates a PVC with the same name as the Dataset (pv-demo-dataset). Create pod.yaml with a Pod that mounts this PVC at /data:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      command: ["bash","-c","sleep inf"]
      volumeMounts:
        - mountPath: /data
          name: data-vol
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: pv-demo-dataset

Deploy and exec into the pod:

$ kubectl create -f pod.yaml
$ kubectl exec -it nginx -- bash
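Once inside the pod, you can check that /data is served through a FUSE client (as Fluid's cache client is expected to be) rather than by a direct mount of the on‑premises storage by inspecting the mount table. This is a sketch: is_fuse_mount is a hypothetical helper, and because the exact filesystem type reported by the JindoFS FUSE client is an assumption here, it only checks for the fuse prefix.

```shell
# Hypothetical helper: exit 0 if TARGET is mounted with a fuse* filesystem type.
# Args: target [mounts_file]; mounts_file defaults to /proc/mounts.
is_fuse_mount() {
  local target="$1" mounts="${2:-/proc/mounts}"
  # /proc/mounts columns: device, mount point, fstype, options, dump, pass
  awk -v t="$target" '$2 == t && $3 ~ /^fuse/ { found = 1 } END { exit !found }' "$mounts"
}
```

Usage inside the pod: is_fuse_mount /data && echo "/data is a FUSE mount"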

Inside the pod, list and read the cached file:

# ls -lh /data
 total 11G
 -rw-r----- 1 root root 11G Jul 28 2023 demofile
# time cat /data/demofile > /dev/null
real    0m11.004s
user    0m0.065s
sys     0m3.089s

Because the data is fully cached in JindoFS, the 11G file is read in about 11 seconds (roughly 1 GiB/s) without pulling data from the remote storage.
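To get a throughput figure directly, instead of dividing the file size by the time output by hand, the measurement can be wrapped in a small function. This sketch assumes GNU date (for %N nanoseconds) and awk are available in the container; read_throughput_mib_s is a hypothetical name. Running the same function against a pod that mounts demo-pvc directly, without Fluid, would give a baseline for comparison.

```shell
# Hypothetical helper: read FILE sequentially and print throughput in MiB/s.
# Assumes GNU date (+%N) and awk are available.
read_throughput_mib_s() {
  local file="$1" size start end
  size=$(wc -c < "$file") || return 1   # file size in bytes
  start=$(date +%s.%N)                   # wall clock with nanoseconds
  cat "$file" > /dev/null
  end=$(date +%s.%N)
  awk -v b="$size" -v s="$start" -v e="$end" \
    'BEGIN { d = e - s; if (d <= 0) d = 1e-9; printf "%.1f\n", b / 1048576 / d }'
}
```

Usage inside the pod: read_throughput_mib_s /data/demofile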

Reference URLs

Creating an ACK Pro cluster: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/create-an-ack-managed-cluster-2#task-skz-qwk-qfb

Installing Cloud Native AI Suite: https://help.aliyun.com/zh/ack/cloud-native-ai-suite/user-guide/deploy-the-cloud-native-ai-suite#task-2038811

Connecting to the cluster with kubectl: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/obtain-the-kubeconfig-file-of-a-cluster-and-use-kubectl-to-connect-to-the-cluster#task-ubf-lhg-vdb


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
