
Accelerate AI and Big Data Workloads on Kubernetes with Fluid’s JindoRuntime

Fluid is an open‑source, Kubernetes‑native engine that orchestrates and accelerates distributed datasets for AI and big‑data workloads. This guide explains its core concepts, the JindoRuntime implementation, and its performance benefits, then walks through deploying and testing JindoRuntime on a Kubernetes cluster.

Alibaba Cloud Native

What is Fluid?

Fluid is an open‑source, Kubernetes‑native distributed dataset orchestration and acceleration engine for cloud‑native, data‑intensive applications such as big‑data and AI workloads. It abstracts the data layer so that data can flow between storage systems (e.g., HDFS, OSS, Ceph) and compute workloads on Kubernetes, handling caching, replication, eviction, and transformation transparently.

Core concepts: Dataset and Runtime

A Dataset is a logical collection of related data that can be consumed by engines such as Spark or TensorFlow. Managing a Dataset involves security, versioning, and acceleration concerns.

Fluid introduces a Runtime abstraction to provide these capabilities. Currently Fluid supports two runtimes: AlluxioRuntime and JindoRuntime. The runtime defines lifecycle interfaces for security, version control, and data acceleration.

Key benefits

Data‑affinity scheduling and distributed caching accelerate data access for compute.

Namespace‑based isolation provides secure multi‑tenant data management.

Cross‑storage data federation reduces data‑island effects.

JindoRuntime

JindoRuntime is built on JindoFS, a proprietary Alibaba Cloud storage‑optimization engine for OSS that is fully compatible with the Hadoop FileSystem API. JindoFS offers two modes:

Block mode: Stores file blocks on OSS and optionally caches them locally, using a local namespace service for metadata.

Cache mode: Keeps the original OSS directory structure while providing client‑side caching and metadata acceleration; no data migration is required.

In Fluid, JindoRuntime uses JindoFS’s cache mode to access and cache remote OSS files. It can be deployed with a single Helm chart and supports credential‑free access via STS, checksum verification, and client‑side encryption.
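
The idea behind cache mode can be illustrated with a minimal read-through cache. This is a conceptual sketch in Python, not JindoFS code: the class `ReadThroughCache`, the in-memory `remote_store` dictionary standing in for OSS, and the example path are all hypothetical.

```python
class ReadThroughCache:
    """Toy read-through cache keyed by the original remote path."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: path -> bytes
        self._cache = {}                   # path -> cached bytes
        self.remote_reads = 0              # how often the remote store was hit

    def read(self, path):
        # Only go to the remote store on a cache miss.
        if path not in self._cache:
            self.remote_reads += 1
            self._cache[path] = self._fetch_remote(path)
        return self._cache[path]


# Hypothetical stand-in for an OSS bucket; paths keep their original layout.
remote_store = {"oss://oss_bucket/bucket_dir/a.txt": b"hello"}

cache = ReadThroughCache(remote_store.__getitem__)
first = cache.read("oss://oss_bucket/bucket_dir/a.txt")   # miss: fetched remotely
second = cache.read("oss://oss_bucket/bucket_dir/a.txt")  # hit: served from cache
print(cache.remote_reads)
```

Because the cache is keyed by the unchanged remote path, no data has to be migrated or renamed, which is the property the cache mode description emphasizes.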

Advantages

Performance: Optimized OSS read/write paths and native‑layer enhancements deliver high throughput, especially for small files.

Rich distributed caching: Supports multi‑TB file caches and metadata caching, showing strong results in large‑scale AI training and data‑lake scenarios.

Security: Credential‑free access via STS, Kubernetes Secret integration, and checksum‑based data integrity.

Lightweight: Implemented in C++, adding minimal overhead to OSS access.

Performance snapshot

Using the ImageNet dataset on a Kubernetes cluster with the Arena benchmark, training ResNet‑50 with JindoRuntime (cache enabled) reduced training time by 76% compared with the open‑source OSSFS driver.

Quick start: Deploying JindoRuntime

The following steps assume a functional Kubernetes cluster with access to an Alibaba Cloud OSS bucket.

Create a namespace for Fluid:

kubectl create ns fluid-system

Download the Fluid release package (e.g., fluid-0.5.0.tgz).

Install Fluid with Helm, enabling JindoRuntime:

helm install --set runtime.jindo.enabled=true fluid fluid-0.5.0.tgz

Verify the Fluid pods are running:

$ kubectl get pod -n fluid-system
NAME                                         READY   STATUS    RESTARTS   AGE
csi-nodeplugin-fluid-2mfcr                   2/2     Running   0          108s
csi-nodeplugin-fluid-l7lv6                   2/2     Running   0          108s
dataset-controller-5465c4bbf-5ds5p           1/1     Running   0          108s
jindoruntime-controller-654fb74447-cldsv     1/1     Running   0          108s

The number of csi-nodeplugin-fluid-xx pods should match the number of cluster nodes.
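
This check can also be scripted. The sketch below, in Python, parses the text printed by `kubectl get pod` and reports the number of CSI node plugin pods and any pod that is not fully ready. The function name `summarize_pods` is hypothetical, and the sample text is the output shown above rather than a live query.

```python
SAMPLE_OUTPUT = """\
NAME                                         READY   STATUS    RESTARTS   AGE
csi-nodeplugin-fluid-2mfcr                   2/2     Running   0          108s
csi-nodeplugin-fluid-l7lv6                   2/2     Running   0          108s
dataset-controller-5465c4bbf-5ds5p           1/1     Running   0          108s
jindoruntime-controller-654fb74447-cldsv     1/1     Running   0          108s
"""


def summarize_pods(kubectl_output):
    """Count CSI node plugin pods and list pods that are not fully ready."""
    # Skip the header row, then split each row on whitespace.
    rows = [line.split() for line in kubectl_output.strip().splitlines()[1:]]
    not_ready = []
    for name, ready, status, *_ in rows:
        ready_count, wanted_count = ready.split("/")
        if status != "Running" or ready_count != wanted_count:
            not_ready.append(name)
    csi_count = sum(1 for row in rows if row[0].startswith("csi-nodeplugin-fluid-"))
    return {"csi_nodeplugins": csi_count, "not_ready": not_ready}


summary = summarize_pods(SAMPLE_OUTPUT)
print(summary)  # {'csi_nodeplugins': 2, 'not_ready': []}
```

On a real cluster, the same function could be fed the standard output of `kubectl get pod -n fluid-system`, and `csi_nodeplugins` compared with the node count.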

Create a Kubernetes Secret to store OSS credentials (replace xxx with your actual keys):

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.oss.accessKeyId: xxx
  fs.oss.accessKeySecret: xxx

Save the manifest as mysecret.yaml and apply it:

kubectl create -f mysecret.yaml

Define a Dataset and a corresponding JindoRuntime resource (replace oss_bucket, bucket_dir, and oss_endpoint with your OSS bucket information):

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hadoop
spec:
  mounts:
    - mountPoint: oss://oss_bucket/bucket_dir
      options:
        fs.oss.endpoint: oss_endpoint
      name: hadoop
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: hadoop
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: HDD
        path: /mnt/disk1
        quota: 100Gi
        high: "0.99"
        low: "0.8"
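
To get a feel for the tieredstore values, the following Python sketch converts them into absolute sizes. It assumes that `high` and `low` are watermark ratios of `quota` (eviction starts above the high mark and continues down to the low mark), as is common in tiered caches; the article itself does not define them, so treat that reading as an assumption. The helper names `parse_quantity` and `watermarks` are hypothetical.

```python
# Binary suffixes used by Kubernetes-style quantities such as "100Gi".
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity):
    """Convert a quantity like '100Gi' to bytes (binary suffixes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def watermarks(quota, high, low):
    """Return (high_bytes, low_bytes), assuming ratios of the quota."""
    total = parse_quantity(quota)
    return round(total * float(high)), round(total * float(low))


high_bytes, low_bytes = watermarks("100Gi", "0.99", "0.8")
print(high_bytes / 2**30, low_bytes / 2**30)  # 99.0 80.0
```

Under that assumption, each worker would let its cache grow to about 99 GiB before trimming it back to about 80 GiB.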

Save both manifests as resource.yaml and apply them:

kubectl create -f resource.yaml

Check the Dataset status to confirm that it is bound and to see the current cache state:

$ kubectl get dataset hadoop
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
hadoop 210MiB           0B       180GiB           0.0%                Bound   1h

Create a simple application pod that mounts the Dataset to observe acceleration:

apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hadoop
  volumes:
    - name: hadoop
      persistentVolumeClaim:
        claimName: hadoop

Save the manifest as app.yaml and deploy the pod:

kubectl create -f app.yaml

Inside the pod, copy the 210 MiB file and measure the time:

$ kubectl exec -it demo-app -- bash
$ time cp /data/hadoop/spark-3.0.1-bin-hadoop2.7.tgz /dev/null
real 0m18.386s

After the first run, the file is cached locally. Re‑run the copy after recreating the pod:

$ time cp /data/hadoop/spark-3.0.1-bin-hadoop2.7.tgz /dev/null
real 0m0.048s

The second copy is roughly 380× faster (18.386 s versus 0.048 s), demonstrating JindoRuntime’s caching effect.
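
The speedup and the effective throughput follow directly from the two `time` readings and the 210 MiB file size. A small Python sketch of the arithmetic (the helper name `parse_real` is hypothetical):

```python
def parse_real(value):
    """Convert a `time` reading such as '0m18.386s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


size_mib = 210                     # file size reported by the Dataset
cold = parse_real("0m18.386s")     # first copy, read from OSS
warm = parse_real("0m0.048s")      # second copy, served from the cache

speedup = cold / warm
print(f"speedup: {speedup:.0f}x")                   # speedup: 383x
print(f"cold read: {size_mib / cold:.1f} MiB/s")    # cold read: 11.4 MiB/s
print(f"warm read: {size_mib / warm:.0f} MiB/s")    # warm read: 4375 MiB/s
```

A single run on a 210 MiB file is only a rough indication; repeated measurements on larger datasets would give a more reliable picture.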

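Clean up the environment, deleting the application pod first so that nothing is still mounting the Dataset when it is removed:

kubectl delete -f app.yaml
kubectl delete jindoruntime hadoop
kubectl delete dataset hadoop
kubectl delete secret mysecret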

Further resources

Fluid project GitHub: https://github.com/fluid-cloudnative/fluid
