
What Is Fluid? A Cloud‑Native Data Orchestration and Acceleration Platform

Fluid is an open‑source cloud‑native data orchestration and acceleration system that runs on Kubernetes, offering storage‑agnostic datasets, distributed caching, intelligent scheduling, and performance optimizations for data‑intensive AI and big‑data workloads.

Alibaba Cloud Native

May 10, 2021


Fluid is an open‑source cloud‑native data orchestration and acceleration system that runs on Kubernetes. The project, hosted at https://github.com/fluid-cloudnative/fluid, was accepted as a CNCF sandbox project in April 2021. It targets data‑intensive workloads such as big data and AI that run in environments where compute and storage are separated, so remote data access adds latency and datasets are hard to manage. Fluid addresses both problems with distributed caching and intelligent scheduling.

Core Architecture

Dataset CRD: A Custom Resource Definition that abstracts heterogeneous storage systems (object stores, HDFS, etc.) as a storage‑agnostic data object. This makes the data observable (for example, how much of it is cached) and lets its cache scale elastically.

CacheRuntime: Extends the Kubernetes API to manage distributed cache engines. Alluxio and JindoFS are supported natively.

Intelligent orchestration: Uses Kubernetes container scheduling and auto‑scaling to deploy cache instances close to the pods that consume the data.

Co‑scheduling: Extends the scheduler to be cache‑aware, so pods can be placed on nodes where the required dataset is already cached, which reduces data‑access latency.

Standard access: Applications read datasets through the standard PersistentVolumeClaim (PVC) interface, so cloud‑native workloads need no code changes.

Scenario‑driven tuning: Provides dataset pre‑warming, metadata optimization, small‑file I/O improvements, and automatic elastic scaling to speed up deep‑learning and batch‑processing jobs.
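To make the co‑scheduling idea concrete, here is a toy model of cache‑aware placement written in Bash (4 or later, for associative arrays). Given a hypothetical map of which datasets each node caches, it prefers a node that already holds the requested dataset. The node names, the map, and the 100/0 scoring are illustrative assumptions. This is not Fluid's scheduler code, which plugs into the Kubernetes scheduler instead of running as a script.

```bash
#!/usr/bin/env bash
# Toy model of cache-aware placement (hypothetical data; not Fluid's scheduler code).

# Which datasets each node currently caches (space-separated names).
declare -A cached=(
  [node-a]=""
  [node-b]="s3-data"
  [node-c]="s3-data logs"
)

# Print the best node for a dataset: score 100 if the node already caches it,
# 0 otherwise. Ties go to the alphabetically first node.
pick_node() {
  local dataset=$1 best="" best_score=-1 node d score
  for node in $(printf '%s\n' "${!cached[@]}" | sort); do
    score=0
    for d in ${cached[$node]}; do
      [[ $d == "$dataset" ]] && score=100
    done
    if (( score > best_score )); then
      best=$node
      best_score=$score
    fi
  done
  echo "$best"
}

pick_node s3-data   # node-b: first node (alphabetically) that caches s3-data
pick_node logs      # node-c: the only node caching logs
pick_node images    # node-a: nothing cached anywhere, so fall back to the first node
```

A real cache‑aware scheduler weighs this preference alongside CPU, memory, and other constraints. The toy isolates only the "prefer nodes with a warm cache" signal.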

Usage Example

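```bash
# Install Fluid (CRDs and controllers) from its Helm chart
helm repo add fluid https://fluid-cloudnative.github.io/charts
helm repo update
helm install fluid --create-namespace --namespace=fluid-system fluid/fluid

# Create a Dataset that points to an S3 bucket, plus an AlluxioRuntime with
# the same name that provides the distributed cache for it.
# (S3 credentials and endpoint options are omitted from this example.)
cat > dataset.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: s3-data
spec:
  mounts:
  - mountPoint: "s3://my-bucket"
    name: s3
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: s3-data
spec:
  replicas: 1
  tieredstore:
    levels:
    - mediumtype: MEM
      path: /dev/shm
      quota: 2Gi
      high: "0.95"
      low: "0.7"
EOF
kubectl apply -f dataset.yaml

# Once the Dataset is bound to the runtime, Fluid creates a PVC with the same
# name as the Dataset (s3-data). Deploy a workload that consumes it.
cat > job.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-job
spec:
  template:
    spec:
      restartPolicy: Never   # Jobs require Never or OnFailure
      containers:
      - name: spark
        image: spark:latest
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: s3-data
EOF
kubectl apply -f job.yaml
```

The steps above show how a Dataset is defined, how Fluid provisions a cache for it through a runtime, and how an application accesses the data through a standard PVC.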
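Fluid exposes dataset pre‑warming, one of the scenario‑driven tuning features, as a DataLoad custom resource. The sketch below asks Fluid to load the `s3-data` Dataset into the cache before jobs start reading it. It assumes the Dataset lives in the `default` namespace, and the resource name `s3-data-warmup` is hypothetical. Check the field names against the Fluid API reference for your version.

```bash
# Pre-warm the cache for the s3-data Dataset (assumed to be in namespace "default").
# The DataLoad name "s3-data-warmup" is a hypothetical example.
cat > dataload.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: s3-data-warmup
spec:
  dataset:
    name: s3-data
    namespace: default
EOF
kubectl apply -f dataload.yaml

# Inspect the load's phase, then the Dataset's cache status.
kubectl get dataload s3-data-warmup
kubectl get dataset s3-data
```

If the warm‑up finishes before the Job starts, the Job's first pass over `/data` can read from the cache instead of going to S3.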

Adoption and Outlook

Since its open‑source release in September 2020, Fluid has been adopted by large enterprises such as Weibo, Qihoo 360, and China Telecom. The core maintainers are from Nanjing University, Alibaba Cloud, and the Alluxio community, with contributions from engineers at several Chinese tech firms. Future development aims to enhance flexibility, intelligence, and extensibility of the architecture, further integrating academic research with industrial practice to support a broader range of big‑data and AI workloads on native Kubernetes.

Related Links

Alluxio: https://www.alluxio.io/

JindoFS: https://github.com/aliyun/alibabacloud-jindofs


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


