How Fluid Turns Kubernetes into a High‑Performance Data Logistics System

This article explains how the open‑source Fluid project addresses the inefficiencies of data‑intensive AI and big‑data workloads in cloud‑native Kubernetes environments by introducing a data‑centric abstraction, dual orchestration mechanisms, and seamless integration with Alluxio to achieve faster, secure, and scalable data access.

Alibaba Cloud Native

Apr 2, 2021

Background

Cloud platforms provide low‑cost, scalable resources for data‑intensive AI and big‑data workloads, but cloud‑native environments such as Kubernetes separate compute from storage by design. This separation introduces high data‑access latency, makes analysis across multiple storage systems in a hybrid cloud costly, and complicates security and multi‑dimensional data management.

Fluid Overview

Core Concepts

Dataset : a logical collection of related data, expressed as a Kubernetes custom resource (defined through a CRD). It abstracts the underlying storage locations and presents a unified interface.

Runtime : the execution engine that provides caching, versioning, and security for a Dataset. The current implementation uses Alluxio.

AlluxioRuntime : a specific Runtime implementation based on the Alluxio distributed cache.
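To make the relationship between the three concepts concrete, here is a minimal Python sketch that builds a Dataset manifest and a matching AlluxioRuntime manifest as plain dictionaries and prints them as JSON. The API version `data.fluid.io/v1alpha1`, the field names (`mounts`, `mountPoint`, `replicas`, `tieredstore`), and the convention that a Runtime binds to the Dataset with the same name reflect my reading of Fluid's v1alpha1 API and should be checked against the project documentation. The bucket URL, the name `demo-data`, and the helper functions are hypothetical.

```python
import json

# Assumed API group/version for Fluid resources; verify against the Fluid docs.
FLUID_API_VERSION = "data.fluid.io/v1alpha1"


def make_dataset(name, mount_point):
    """Build a Dataset manifest that points at one underlying storage location."""
    return {
        "apiVersion": FLUID_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def make_alluxio_runtime(name, replicas, quota):
    """Build an AlluxioRuntime manifest with a single in-memory cache tier."""
    return {
        "apiVersion": FLUID_API_VERSION,
        "kind": "AlluxioRuntime",
        # Assumption: the Runtime is bound to the Dataset that shares its name.
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "tieredstore": {
                "levels": [
                    {"mediumtype": "MEM", "path": "/dev/shm", "quota": quota}
                ]
            },
        },
    }


# Hypothetical names and source location, for illustration only.
dataset = make_dataset("demo-data", "oss://example-bucket/training/")
runtime = make_alluxio_runtime("demo-data", replicas=2, quota="2Gi")

print(json.dumps(dataset, indent=2))
print(json.dumps(runtime, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, which accepts JSON as well as YAML manifests.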

Dual Orchestration

Dataset orchestration : manages the lifecycle of Datasets and schedules the cache engine (scale‑out, scale‑in, placement) across cluster nodes.

Application orchestration : schedules pods onto nodes that already host the required cached data, achieving data‑locality for the workload.

Architecture

Dataset Controller : creates Datasets and binds them to a Runtime.

Runtime Controller : decides the number and placement of cache replicas.

Volume Controller : bridges Fluid with Kubernetes PVC/PV mechanisms.

Fluid‑Scheduler with two plugins:

Cache co‑locality Plugin – places pods on nodes where the data is cached.

Prefetch Plugin – proactively loads data into the cache before pod scheduling.
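The idea behind cache co‑locality can be illustrated with a small scoring function. This is not Fluid's scheduler code; it is a simplified sketch that assumes each node reports how many gigabytes of a dataset it currently caches, and it picks the node with the most cached data. The `Node` class, the `cached_gb` field, and the node and dataset names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical bookkeeping: dataset name -> GB of that dataset cached on this node.
    cached_gb: dict = field(default_factory=dict)


def rank_nodes(nodes, dataset):
    """Order nodes by cached data for `dataset`, largest first; ties break by name."""
    return sorted(nodes, key=lambda n: (-n.cached_gb.get(dataset, 0.0), n.name))


def pick_node(nodes, dataset):
    """Return the name of the best node for a pod that reads `dataset`, or None."""
    ranked = rank_nodes(nodes, dataset)
    return ranked[0].name if ranked else None


nodes = [
    Node("node-a", {"imagenet": 10.0}),
    Node("node-b", {"imagenet": 40.0}),
    Node("node-c"),  # caches nothing
]

print(pick_node(nodes, "imagenet"))  # prints "node-b"
```

A real scheduler plugin would combine such a locality score with the usual resource and affinity constraints rather than use it alone.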

Using Fluid

Users create a Dataset custom resource that specifies the source locations (e.g., Alibaba Cloud OSS, Ceph). Fluid automatically creates a corresponding PersistentVolumeClaim (PVC). Pods mount the PVC without needing to know the underlying storage, which makes data access transparent and lets workloads move between storage backends without changes.
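From the application's side, consuming a Dataset looks like mounting any other PVC. The Python sketch below builds a Pod manifest that mounts a claim by name. The Pod fields are standard Kubernetes; the assumption that the generated PVC carries the same name as the Dataset is my understanding of Fluid's behavior and should be verified. The pod name, image, and dataset name are hypothetical.

```python
import json


def make_pod(pod_name, image, dataset_name, mount_path="/data"):
    """Build a Pod manifest that mounts the PVC created for a Dataset."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "volumeMounts": [
                        {"name": "dataset-vol", "mountPath": mount_path}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "dataset-vol",
                    # Assumption: the PVC is named after the Dataset.
                    "persistentVolumeClaim": {"claimName": dataset_name},
                }
            ],
        },
    }


# Hypothetical workload reading the "demo-data" Dataset under /data.
pod = make_pod("trainer", "python:3.11", "demo-data")
print(json.dumps(pod, indent=2))
```

Nothing in the Pod refers to OSS, Ceph, or Alluxio, which is the point: the storage backend can change without touching the workload definition.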

Observability and Metrics

Fluid exposes metrics in the Dataset status, such as total cache capacity and current usage. Example values: capacity = 200 GB, usage = 84.29 GB. Operators can monitor these metrics to decide when to scale cache resources.
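With the example values, 84.29 GB of 200 GB is roughly 42% of the cache in use. The sketch below shows how an operator script might turn the two numbers into a scale‑out decision. The 80% threshold and the function names are illustrative assumptions, not Fluid defaults, and the values are passed in directly rather than read from the Dataset status.

```python
def cache_usage_ratio(capacity_gb, used_gb):
    """Return the fraction of cache capacity currently in use."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return used_gb / capacity_gb


def should_scale_out(capacity_gb, used_gb, threshold=0.8):
    """Suggest adding cache capacity once usage reaches the threshold.

    The 0.8 default is an arbitrary illustrative value.
    """
    return cache_usage_ratio(capacity_gb, used_gb) >= threshold


# Example values from the text above.
ratio = cache_usage_ratio(200.0, 84.29)
print(f"cache usage: {ratio:.1%}")
print("scale out:", should_scale_out(200.0, 84.29))
```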

Performance Evaluation

Benchmarks on GPU‑accelerated training show that Fluid’s caching reduces data‑access bottlenecks. As the number of GPUs increases, Fluid delivers up to a 2× end‑to‑end speedup compared with reading directly from cloud storage, lowering both training time and cost.

Repository

Source code, demos, and documentation are available at:

https://github.com/fluid-cloudnative/fluid
