
How Fluid Accelerates Cloud‑Native Deep Learning Training

Fluid is an open-source CNCF project co-developed by Alibaba Cloud and Nanjing University. It introduces a dataset abstraction and an elastic caching architecture that automatically optimize I/O for cloud-native deep-learning training jobs. The underlying research was accepted as a full paper at ICDE 2022.

Alibaba Cloud Native

ICDE 2022 Acceptance

The International Conference on Data Engineering (ICDE) is an IEEE flagship conference and, alongside SIGMOD and VLDB, one of the three top venues in data management and databases. The paper "Fluid: Dataset Abstraction and Elastic Acceleration for Cloud-native Deep Learning Training Jobs" was accepted as a full paper at ICDE 2022.

Problem Statement

Running deep-learning training workloads on cloud-native platforms (Kubernetes/Docker) brings high elasticity and low-cost operation, but it also creates severe I/O bottlenecks: data access patterns are complex, storage throughput struggles to keep up with GPU I/O demand, and cached data is shared inefficiently across jobs.

Proposed Solution – Fluid

Fluid provides a Dataset abstraction that hides heterogeneous storage back-ends behind a single interface, together with a cache engine that tunes itself to the characteristics of each dataset. During training, the system elastically scales cache capacity in response to real-time I/O demand. It also uses cross-job cache information, that is, knowledge of which datasets are already cached, to reorder job scheduling and improve overall throughput.
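The two mechanisms described above, demand-driven cache scaling and cache-aware job ordering, can be illustrated with a toy model. The sketch below is not Fluid's actual algorithm: the `Job` type, both function names, the throughput units, and the assumption that each cache worker serves a fixed bandwidth are all hypothetical, chosen only to make the idea concrete.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical training job description (not a Fluid type)."""
    name: str
    dataset: str
    required_read_mbps: float  # aggregate read throughput the job's GPUs need


def cache_workers_needed(required_read_mbps: float,
                         backend_mbps: float,
                         per_worker_mbps: float,
                         max_workers: int) -> int:
    """Toy sizing rule: no cache if the backend keeps up; otherwise enough
    workers to serve the whole demand from cache, capped at max_workers."""
    if required_read_mbps <= backend_mbps:
        return 0
    return min(max_workers, math.ceil(required_read_mbps / per_worker_mbps))


def order_by_cache_affinity(jobs, cached_fraction):
    """Run jobs whose dataset is already mostly cached first.
    Jobs with equal cached fractions keep their submission order."""
    return sorted(jobs,
                  key=lambda job: cached_fraction.get(job.dataset, 0.0),
                  reverse=True)


if __name__ == "__main__":
    jobs = [
        Job("resnet-a", "imagenet", 1200.0),
        Job("detector-b", "coco", 800.0),
        Job("resnet-c", "imagenet", 1200.0),
    ]
    cached = {"imagenet": 0.2, "coco": 0.9}  # made-up cache state
    for job in order_by_cache_affinity(jobs, cached):
        workers = cache_workers_needed(job.required_read_mbps,
                                       backend_mbps=400.0,
                                       per_worker_mbps=500.0,
                                       max_workers=8)
        print(job.name, workers)
```

A real system would also account for cache hit ratio, warm-up time, and node placement. The sketch only shows why scaling decisions and scheduling decisions both depend on the same cache state.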

Open‑Source Project Details

Fluid is an open-source project under the Cloud Native Computing Foundation (CNCF), hosted at https://github.com/fluid-cloudnative/fluid. It was initiated jointly by Alibaba Cloud's cloud-native team and the Computer Science Department of Nanjing University, and was accepted into CNCF in April 2021. The project has since accumulated over 1,000 pull requests and shipped seven releases, filling a gap in elastic data-cache orchestration within the Kubernetes ecosystem.
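For readers who have not used the project, the sketch below builds the two Kubernetes objects a minimal Fluid setup typically involves: a Dataset that points at remote storage and a cache runtime bound to it by name. Field names follow the project's public quick-start examples and should be checked against the release in use. The helper function names, the dataset name, and the bucket URL are hypothetical placeholders.

```python
import json


def dataset_manifest(name: str, mount_point: str) -> dict:
    """Build a Fluid Dataset object describing where the raw data lives."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def alluxio_runtime_manifest(name: str, replicas: int, quota: str) -> dict:
    """Build an AlluxioRuntime with the same name as the Dataset it caches.
    The replica count and quota are the knobs that set cache capacity."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "AlluxioRuntime",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "tieredstore": {
                "levels": [
                    # In-memory cache tier; path and quota are example values.
                    {"mediumtype": "MEM", "path": "/dev/shm", "quota": quota}
                ]
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical dataset name and bucket; replace with real values.
    objects = [
        dataset_manifest("training-data", "s3://example-bucket/train/"),
        alluxio_runtime_manifest("training-data", replicas=2, quota="2Gi"),
    ]
    # Kubernetes accepts JSON manifests, so the output can be saved and applied.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objects}, indent=2))
```

Generating the manifests from code rather than hand-writing them keeps the Dataset and runtime names in sync, which matters because the runtime is matched to its Dataset by name.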

Real‑World Impact

In production, Fluid has helped many users significantly improve AI model training performance while reducing the complexity of managing training data. Alibaba Cloud integrates Fluid's core ideas into its cloud-native AI suite, delivered through ACK (Alibaba Cloud Container Service for Kubernetes).

Recognition and Broader Innovation

The paper's acceptance reflects Alibaba Cloud's ongoing innovation in container-based AI workloads, which includes earlier work on serverless image distribution accepted at USENIX ATC 2021. In early 2022, the Forrester Wave report on public-cloud container platforms named Alibaba Cloud a Leader, a first for a Chinese vendor.
