
Cloud‑Native Storage Acceleration: Experience and Practices with CloudFS on Volcano Engine

This article surveys the demand for cloud-native storage acceleration, discusses what constitutes a good acceleration solution, and details the design, implementation, and real-world practice of CloudFS, covering metadata acceleration, data-plane caching, FUSE enhancements, and AI-training and multi-cloud data-lake use cases, before closing with the future roadmap.


Volcano Engine runs most of its machine-learning and data-lake workloads on a cloud-native Kubernetes platform in which compute, storage, and middleware are disaggregated, which drives the need for a storage acceleration service.

Key challenges include the lack of industry-wide standards for storage acceleration, the difficulty of selecting appropriate middleware, and the governance of data flows, cost models, and protocol compatibility.

A "good" storage accelerator should provide transparent acceleration, multi‑protocol compatibility, elastic scaling, and basic data‑governance capabilities.

CloudFS, which evolved from an internal HDFS implementation, offers transparent acceleration, supports a native HDFS mode, and integrates with object-storage and NAS backends. It accelerates metadata by caching and deduplicating object-store directory trees, and accelerates the data plane with block replication, ARC-based cache eviction, lazy loading, P2P pre-heating, and both synchronous and asynchronous write-back.
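To make the data-plane caching ideas concrete, here is a minimal sketch of a lazy-loading block cache with LRU eviction and configurable synchronous or asynchronous write-back. All names are hypothetical, and the sketch deliberately substitutes simple LRU for the ARC eviction policy and omits block replication and P2P pre-heating described above.

```python
class BlockCache:
    """Toy lazy-loading block cache with LRU eviction and write-back.

    Illustrative only: CloudFS uses ARC eviction, replication, and P2P
    pre-heating, none of which are modeled here.
    """

    def __init__(self, backend, capacity=4, sync_writes=False):
        self.backend = backend          # dict-like stand-in for the object store
        self.capacity = capacity        # max cached blocks before LRU eviction
        self.sync_writes = sync_writes  # True = synchronous write-back
        self.cache = {}                 # key -> block bytes; insertion order = LRU order
        self.dirty = set()              # keys written but not yet flushed to backend
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache[key] = self.cache.pop(key)   # refresh LRU position
            return self.cache[key]
        self.misses += 1
        block = self.backend[key]                   # lazy load on first access
        self._insert(key, block)
        return block

    def write(self, key, block):
        self._insert(key, block)
        if self.sync_writes:
            self.backend[key] = block               # synchronous write-back
        else:
            self.dirty.add(key)                     # deferred until flush()

    def flush(self):
        # Asynchronous write-back path: persist all dirty blocks.
        for key in list(self.dirty):
            self.backend[key] = self.cache[key]
        self.dirty.clear()

    def _insert(self, key, block):
        self.cache.pop(key, None)
        self.cache[key] = block
        while len(self.cache) > self.capacity:
            victim = next(iter(self.cache))         # least-recently used key
            if victim in self.dirty:                # flush dirty victim first
                self.backend[victim] = self.cache[victim]
                self.dirty.discard(victim)
            del self.cache[victim]
```

The `sync_writes` flag mirrors the synchronous/asynchronous write-back choice: synchronous trades write latency for durability, while asynchronous batches flushes at the cost of a wider loss window.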

The FUSE entry point has been hardened for higher stability, including replacing the FUSE transport with virtio, high-availability restart, page-cache utilization, and synchronous close support.
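Synchronous close matters because applications, particularly training jobs writing checkpoints, treat a successful `close()` as a durability guarantee. The toy file wrapper below illustrates that contract, with a dict standing in for the backing store; the names are illustrative, not the CloudFS API.

```python
class WriteBackFile:
    """Sketch of synchronous-close semantics: buffered writes become
    durable only when close() flushes them to the backing store.
    Illustrative names, not the CloudFS API."""

    def __init__(self, backend, path):
        self.backend = backend       # dict-like stand-in for the object store
        self.path = path
        self.buffer = bytearray()    # data staged in the cache layer
        self.closed = False

    def write(self, data):
        if self.closed:
            raise ValueError("write on closed file")
        self.buffer.extend(data)     # staged only; not yet durable

    def close(self):
        # Synchronous close: block until the buffered data is persisted,
        # so a successful close() guarantees the backend has the bytes.
        if not self.closed:
            self.backend[self.path] = bytes(self.buffer)
            self.closed = True
```

Without this guarantee, an asynchronously flushed write could be lost after `close()` returns, which is exactly the failure mode synchronous close is meant to rule out.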

Production deployments include training acceleration on the AML platform, where CloudFS supplies accelerated storage to GPU nodes with or without local disks, and multi-cloud data-lake acceleration, which lets a private cloud cache public-cloud object buckets.

Performance tests show significant I/O throughput gains with caching and the page cache enabled, and comparisons show roughly double the throughput of Goofys after cache warm-up.
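The warm-up effect behind such comparisons can be reproduced in miniature by counting cache hits across two passes over the same file: the first (cold) pass misses on every block and pays the backend round-trip, while the second (warm) pass is served entirely from cache. The workload below is a hypothetical illustration, not the article's actual benchmark.

```python
def run_reads(keys, backend, cache):
    """Count cache hits and misses for a sequence of block reads,
    populating the cache on each miss (the warm-up)."""
    hits = misses = 0
    for k in keys:
        if k in cache:
            hits += 1
        else:
            misses += 1
            cache[k] = backend[k]   # fetch from backend and cache the block
    return hits, misses

# Hypothetical workload: the same 100-block file read twice.
backend = {f"blk{i}": b"\x00" * 4096 for i in range(100)}
cache = {}
cold = run_reads(list(backend), backend, cache)   # first pass: all misses
warm = run_reads(list(backend), backend, cache)   # second pass: all hits
```

Because the warm pass skips every backend round-trip, its throughput is bounded by local cache (and page-cache) speed rather than object-store latency, which is what the warm-up comparison measures.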

Future plans focus on further refinement of the NAS backend, finer-grained cache optimizations, and elastic scaling of cache resources.

Tags: cloud native · Big Data · AI · Kubernetes · storage-acceleration · CloudFS
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
