Tag

AI Storage

1 view collected around this technical thread.

AntData
Mar 7, 2025 · Artificial Intelligence

Design and Implementation of a Cloud‑Native AI Storage Acceleration System (PCache) for Large‑Scale Model Training

This article examines the challenges of AI storage for massive models, describes Ant Group's multi‑cloud, high‑availability PCache architecture, and details its GPU‑mixed deployment, metadata services, data‑link optimizations, and performance results, which enable petabyte‑scale training at low cost and with high stability.

AI Storage · PCache · Performance Optimization
0 likes · 19 min read
AntData
Mar 4, 2025 · Big Data

Design and Analysis of 3FS: An AI‑Optimized Distributed File System

The article provides a comprehensive English overview of 3FS, an AI‑focused distributed file system that uses FoundationDB for metadata, CRAQ for chunk replication, and a hybrid FUSE/native client architecture. It details the system's design, components, fault handling, and performance considerations for large‑scale training workloads.

AI Storage · CRAQ replication · FoundationDB
0 likes · 25 min read
Alibaba Cloud Infrastructure
Dec 2, 2024 · Cloud Computing

Alibaba Cloud Showcases ALS System and AI‑Driven Storage Innovations at CCF China Storage Conference 2024

At the CCF China Storage Conference 2024 in Guangzhou, Alibaba Cloud’s research team presented the ALS (ALink System) ecosystem, discussed Scale‑Up interconnect protocols, and demonstrated multi‑layer storage innovations for AI workloads, highlighting hardware‑software integration, high‑bandwidth networking, and future CXL/PIM research.

AI Storage · ALS · Alibaba Cloud
0 likes · 7 min read
DataFunTalk
May 14, 2024 · Cloud Computing

Hybrid Cloud Architecture and AI Storage Evolution at Zhihu: From UnionStore to Alluxio

This article describes Zhihu's hybrid cloud architecture (offline, online, and GPU data centers), its self‑built UnionStore cache, the performance and latency challenges faced during large‑scale AI model training, and the subsequent evaluation of and migration to the Alluxio community and enterprise editions to achieve higher throughput, better stability, and lower operational overhead.

AI Storage · Alluxio · UnionStore
0 likes · 14 min read