Tagged articles
16 articles
Baidu Geek Talk
Nov 3, 2022 · Cloud Native

Challenges and Solutions for AI Storage Systems in Cloud‑Native Training

The talk outlines how AI training’s growing data and compute demands create storage bottlenecks across four evolutionary stages. It identifies four core problems (massive data, data flow, resource scheduling, and compute acceleration) and proposes solutions at the hardware, software (parallel file systems, caching), and cloud‑native orchestration (Fluid, Baidu Canghai) levels, combining object‑storage data lakes with high‑performance acceleration layers to achieve near‑full GPU utilization.

AI, Cloud Native, Data Lake
37 min read
Baidu Intelligent Cloud Tech Hub
Oct 19, 2022 · Artificial Intelligence

Why Storage Systems Bottleneck AI Training and How to Accelerate Them

This article examines the challenges AI applications face across the stack, from storage to compute. It traces the evolution of AI training infrastructure, analyzes key bottlenecks such as compute acceleration, resource scheduling, massive data handling, and data flow, and presents Baidu Cloud's storage acceleration solutions, including parallel file systems, caching, and the Fluid scheduler, for improving AI training performance.

AI training, Cloud Native, Data Lake
38 min read
DataFunTalk
Aug 17, 2022 · Cloud Computing

High‑Performance Computing Storage Challenges and Baidu Canghai Storage Solutions

This article explains the storage problems faced by traditional HPC, AI‑driven HPC and high‑performance data analysis, describes Baidu's internal high‑performance storage practices, and introduces the Baidu Canghai solution—including object storage BOS, parallel file system PFS, RapidFS, data‑flow mechanisms and a customer case—demonstrating how these technologies meet the demanding throughput, latency and cost requirements of modern high‑performance workloads.

AI, Baidu, High‑performance computing
29 min read
Baidu Geek Talk
Jul 26, 2022 · Industry Insights

How Baidu’s Canghai Storage Powers High‑Performance Computing: Challenges and Solutions

This article analyzes the storage challenges of high‑performance computing—including traditional HPC, AI‑driven HPC, and high‑performance data analysis—examines Baidu’s internal practices, and presents the Canghai storage platform with its object storage, parallel file system (PFS) and RapidFS solutions that address throughput, latency, and scalability requirements.

AI training, High‑performance computing, cloud storage
31 min read
Baidu Intelligent Cloud Tech Hub
Jul 21, 2022 · Cloud Computing

How Baidu’s Cloud Storage Powers High‑Performance Computing and AI Workloads

This article explains the storage challenges of high‑performance computing—including traditional HPC, AI‑driven HPC, and HPDA—then details Baidu’s unified storage platform, object storage BOS, and runtime solutions PFS and RapidFS, illustrating their architecture, features, and a real‑world autonomous‑driving customer case.

AI training, Data Lake, cloud storage
29 min read
Architects' Tech Alliance
Dec 7, 2020 · Fundamentals

Overview of Lustre Parallel File System Architecture and Performance Characteristics

The article provides a comprehensive overview of the Lustre parallel file system architecture, its core components, POSIX compliance, scalability, high‑performance networking, security features, data layout mechanisms, and performance considerations for large and small files, along with practical optimization tips for HPC environments.

HPC, Lustre, POSIX
17 min read
Architects' Tech Alliance
Dec 3, 2020 · Fundamentals

IBM GPFS (Spectrum Scale) Overview: History, Architecture, Features, and High‑Performance Computing Use Cases

This article provides a comprehensive overview of IBM's General Parallel File System (GPFS), detailing its historical development, its architectural models (SAN, NSD, and Shared‑Nothing Cluster), its operational capabilities, performance advantages, scalability, and high‑availability features, and its role in large‑scale high‑performance computing environments.

Distributed File System, GPFS, High‑performance computing
12 min read
Open Source Linux
Dec 2, 2020 · Fundamentals

Unlocking NFS & pNFS: How Parallel File Systems Boost Performance

This article explains the fundamentals of NFS, introduces the advanced pNFS architecture with its three protocols, compares storage layouts, and discusses performance benefits and real‑world deployments in high‑performance computing environments.

High‑performance computing, NFS, pNFS
13 min read
Architects' Tech Alliance
Jul 6, 2019 · Fundamentals

BeeGFS and Parallel File Systems in High‑Performance Computing: Evolution, Market Trends, and Technical Overview

BeeGFS, an open‑source parallel file system originally developed by Fraunhofer, has emerged as a flexible, high‑performance alternative to GPFS and Lustre in HPC, driven by growing demands from large‑scale analytics, AI, and cloud storage, with expanding global adoption and ecosystem partnerships.

BeeGFS, GPFS, HPC
14 min read
Architects' Tech Alliance
Aug 7, 2018 · Operations

Lustre Performance Optimization Guide

This article provides a comprehensive guide to optimizing Lustre, the leading open‑source parallel file system for high‑performance computing, covering network bandwidth, stripe settings, client configuration, RAID choices, small‑file handling, and practical system commands to improve aggregate I/O performance.

HPC, Lustre, Storage Optimization
8 min read
Architects' Tech Alliance
Jul 3, 2017 · Operations

BeeGFS Features, Quotas, Mirroring, APIs, and Deployment Guidelines

This article provides a comprehensive overview of BeeGFS, covering its architecture, BeeOND on‑demand instances, quota and directory‑quota mechanisms, Buddy mirroring, supported APIs, hardware requirements, network options, and export methods via SMB/CIFS and NFS for high‑performance computing environments.

BeeGFS, High‑performance computing, mirroring
11 min read
Architects' Tech Alliance
Jun 24, 2017 · Operations

BeeGFS Parallel File System: Architecture, Components, Installation, and Tuning Guide

BeeGFS is a GPL‑licensed parallel file system for Linux that offers scalable storage through a modular architecture of management, metadata, and object storage servers, supports a wide range of hardware and OS platforms, and provides detailed installation, configuration, and performance‑tuning guidance including the BeeOND burst‑buffer extension.

BeeGFS, HPC, Installation
15 min read