How RapidFS Boosts AI Model Training with 10 TiB/s Throughput

This article explains how large‑scale AI model training and inference demand massive data handling, describes the RapidFS storage acceleration cluster deployed on hundreds of domestic CPU servers to serve a 30,000‑card Kunlun chip system, and presents performance tests showing throughput that scales linearly to over 1 TiB/s, demonstrating the impact of high‑performance storage on compute efficiency.

Baidu Geek Talk

Introduction

Training and inference of large AI models involve processing massive datasets, often ranging from tens of GiB to several PiB. High‑performance compute clusters therefore need not only powerful AI accelerators and RDMA networks but also equally fast storage systems to minimize idle compute time.

RapidFS Storage Acceleration Cluster

RapidFS is a near‑compute storage acceleration tool that builds on Baidu Object Storage (BOS) as a data‑lake foundation, providing a decoupled capacity‑performance architecture with hot‑cold tiering and transparent data flow. It offers POSIX mounting and HDFS protocol interfaces to accelerate AI training, inference, massive data processing, and distribution workloads.
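Because RapidFS exposes a POSIX mount, application code can read accelerated data exactly as it would read a local file. The sketch below illustrates this; the mount point and file name are hypothetical placeholders, not paths from the actual deployment:

```python
import os

# Hypothetical mount point; the real path depends on how RapidFS is mounted.
MOUNT_POINT = "/mnt/rapidfs"

def read_model_file(relative_path: str, chunk_size: int = 64 * 1024 * 1024) -> int:
    """Stream a file from the RapidFS POSIX mount in large sequential chunks.

    Returns the number of bytes read. Since RapidFS presents a POSIX
    interface, this is indistinguishable from reading a local file.
    """
    total = 0
    with open(os.path.join(MOUNT_POINT, relative_path), "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total

if __name__ == "__main__":
    # Illustrative file name only; actual model shard names will differ.
    n = read_model_file("deepseek-v3/model-00001-of-00160.safetensors")
    print(f"read {n / 2**30:.2f} GiB")
```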

Deployment Overview

At the Create 2025 conference, Baidu unveiled a 30,000‑card Kunlun AI accelerator cluster. To support this, RapidFS was deployed on hundreds of domestically produced CPU servers, forming a storage acceleration service with an aggregate throughput close to 10 TiB/s, meeting the massive read/write demands of the Kunlun cluster.

Performance Test Setup

Two RapidFS clusters were evaluated:

Cluster A: 20 RapidFS storage nodes

Cluster B: 70 RapidFS storage nodes

Each test used 160 files of 4.3 GiB (total 688 GiB) generated from the DeepSeek V3 model. Files were uploaded to BOS and loaded into RapidFS. On each compute node, eight processes continuously read the model files from RapidFS for 600 seconds.
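A minimal sketch of such a read loop appears below, assuming a hypothetical POSIX mount point and model directory. It is not the actual benchmark harness, only an illustration of the eight‑process, 600‑second sequential‑read pattern described above:

```python
import multiprocessing as mp
import os
import time

# Hypothetical paths for this sketch; the real deployment paths will differ.
MOUNT_POINT = "/mnt/rapidfs"
MODEL_DIR = os.path.join(MOUNT_POINT, "deepseek-v3")
NUM_PROCESSES = 8          # eight reader processes per compute node
DURATION_S = 600           # each test runs for 600 seconds
CHUNK = 64 * 1024 * 1024   # 64 MiB sequential reads

def reader(files: list[str], bytes_read: mp.Value) -> None:
    """Loop over the model files until the test window closes."""
    deadline = time.monotonic() + DURATION_S
    while time.monotonic() < deadline:
        for path in files:
            with open(path, "rb") as f:
                while chunk := f.read(CHUNK):
                    with bytes_read.get_lock():
                        bytes_read.value += len(chunk)
            if time.monotonic() >= deadline:
                break

if __name__ == "__main__":
    files = sorted(
        os.path.join(MODEL_DIR, name) for name in os.listdir(MODEL_DIR)
    )
    counter = mp.Value("q", 0)  # 64-bit shared byte counter
    procs = [mp.Process(target=reader, args=(files, counter))
             for _ in range(NUM_PROCESSES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    gib = counter.value / 2**30
    print(f"node throughput: {gib / DURATION_S:.2f} GiB/s over {DURATION_S}s")
```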

Test Results

Cluster A (20 nodes) delivered a stable 302 GiB/s of throughput, while Cluster B (70 nodes) reached 1.03 TiB/s. A single RapidFS node therefore provides about 15 GiB/s, equivalent to 300 MiB/s per TiB of raw capacity. Throughput scales linearly with node count: with 70 storage nodes, 100 compute nodes can each load a 10 GiB file in about one second, enabling on‑demand data access.
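A quick back‑of‑the‑envelope check, using only the figures reported above, confirms that the numbers are mutually consistent:

```python
# Sanity check of the reported scaling figures.
PER_NODE_GIBS = 15.0  # per-node throughput from the results above, GiB/s

for nodes, measured_gibs in [(20, 302.0), (70, 1.03 * 1024)]:
    predicted = nodes * PER_NODE_GIBS
    print(f"{nodes} nodes: predicted {predicted:.0f} GiB/s, "
          f"measured {measured_gibs:.0f} GiB/s")

# 100 compute nodes each loading a 10 GiB file = 1000 GiB of reads.
# At 70 storage nodes (~1.03 TiB/s = ~1055 GiB/s) that takes about a second:
print(f"load time: {100 * 10 / (1.03 * 1024):.2f} s")
```

The check predicts 300 GiB/s for 20 nodes versus the measured 302 GiB/s, and 1,050 GiB/s for 70 nodes versus the measured ~1,055 GiB/s, with the 1,000 GiB aggregate load completing in roughly 0.95 seconds.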

Additional charts in the original article illustrate the sustained throughput of the two clusters.

Conclusion

The RapidFS storage acceleration cluster demonstrates that high‑performance, scalable storage can closely match the demands of large‑scale AI compute, acting as an efficiency booster for both training and inference workloads. The linear scaling and low latency achieved underscore the breakthrough potential of domestic compute infrastructure when paired with optimized storage solutions.

Tags: high performance computing, performance testing, large models, AI training, RapidFS, storage acceleration