
How RapidFS Accelerates AI Model Training with 10 TiB/s Storage Performance

The article explains how RapidFS, a near‑compute storage acceleration solution built on BOS object storage, delivers up to 10 TiB/s throughput for massive AI model training, detailing its architecture, deployment on a 30,000‑card Kunlun cluster, and performance test results that show linear scaling from 20 to 70 nodes.

Baidu Intelligent Cloud Tech Hub

1. Introduction

Large‑model training and inference are fundamentally massive data‑processing tasks. High‑performance AI accelerator cards and RDMA networks alone are insufficient; they also require a high‑performance storage system. Data sizes range from tens of GiB to hundreds of TiB, even up to several PiB. Faster storage reduces compute idle time, demanding a storage‑acceleration system that can support large‑scale compute clusters and massive data workloads.

2. RapidFS Storage Acceleration Cluster

At the Create 2025 conference, the Kunlun Chip 30 k‑card cluster was officially announced. To meet its massive data‑read/write requirements, hundreds of domestic‑CPU servers were deployed for the RapidFS storage acceleration service, achieving a total throughput close to 10 TiB/s.

Performance tests show that 20 RapidFS storage nodes provide a stable 302 GiB/s of throughput, while 70 nodes deliver 1.03 TiB/s. A single RapidFS node can sustain 15 GiB/s, equivalent to 300 MiB/s per TiB of raw capacity. Throughput scales linearly with cluster size; with 70 storage nodes, 100 compute nodes can each load a 10 GiB file in roughly one second, enabling data‑on‑demand access.
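A quick back‑of‑the‑envelope check of the one‑second claim, using only the figures quoted above (this is a sketch of the arithmetic, not a measurement):

```python
# With ~1.03 TiB/s of aggregate throughput, 100 compute nodes each
# pulling a 10 GiB model file need roughly one second in total.
aggregate_tib_s = 1.03                      # 70-node cluster throughput (TiB/s)
aggregate_gib_s = aggregate_tib_s * 1024    # convert to GiB/s
total_gib = 100 * 10                        # 100 nodes x one 10 GiB file each

seconds = total_gib / aggregate_gib_s
print(f"{seconds:.2f} s")                   # ~0.95 s
```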

3. RapidFS Product Overview

RapidFS is a near‑compute storage acceleration tool that leverages BOS object storage as a data‑lake foundation, providing a decoupled capacity‑performance architecture with hot/cold tiering and transparent data flow. It offers POSIX mounting and HDFS protocol interfaces, delivering unified file access for upstream compute workloads and accelerating AI training, inference, massive data processing, and data distribution scenarios.
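Because RapidFS exposes a POSIX mount, training code can read model files with ordinary file I/O and no SDK. A minimal sketch, assuming a hypothetical mount point (a local temp directory stands in for the real mount here so the snippet is runnable):

```python
import os

# Stand-in for a real RapidFS mount point such as /mnt/rapidfs (hypothetical path).
MOUNT = "/tmp/rapidfs-demo"
os.makedirs(MOUNT, exist_ok=True)

# Write a small stand-in "model shard" so the read path below works end to end.
shard = os.path.join(MOUNT, "model-00001.safetensors")
with open(shard, "wb") as f:
    f.write(b"\0" * 4096)

# Training code sees RapidFS as a local filesystem: plain open()/read() suffices.
with open(shard, "rb") as f:
    data = f.read()
print(len(data))  # 4096
```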


4. Performance Test Details

4.1 Server Configuration

In this test, Baidu Cloud RapidFS was deployed as a fully managed cluster on domestic‑CPU servers within the 30 k‑card Kunlun cluster, serving as a near‑compute storage acceleration service. Detailed configurations are shown below.

(Figure: server configuration details, not reproduced here.)

4.2 Test Scale

Two RapidFS clusters were tested: one with 20 storage nodes and another with 70 storage nodes.

(Figures: test cluster scale details, not reproduced here.)

4.3 Test Method

Using DeepSeek V3 model files, 160 files of 4.3 GiB each (total 688 GiB) were stored in BOS and loaded into the RapidFS cluster. Each compute node launched eight processes to read model files from RapidFS continuously for 600 seconds.

4.4 Test Results

Test cluster A (20 RapidFS nodes) achieved 302 GiB/s; test cluster B (70 RapidFS nodes) achieved 1.03 TiB/s. The results demonstrate that RapidFS storage acceleration clusters scale linearly with node count, providing high‑throughput data access that keeps pace with compute demand and improves the efficiency of large‑model training and inference.
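The linearity claim can be verified directly from the two measurements: per‑node throughput is essentially identical at both cluster sizes.

```python
# Per-node throughput for the two test clusters; near-identical values
# indicate throughput scales linearly with node count.
clusters = {
    "A (20 nodes)": (20, 302.0),        # measured aggregate in GiB/s
    "B (70 nodes)": (70, 1.03 * 1024),  # 1.03 TiB/s converted to GiB/s
}
for name, (nodes, gib_s) in clusters.items():
    print(f"{name}: {gib_s / nodes:.1f} GiB/s per node")
# Both clusters come out at ~15.1 GiB/s per node, matching the
# single-node figure of ~15 GiB/s quoted earlier.
```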


Tags: high performance computing · performance testing · AI training · RapidFS · storage acceleration
Written by

Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.
