How IBM Spectrum Scale 5.0 Boosts HPC Small‑File I/O for AI/ML Workloads

This article examines the new features in IBM Spectrum Scale 5.0 (improved RDMA support, lock‑free reads, a multi‑layer NVMe write cache, and variable SubBlock sizes) and shows how they enable a next‑generation HPC supercomputer to meet demanding small‑file and large‑file I/O targets for AI/ML/DL workloads. Benchmark results show up to a 700% improvement in small‑file random I/O over the previous 4.2.2 release.

Architects' Tech Alliance

Mar 21, 2021

Spectrum Scale 5.0 Performance Enhancements

IBM Spectrum Scale 5.0 introduces several optimizations aimed at AI/ML/DL workloads, which are heavily dependent on small‑file I/O. The enhancements include:

Improved Remote Direct Memory Access (RDMA) support to accelerate inter‑node communication with lower overhead.

A new lock‑free read path that increases parallelism and reduces serialization latency.

A multi‑layer write cache that leverages NVMe storage to speed up small‑file block writes.

Variable SubBlock sizes that let the allocation granularity within a block track the file size, improving space efficiency and I/O throughput for both small and large files.
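To make the space-efficiency argument for smaller SubBlocks concrete, here is a minimal sketch of how much space a file occupies when allocation is rounded up to whole SubBlocks. The 4 MiB block, the fixed 32-SubBlocks-per-block layout, and the 8 KiB SubBlock are illustrative assumptions, not figures from this article, and the helper `allocated_bytes` is hypothetical. The sketch also ignores other small-file optimizations a real file system may apply.

```python
import math

KiB = 1024
MiB = 1024 * KiB

def allocated_bytes(file_size: int, subblock_size: int) -> int:
    """Space consumed when allocation is rounded up to whole subblocks."""
    if file_size == 0:
        return 0
    return math.ceil(file_size / subblock_size) * subblock_size

# Hypothetical layouts for a 4 MiB block (assumed values for illustration):
coarse_subblock = 4 * MiB // 32   # fixed 32 subblocks per block -> 128 KiB
fine_subblock = 8 * KiB           # finer-grained subblock

for size in (1 * KiB, 32 * KiB, 1 * MiB):
    coarse = allocated_bytes(size, coarse_subblock)
    fine = allocated_bytes(size, fine_subblock)
    print(f"{size // KiB:>5} KiB file: "
          f"{coarse // KiB:>5} KiB (coarse) vs {fine // KiB:>5} KiB (fine)")
```

Under these assumed sizes, a 1 KiB file occupies 128 KiB with the coarse layout but only 8 KiB with the fine one, while a 1 MiB file occupies the same space in both.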

IO Requirements of AI/ML/DL Workloads

AI, machine learning, and deep learning pipelines consist of five stages: data ingestion, data cleaning & transformation, neural‑network exploration & architecture design, training, and inference. While the exploration stage generates little I/O, the other stages involve massive data movement, often with large sequential reads/writes for raw datasets and intensive random reads/writes of many small files during cleaning, training, and inference.

New HPC Lab Storage Requirements

A next‑generation HPC laboratory plans to deploy a world‑class supercomputer with roughly 4,600 hybrid CPU‑GPU nodes. Each node has 2 × IBM Power9 CPUs, 6 × NVIDIA V100 GPUs, 0.5 TB of DRAM, and 1.6 TB of local storage, and connects via dual‑rail EDR InfiniBand (≈23 GB/s per node). The storage subsystem must satisfy:

≥ 50,000 file creations per second.

≥ 1 TB/s aggregate sequential read/write bandwidth for 1 MB transfers.

Peak aggregate sequential bandwidth of 2.5 TB/s.

≥ 2.6 million 32 KB small‑file operations per second.
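As a rough sanity check on what these targets mean per client, the sketch below divides the aggregate figures evenly across the compute nodes. The even split, the decimal conversion (1 TB = 1000 GB), and all variable names are assumptions made for illustration; real workloads are unlikely to spread load this uniformly.

```python
CLIENT_NODES = 4600        # "roughly 4,600" in the text
NODE_LINK_GBPS = 23.0      # approximate per-node InfiniBand bandwidth, GB/s

# Storage targets as stated in the requirements list.
TARGETS = {
    "file_creates_per_s": 50_000,
    "seq_bw_1mb_tbps": 1.0,
    "peak_seq_bw_tbps": 2.5,
    "small_file_ops_per_s": 2_600_000,
}

def per_node_share(total: float, nodes: int = CLIENT_NODES) -> float:
    """Even split of an aggregate target across client nodes (an assumption)."""
    return total / nodes

# Assumes 1 TB = 1000 GB.
peak_gbps_per_node = per_node_share(TARGETS["peak_seq_bw_tbps"] * 1000)
ops_per_node = per_node_share(TARGETS["small_file_ops_per_s"])
link_fraction = peak_gbps_per_node / NODE_LINK_GBPS

print(f"Peak bandwidth per node: {peak_gbps_per_node:.2f} GB/s")
print(f"Small-file ops per node: {ops_per_node:.0f} ops/s")
print(f"Share of node link used at peak: {link_fraction:.1%}")
```

With every node active, the 2.5 TB/s peak works out to about 0.54 GB/s per node, a small fraction of each node's link, so under this split the storage system rather than the client network sets the aggregate limit.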

IBM ESS Configuration with Spectrum Scale 5.0

The IBM Elastic Storage Server (ESS) cluster consists of 77 nodes running Spectrum Scale 5.0. Each ESS node is a dual‑socket IBM Power9 server with 1 TB of memory and four 4U, 106‑drive enclosures (104 disks + 2 NVMe SSDs per enclosure), for up to 4 PB of raw capacity per node. Nodes are linked by a 4× EDR InfiniBand fabric offering up to 90 GB/s of network bandwidth.
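The cluster-wide totals implied by this configuration follow from simple multiplication. The sketch below derives them from the per-node figures above; the implied per-disk capacity is a back-of-the-envelope value that assumes all raw capacity sits on the 104 disks per enclosure and that 1 PB = 1000 TB, and it is not a drive specification from the article.

```python
ESS_NODES = 77
ENCLOSURES_PER_NODE = 4
DISKS_PER_ENCLOSURE = 104
NVME_PER_ENCLOSURE = 2
RAW_PB_PER_NODE = 4        # "up to 4 PB" per node

disks_per_node = ENCLOSURES_PER_NODE * DISKS_PER_ENCLOSURE
total_disks = ESS_NODES * disks_per_node
total_nvme = ESS_NODES * ENCLOSURES_PER_NODE * NVME_PER_ENCLOSURE
total_raw_pb = ESS_NODES * RAW_PB_PER_NODE

# Assumption: raw capacity is spread evenly over the disks, 1 PB = 1000 TB.
implied_tb_per_disk = RAW_PB_PER_NODE * 1000 / disks_per_node

print(f"Disks: {total_disks}, NVMe SSDs: {total_nvme}")
print(f"Raw capacity: up to {total_raw_pb} PB")
print(f"Implied capacity per disk: about {implied_tb_per_disk:.1f} TB")
```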

Benchmark Results

Performance tests were conducted to verify that the ESS + Spectrum Scale 5.0 solution meets the required metrics.

Small‑file creation: 57,000 files/s (target ≥ 50,000 files/s) using 1 KB files across 23 client nodes.

Sequential 1 MB read/write bandwidth: Individual node write performance of 23 GB/s (IOR) and 16 GB/s (TQOSPERF); aggregated across 77 nodes, total write bandwidth exceeds 1 TB/s.

Peak sequential bandwidth: With ~49 TB of data and a 16 MB transfer size, peak write = 36.2 GB/s and peak read = 43.4 GB/s per node; scaling linearly to 77 nodes gives roughly 2.8 TB/s write and 3.3 TB/s read, above the 2.5 TB/s target.

Small‑file random I/O: Single‑thread random read and write latencies of 80 µs and 200 µs respectively; multi‑threaded 4 KB transfers achieve ~2.7 M operations/s with 16 threads, a 700% improvement over the previous 4.2.2 release.

Remote procedure calls: 12‑node configuration reaches >5 M RPCs/s.
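The per-node bandwidth figures above can be extrapolated to the full cluster with a short calculation. This is a minimal sketch that assumes perfectly linear scaling across all 77 ESS nodes and 1 TB = 1000 GB; it ignores network and metadata contention, so the results are upper-bound estimates rather than measured aggregates. The function name `aggregate_tbps` is hypothetical.

```python
ESS_NODES = 77

def aggregate_tbps(per_node_gbps: float, nodes: int = ESS_NODES) -> float:
    """Linear scale-out estimate in TB/s (1 TB = 1000 GB); ignores contention."""
    return per_node_gbps * nodes / 1000

# (label, estimated aggregate in TB/s, target in TB/s)
checks = [
    ("1 MB write (IOR)", aggregate_tbps(23.0), 1.0),
    ("1 MB write (TQOSPERF)", aggregate_tbps(16.0), 1.0),
    ("peak write, 16 MB", aggregate_tbps(36.2), 2.5),
    ("peak read, 16 MB", aggregate_tbps(43.4), 2.5),
]

for name, estimate, target in checks:
    status = "meets" if estimate >= target else "misses"
    print(f"{name:<22} {estimate:5.2f} TB/s vs target {target} TB/s: {status}")
```

Even the lower TQOSPERF figure leaves about 23% headroom over the 1 TB/s target under this assumption, which gives some margin for less-than-linear scaling.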

Conclusion

The IBM ESS combined with Spectrum Scale 5.0 satisfies or exceeds all specified I/O targets for the new HPC supercomputer, delivering multi‑TB/s peak sequential bandwidth, more than 1 TB/s of aggregate bandwidth for 1 MB transfers, about 2.7 million small‑file operations per second, and 57,000 file creations per second. The enhancements introduced in 5.0, especially the multi‑layer NVMe write cache and variable SubBlock sizing, yield up to a 700% boost in small‑file random I/O performance over release 4.2.2, making the solution a strong option for modern, multi‑workload HPC and scientific computing environments.
