Why Intel’s DAOS Is Redefining High‑Performance Storage for AI and HPC

As data volumes explode, Intel’s open‑source Distributed Asynchronous Object Storage (DAOS) offers a low‑latency, high‑bandwidth, NVM‑optimized platform that bridges the gap between high‑performance computing, big data analytics, and artificial intelligence workloads.

Architects' Tech Alliance

Apr 14, 2021

Background and Emerging Storage Challenges

Exponential data growth has turned distributed storage systems into both the core and the primary bottleneck of modern data centers. Traditional storage, designed for rotating media and POSIX I/O, suffers from high access latency, poor scalability, difficult management of massive datasets, and lack of query capabilities, making it unsuitable for new data models and next‑generation workflows.

Convergence of HPC, Big Data, and AI

Workloads now generate massive random reads and writes, with AI demanding far higher read throughput than traditional HPC. The shift from checkpoint‑heavy I/O to complex, high‑frequency access patterns requires storage whose read performance is comparable to its write bandwidth, enabling seamless data exchange among HPC, big data, and AI pipelines.

Intel DAOS Software Stack

Intel has built an open‑source, software‑defined, horizontally scalable object store called Distributed Asynchronous Object Storage (DAOS) as the foundation of its exascale storage stack. Optimized for Intel architectures and non‑volatile memory (NVM) technologies—including Intel Optane Persistent Memory and NVMe SSDs—DAOS provides high bandwidth, low latency, and high IOPS for HPC applications and supports data‑centric workflows that combine simulation, analytics, and AI.

Architecture vs. Traditional Storage

Unlike legacy stacks built for rotating media, DAOS is designed from the ground up for NVM and runs as a lightweight user‑space system that bypasses the operating system kernel on the I/O path. It abandons block‑oriented I/O models in favor of native fine‑grained data access, unlocking the performance of next‑generation storage devices.

DAOS replaces high‑latency point‑to‑point communication with low‑latency, high‑message‑rate user‑space messaging, avoiding kernel overhead. It is specifically tuned for Intel Optane Persistent Memory and NVMe SSDs, eliminating unnecessary overhead from traditional block‑device optimizations.

Metadata resides in persistent memory, while bulk data is stored on NVMe SSDs. Small I/O operations are absorbed in persistent memory before being migrated to flash, which cuts access latency from milliseconds to microseconds.
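The tiering idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not DAOS code: the class name TieredStore, the 4 KiB cutoff, and the migrate step are all hypothetical, and both tiers are plain in-memory dictionaries standing in for persistent memory and NVMe.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: writes smaller than this are absorbed in persistent memory.
SMALL_IO_THRESHOLD = 4096  # bytes; an assumption made for illustration only


@dataclass
class TieredStore:
    """Toy two-tier store: metadata and small writes in 'pmem', bulk data in 'nvme'."""
    pmem: dict = field(default_factory=dict)      # key -> bytes (absorbed small writes)
    nvme: dict = field(default_factory=dict)      # key -> bytes (bulk data)
    metadata: dict = field(default_factory=dict)  # key -> tier name (kept in pmem)

    def write(self, key: str, data: bytes) -> str:
        """Route a write by size and record its location in the metadata map."""
        tier = "pmem" if len(data) < SMALL_IO_THRESHOLD else "nvme"
        # Drop any stale copy left in either tier by an earlier write.
        self.pmem.pop(key, None)
        self.nvme.pop(key, None)
        getattr(self, tier)[key] = data
        self.metadata[key] = tier
        return tier

    def read(self, key: str) -> bytes:
        """Look up the tier in metadata, then fetch the data from that tier."""
        return getattr(self, self.metadata[key])[key]

    def migrate(self) -> int:
        """Move absorbed small writes from pmem to nvme; return how many moved."""
        moved = 0
        for key in list(self.pmem):
            self.nvme[key] = self.pmem.pop(key)
            self.metadata[key] = "nvme"
            moved += 1
        return moved


store = TieredStore()
small_tier = store.write("obj/small", b"x" * 512)        # absorbed in pmem
bulk_tier = store.write("obj/bulk", b"y" * 1_048_576)    # written straight to nvme
moved = store.migrate()                                  # background migration step
```

In this sketch the metadata map never leaves the fast tier, so a read always needs one fast lookup before touching the slower tier, which mirrors the split described above.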

Key Features of the DAOS Stack

Ultra‑fine granularity, low latency, zero‑copy I/O

Non‑blocking data and metadata operations for overlapping I/O and computation

Advanced data placement for fault‑domain isolation

Software‑managed redundancy with online rebuild, supporting replication and erasure coding

End‑to‑end data integrity

Scalable distributed transactions with automatic recovery

Dataset snapshot capability

Security framework for storage‑pool access control

Software‑defined storage management (provisioning, configuration, monitoring)

Native support for I/O middleware libraries (HDF5, MPI‑IO, POSIX) via the DAOS API, requiring no application code changes

Apache Spark integration

Publish/subscribe API for producer‑consumer workflows

Data indexing and query functions

In‑storage compute to reduce data movement between storage and compute nodes

Disaster‑recovery tools

Seamless integration with Lustre and other parallel file systems, providing a unified namespace across storage tiers

Data mover for migrating datasets between DAOS pools and parallel file systems
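Of the features listed above, non‑blocking operations are the easiest to picture in code. The following is a minimal sketch of overlapping I/O with computation, using Python's standard thread pool rather than any DAOS API; ToyObjectStore and put_async are hypothetical names, and the "store" is an in‑memory dictionary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class ToyObjectStore:
    """Hypothetical in-memory store whose writes complete in the background."""

    def __init__(self, workers: int = 4):
        self._data = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _put(self, key: str, value: bytes) -> int:
        with self._lock:
            self._data[key] = value
        return len(value)

    def put_async(self, key: str, value: bytes):
        """Return immediately with a future; the caller checks completion later."""
        return self._pool.submit(self._put, key, value)

    def get(self, key: str) -> bytes:
        with self._lock:
            return self._data[key]

    def close(self) -> None:
        self._pool.shutdown(wait=True)


store = ToyObjectStore()

# 1. Launch the writes without waiting for any of them.
pending = [store.put_async(f"chunk-{i}", bytes([i]) * 1024) for i in range(8)]

# 2. Do useful computation while the writes are in flight.
partial_sum = sum(i * i for i in range(1000))

# 3. Only now wait for the outstanding operations to complete.
done, not_done = wait(pending)
bytes_written = sum(f.result() for f in done)
store.close()
```

The design point is the gap between steps 1 and 3: the application decides when to check for completion, so storage latency can hide behind computation instead of stalling it.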

Client‑Server Model and Middleware Integration

DAOS follows a client‑server architecture; I/O operations are handled by DAOS libraries linked directly to applications and serviced by storage daemons running in user space on server nodes.

DAOS client libraries are lightweight, minimizing noise on compute nodes, and support non‑blocking operations with explicit progress. Built on libfabric, the library that implements the OpenFabrics Interfaces (OFI), DAOS can exploit RDMA capabilities for efficient remote memory access.

In this new storage paradigm, POSIX is no longer the foundational data model; instead, POSIX interfaces are built atop the DAOS backend API, allowing applications to mount POSIX namespaces within containers. Metadata and data are distributed across all available storage, enhancing performance and resilience.
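To make the layering concrete, here is a toy sketch of a POSIX‑like namespace built on a flat key‑value object store. It does not reflect the actual DAOS file system layout: ToyContainer, ToyPosixLayer, and the way directory entries are encoded are all assumptions chosen for illustration.

```python
class ToyContainer:
    """Hypothetical flat object store: object id -> dict of key -> value."""

    def __init__(self):
        self.objects = {0: {}}  # object 0 plays the role of the root directory
        self._next_id = 1

    def alloc(self) -> int:
        """Create a new empty object and return its id."""
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = {}
        return oid


class ToyPosixLayer:
    """Path-based calls translated into lookups on the flat object store."""

    ROOT = 0

    def __init__(self, container: ToyContainer):
        self.c = container

    def _lookup(self, path: str) -> int:
        """Walk the path one component at a time; raises KeyError if missing."""
        oid = self.ROOT
        for name in (p for p in path.split("/") if p):
            oid = self.c.objects[oid][name]["oid"]
        return oid

    def _create(self, path: str, kind: str) -> int:
        """Allocate an object and add an entry for it in the parent directory."""
        parent, _, name = path.rstrip("/").rpartition("/")
        parent_oid = self._lookup(parent)
        oid = self.c.alloc()
        self.c.objects[parent_oid][name] = {"oid": oid, "kind": kind}
        return oid

    def mkdir(self, path: str) -> int:
        return self._create(path, "dir")

    def write_file(self, path: str, data: bytes) -> None:
        try:
            oid = self._lookup(path)
        except KeyError:
            oid = self._create(path, "file")
        self.c.objects[oid]["data"] = data  # file contents live under one key

    def read_file(self, path: str) -> bytes:
        return self.c.objects[self._lookup(path)]["data"]

    def listdir(self, path: str) -> list:
        return sorted(self.c.objects[self._lookup(path)])


fs = ToyPosixLayer(ToyContainer())
fs.mkdir("/results")
fs.write_file("/results/run1.dat", b"42")
names = fs.listdir("/results")
```

Here a directory is just an object whose keys are entry names, and a file is an object holding its bytes, so the namespace is a convention layered on the store rather than something the store itself understands.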

Supported middleware includes:

POSIX FS: Offers two operation modes, one for highly concurrent, “well‑behaved” workloads and one for workloads that need stricter consistency, at some cost in performance.

MPI‑IO: A ROMIO driver built on top of DAOS, upstreamed to the MPICH repository and portable to other MPI implementations.

HDF5: A VOL connector lets HDF5 applications use DAOS containers with minimal or no code changes, and provides asynchronous I/O, snapshots, and query/index capabilities.

Ecosystem and Future Directions

Other HPC I/O middleware such as Silo, MDHIM, and DataSpaces can benefit from native DAOS ports. Intel collaborates with enterprises and research institutions (e.g., weather forecasting agencies, media, cloud services, oil & gas) to support new data models.

Intel is also exploring DAOS integration with big‑data frameworks, notably providing a DAOS backend for Apache Arrow to enable zero‑copy data exchange across systems like Apache Spark, Thrift, and Avro, and to simplify storage of columnar data for analytics.
