Why Intel’s DAOS Is Redefining High‑Performance Storage for AI and HPC
As data volumes explode, Intel’s open‑source Distributed Asynchronous Object Storage (DAOS) offers a low‑latency, high‑bandwidth, NVM‑optimized platform that bridges the gap between high‑performance computing, big data analytics, and artificial intelligence workloads.
Background and Emerging Storage Challenges
Exponential data growth has turned distributed storage systems into both the core and the primary bottleneck of modern data centers. Traditional storage, designed for rotating media and POSIX I/O, suffers from high access latency, poor scalability, difficult management of massive datasets, and lack of query capabilities, making it unsuitable for new data models and next‑generation workflows.
Convergence of HPC, Big Data, and AI
Workloads now generate massive random reads and writes, with AI demanding far higher read throughput than traditional HPC. The shift from checkpoint‑heavy I/O to complex, high‑frequency patterns requires storage whose read bandwidth is comparable to its write bandwidth, so that data can move seamlessly among HPC, big data, and AI pipelines.
Intel DAOS Software Stack
Intel has built an open‑source, software‑defined, horizontally scalable object store called Distributed Asynchronous Object Storage (DAOS) as the foundation of its exascale storage stack. Optimized for Intel architectures and non‑volatile memory (NVM) technologies—including Intel Optane Persistent Memory and NVMe SSDs—DAOS provides high bandwidth, low latency, and high IOPS for HPC applications and supports data‑centric workflows that combine simulation, analytics, and AI.
Architecture vs. Traditional Storage
Unlike legacy stacks built for rotating media, DAOS was designed from the ground up for NVM and runs as a lightweight user‑space system that bypasses the operating system on the I/O path. It abandons block‑oriented I/O models in favor of native fine‑grained data access, unlocking the performance of next‑generation storage devices.
DAOS replaces high‑latency point‑to‑point communication with low‑latency, high‑message‑rate user‑space messaging, avoiding kernel overhead. It is specifically tuned for Intel Optane Persistent Memory and NVMe SSDs, eliminating unnecessary overhead from traditional block‑device optimizations.
Metadata resides in persistent memory while bulk data is stored on NVMe SSDs. Small I/O operations are absorbed in persistent memory before being migrated to flash, reducing access latency from milliseconds to microseconds.
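The tiering described above can be illustrated with a toy model. This is a minimal sketch, not DAOS code: the class name `TieredStore`, the 4 KiB `SMALL_IO_THRESHOLD`, and the explicit `migrate()` call are all assumptions made for illustration, and plain dictionaries stand in for persistent memory and NVMe.

```python
from dataclasses import dataclass, field

SMALL_IO_THRESHOLD = 4096  # bytes; hypothetical cutoff, not a DAOS default


@dataclass
class TieredStore:
    """Toy model: a fast tier for metadata and small writes, a bulk tier for the rest."""
    pmem: dict = field(default_factory=dict)      # stands in for persistent memory
    nvme: dict = field(default_factory=dict)      # stands in for NVMe SSDs
    metadata: dict = field(default_factory=dict)  # key -> (tier, size), kept in the fast tier

    def write(self, key: str, data: bytes) -> str:
        """Store data in the tier chosen by its size and return the tier name."""
        tier = "pmem" if len(data) < SMALL_IO_THRESHOLD else "nvme"
        # Drop any stale copy so a key lives in exactly one tier.
        self.pmem.pop(key, None)
        self.nvme.pop(key, None)
        getattr(self, tier)[key] = data
        self.metadata[key] = (tier, len(data))
        return tier

    def read(self, key: str) -> bytes:
        """Look up the tier in metadata, then fetch the data from that tier."""
        tier, _ = self.metadata[key]
        return getattr(self, tier)[key]

    def migrate(self) -> int:
        """Move every record from the fast tier to the bulk tier; return the count."""
        moved = 0
        for key in list(self.pmem):
            self.nvme[key] = self.pmem.pop(key)
            self.metadata[key] = ("nvme", len(self.nvme[key]))
            moved += 1
        return moved
```

In this model a 100‑byte write lands in the fast tier and an 8 KiB write goes straight to the bulk tier; after `migrate()` the small record is still readable through the same key, because the metadata entry is updated along with the move.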
Key Features of the DAOS Stack
Ultra‑fine granularity, low latency, zero‑copy I/O
Non‑blocking data and metadata operations for overlapping I/O and computation
Advanced data placement for fault‑domain isolation
Software‑managed redundancy with online rebuild, supporting replication and erasure coding
End‑to‑end data integrity
Scalable distributed transactions with automatic recovery
Dataset snapshot capability
Security framework for storage‑pool access control
Software‑defined storage management (provisioning, configuration, monitoring)
Native support for I/O middleware libraries (HDF5, MPI‑IO, POSIX) via the DAOS API, requiring minimal or no application code changes
Apache Spark integration
Publish/subscribe API for producer‑consumer workflows
Data indexing and query functions
In‑storage compute to reduce data movement between storage and compute nodes
Disaster‑recovery tools
Seamless integration with Lustre and other parallel file systems, providing a unified namespace across storage tiers
Data mover for migrating datasets between DAOS pools and parallel file systems
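To make the "software‑managed redundancy with online rebuild" item in the list above more concrete, here is a minimal sketch of the simplest possible erasure code: a single XOR parity cell. It is an illustration of the general idea only and makes no claim about the coding scheme DAOS actually uses; the function names `encode` and `rebuild` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized cells."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_cells: List[bytes]) -> bytes:
    """Return the parity cell: the XOR of all data cells (cells must be the same size)."""
    return reduce(xor_bytes, data_cells)


def rebuild(cells: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the single missing cell (marked None) from the survivors and the parity."""
    missing = [i for i, cell in enumerate(cells) if cell is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost cell")
    survivors = [cell for cell in cells if cell is not None]
    # XOR-ing the parity with every surviving cell cancels them out,
    # leaving only the contents of the lost cell.
    restored = reduce(xor_bytes, survivors, parity)
    result = list(cells)
    result[missing[0]] = restored
    return result
```

With three data cells and one parity cell, losing any one data cell is recoverable by calling `rebuild` with that position set to `None`. Tolerating more than one simultaneous failure requires a stronger code or replication.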
Client‑Server Model and Middleware Integration
DAOS follows a client‑server architecture; I/O operations are handled by DAOS libraries linked directly to applications and serviced by storage daemons running in user space on server nodes.
DAOS client libraries are lightweight, minimizing noise on compute nodes, and support non‑blocking operations whose progress the application can check explicitly. Through libfabric, the library that implements the OpenFabrics Interfaces (OFI), DAOS can exploit RDMA capabilities for efficient remote memory access.
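The value of non‑blocking operations is that an application can submit I/O and keep computing while it completes. The sketch below shows that general pattern with Python's standard thread pool. It does not use the DAOS client API; `fake_write`, `compute`, and `run` are hypothetical stand‑ins, and the 50 ms sleep merely mimics I/O latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_write(chunk_id: int, payload: bytes) -> int:
    """Stand-in for a storage write; sleeps to mimic I/O latency."""
    time.sleep(0.05)
    return len(payload)


def compute(step: int) -> bytes:
    """Stand-in for one simulation step that produces 1 KiB of output."""
    return bytes([step % 256]) * 1024


def run(steps: int) -> int:
    """Submit each step's write without waiting, then collect all completions."""
    pending = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for step in range(steps):
            payload = compute(step)
            # submit() returns immediately, so the next compute step
            # overlaps with the writes that are still in flight.
            pending.append(pool.submit(fake_write, step, payload))
        # Completion is checked only once all work has been issued.
        return sum(future.result() for future in pending)
```

Calling `run(8)` returns the total number of bytes written. Because the writes proceed in the background, the elapsed time is closer to the longest chain of writes per worker than to the sum of all write latencies.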
In this new storage paradigm, POSIX is no longer the foundational data model; instead, POSIX interfaces are built atop the DAOS backend API, allowing applications to mount POSIX namespaces within containers. Metadata and data are distributed across all available storage, enhancing performance and resilience.
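One way to picture a POSIX namespace built on top of an object backend is to treat every directory as an object that maps names to object IDs and every file as an object that holds bytes. The following is a toy sketch under that assumption. `ObjectStore` and `Namespace` are hypothetical names and do not reflect how the DAOS file system layer is actually implemented.

```python
import itertools


class ObjectStore:
    """Flat store mapping object ID -> value. Stands in for the object backend."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def create(self, value):
        oid = next(self._ids)
        self._objects[oid] = value
        return oid

    def get(self, oid):
        return self._objects[oid]

    def put(self, oid, value):
        self._objects[oid] = value


class Namespace:
    """File-system view: directories are objects mapping names to object IDs."""

    def __init__(self, store):
        self.store = store
        self.root = store.create({})  # the root directory object

    @staticmethod
    def _split(path):
        return [part for part in path.split("/") if part]

    def _walk(self, parts):
        """Resolve a list of path components to an object ID, starting at the root."""
        oid = self.root
        for name in parts:
            oid = self.store.get(oid)[name]
        return oid

    def mkdir(self, path):
        *parent, name = self._split(path)
        self.store.get(self._walk(parent))[name] = self.store.create({})

    def write_file(self, path, data: bytes):
        *parent, name = self._split(path)
        entries = self.store.get(self._walk(parent))
        if name in entries:
            self.store.put(entries[name], data)      # overwrite the existing object
        else:
            entries[name] = self.store.create(data)  # new object plus directory entry

    def read_file(self, path) -> bytes:
        return self.store.get(self._walk(self._split(path)))

    def listdir(self, path):
        return sorted(self.store.get(self._walk(self._split(path))))
```

The point of the design is that the path hierarchy is just data stored in objects, so the same backend can serve other data models alongside the file‑system view.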
Supported middleware includes:
POSIX FS: Two operation modes, one for highly concurrent “well‑behaved” workloads and one for workloads that need stricter consistency at some cost in performance.
MPI‑IO: ROMIO driver integrated on top of DAOS, upstreamed to the MPICH repository and portable to other MPI implementations.
HDF5: VOL connector enables HDF5 applications to use DAOS containers with minimal or no code changes, providing asynchronous I/O, snapshots, and query/index capabilities.
Ecosystem and Future Directions
Other HPC I/O middleware such as Silo, MDHIM, and DataSpaces can benefit from native DAOS ports. Intel collaborates with enterprises and research institutions (e.g., weather forecasting agencies, media, cloud services, oil & gas) to support new data models.
Intel is also exploring DAOS integration with big‑data frameworks, notably providing a DAOS backend for Apache Arrow to enable zero‑copy data exchange across systems like Apache Spark, Thrift, and Avro, and to simplify storage of columnar data for analytics.
Architects' Tech Alliance
