How DPU‑Based Architectures Revolutionize High‑Performance Storage Networks
This article examines the role of Data Processing Units (DPUs) in modern data‑center storage networking, detailing their architecture, core offload technologies, three offload modes, and the performance advantages they bring to both bare‑metal and virtualized environments while highlighting trade‑offs and implementation considerations.
Background
As network bandwidth and storage performance increase, a large share of server resources (approximately 30%) is consumed by handling networking and storage protocols on the host CPU. At the same time, CPU performance improvements are slowing, which reduces the energy efficiency of data-center services.
Data Processing Unit (DPU) Overview
A DPU is a specialized processor built for data‑centric workloads. It uses a software‑defined architecture to virtualize infrastructure‑layer resources and offload networking and storage protocols, alleviating CPU bottlenecks.
Core DPU Technologies
IO hardware device virtualization
VPC overlay network acceleration
EBS distributed storage acceleration
Local storage virtualization acceleration
RDMA‑based high‑speed data transfer
Security hardware acceleration
Elastic bare‑metal support
Resource pooling capabilities
NVMe‑over‑RDMA Offload Modes
Non-offload: All data passes through the embedded CPU, requiring multiple DMA copies between host memory, embedded CPU memory, and the NIC.
Zero-copy: Data moves directly from host memory to remote storage without being staged in the embedded CPU's memory, reducing DMA hops. This mode relies on SPDK bdev and requires RDMA support on the storage side.
Full-offload: Both control and data planes are handled entirely by hardware, eliminating embedded CPU involvement. It provides the highest throughput but limits software control over the storage backend.
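The three modes can be compared with a small model of where a payload is staged and who handles control. The following is a minimal sketch, not vendor code: the stage lists are simplifications derived from the descriptions above, the names OffloadMode, DataPath, and PATHS are hypothetical, and placing the zero-copy control plane on the embedded CPU is an assumption based on its reliance on SPDK.

```python
from dataclasses import dataclass
from enum import Enum


class OffloadMode(Enum):
    NON_OFFLOAD = "non-offload"
    ZERO_COPY = "zero-copy"
    FULL_OFFLOAD = "full-offload"


@dataclass(frozen=True)
class DataPath:
    stages: tuple        # places a payload is staged, in order
    control_plane: str   # "embedded-cpu" or "hardware"

    @property
    def dma_hops(self) -> int:
        # One transfer between each pair of adjacent stages.
        return len(self.stages) - 1

    @property
    def uses_embedded_cpu_memory(self) -> bool:
        return "embedded CPU memory" in self.stages


# Illustrative paths only; real devices may stage data differently.
PATHS = {
    OffloadMode.NON_OFFLOAD: DataPath(
        ("host memory", "embedded CPU memory", "NIC"), "embedded-cpu"),
    # Assumption: SPDK on the embedded CPU still drives the control plane.
    OffloadMode.ZERO_COPY: DataPath(
        ("host memory", "NIC"), "embedded-cpu"),
    OffloadMode.FULL_OFFLOAD: DataPath(
        ("host memory", "NIC"), "hardware"),
}

if __name__ == "__main__":
    for mode, path in PATHS.items():
        print(f"{mode.value}: {' -> '.join(path.stages)} "
              f"(hops={path.dma_hops}, control={path.control_plane})")
```

In this model, zero-copy and full-offload share the same data path and differ only in who runs the control plane, which is where the loss of software control in full-offload comes from.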
High‑Performance Storage Architecture with DPU
By separating compute from storage, DPUs move network and storage protocol processing from the host CPU to the DPU, decreasing CPU utilization and increasing throughput. Integrated accelerators (encryption, compression, etc.) further speed up data handling. Hardware‑level storage virtualization allows a single physical device to appear as multiple virtual devices, reducing data copies and latency in virtualized environments.
Application Scenarios
(1) Bare‑Metal
In bare-metal deployments, users have exclusive access to physical servers and run operating systems and applications directly on the hardware. This eliminates virtualization overhead and delivers higher performance, lower latency, and better isolation, which suits large databases and high-performance computing workloads.
(2) Virtualized
In virtualized environments, cloud providers split physical machines into multiple VMs to improve hardware utilization and reduce data‑center costs. However, VM access to network storage suffers from memory copies, virtualization overhead, and network device limits. DPUs can virtualize remote storage as local NVMe devices using SR‑IOV, dramatically reducing latency and achieving near‑native performance.
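From inside a host or VM, storage presented this way should look like any other NVMe controller. The sketch below lists the controllers that Linux exposes under /sys/class/nvme along with their model and transport attributes. It assumes a Linux guest with the standard NVMe driver; whether a DPU-backed device reports itself as an ordinary PCIe controller depends on the DPU and is an assumption here. The function names are hypothetical.

```python
from pathlib import Path


def read_attr(ctrl_dir: Path, name: str) -> str:
    """Return a sysfs attribute as stripped text, or '' if it is absent."""
    try:
        return (ctrl_dir / name).read_text().strip()
    except OSError:
        return ""


def list_nvme_controllers(sysfs_root: str = "/sys/class/nvme") -> list:
    """List NVMe controllers visible to this host (empty list if none)."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    controllers = []
    for ctrl in sorted(root.iterdir()):
        controllers.append({
            "name": ctrl.name,
            "model": read_attr(ctrl, "model"),
            # Typically "pcie" for local devices, "rdma" or "tcp" for
            # fabrics controllers attached by the host itself.
            "transport": read_attr(ctrl, "transport"),
        })
    return controllers


if __name__ == "__main__":
    for controller in list_nvme_controllers():
        print(controller)
```

Taking the sysfs root as a parameter keeps the function usable on machines without NVMe devices, for example by pointing it at a directory of sample files.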
Performance Considerations
Traditional data paths involve multiple context switches and data copies between user space, kernel space, and network stacks, consuming CPU cycles and increasing latency. Offload modes reduce these overheads: Zero-copy removes the hop through the embedded CPU's memory, while Full-offload removes the embedded CPU from both the control and data planes, offering the greatest throughput at the cost of flexibility.
Conclusion
DPUs provide a powerful mechanism to offload networking and storage workloads, enabling higher performance, lower CPU utilization, and more efficient data‑center resource usage. Selecting the appropriate offload mode requires balancing performance, control, and compatibility with existing storage solutions.
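The trade-off can be written down as a simple decision rule. This is a hypothetical reading of the considerations above, not a recommendation from any vendor: the function name and its two inputs are illustrative, and the rule assumes that full-offload, like zero-copy, needs RDMA support on the storage side.

```python
def choose_offload_mode(storage_supports_rdma: bool,
                        needs_software_control: bool) -> str:
    """Pick an offload mode from two coarse constraints (illustrative)."""
    if not storage_supports_rdma:
        # Assumption: without RDMA on the storage side, data has to be
        # handled by software on the embedded CPU.
        return "non-offload"
    if needs_software_control:
        # Keeps a software-driven control path while shortening the data path.
        return "zero-copy"
    # Highest throughput, least software control over the backend.
    return "full-offload"


if __name__ == "__main__":
    for rdma in (False, True):
        for control in (False, True):
            mode = choose_offload_mode(rdma, control)
            print(f"rdma={rdma}, software_control={control} -> {mode}")
```

A real selection would also weigh the specific DPU's capabilities and compatibility with the existing storage system, which this sketch leaves out.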
Related Reading
DPU Hardware Standardization Exploration – https://mp.weixin.qq.com/s?__biz=MzAxNzU3NjcxOA==&mid=2650749086&idx=1&sn=45210c6e19dae3ed31e2534890f99ee4
Advances and Future Innovations in DPU Technology – https://mp.weixin.qq.com/s?__biz=MzAxNzU3NjcxOA==&mid=2650749080&idx=1&sn=6a0b1f2bbbdbc470b4155abb253f1feb
Practical DPU Deployments and Use Cases – https://mp.weixin.qq.com/s?__biz=MzAxNzU3NjcxOA==&mid=2650746845&idx=1&sn=0813ac94a58503d068bc270a9b12d753
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
