How DPU‑Powered Cloud IaaS Revolutionizes Compute, Networking, and Storage
Baidu Intelligent Cloud’s 2023 GTC presentation details how its DPU‑based IaaS architecture unifies high‑performance compute, networking, storage, and security, addressing rapid AI workload growth, reducing CPU bottlenecks, and delivering elastic, cost‑effective solutions across virtual machines, bare‑metal servers, and specialized RDMA instances.
The talk, sourced from the March 2023 NVIDIA GTC conference, explained how Baidu Intelligent Cloud leverages Data Processing Units (DPUs) to build a unified, high‑performance IaaS compute architecture that supports general compute, AI, and HPC services.
Rapid growth in AI training workloads, with compute demand doubling roughly every 3.5 months, is outpacing both CPU performance gains and network bandwidth growth, pressuring cloud providers to deliver far higher I/O performance. Baidu's cloud servers now offer up to 200 Gbps of network bandwidth to meet these demands.
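The pressure behind that doubling cadence is easy to quantify with back-of-the-envelope arithmetic (an illustration, not a figure from the talk):

```python
# Illustrative arithmetic: if AI training compute demand doubles every
# 3.5 months, annual demand grows by 2^(12/3.5), roughly 10.8x per year,
# far faster than typical generational gains in CPU or NIC performance.
DOUBLING_PERIOD_MONTHS = 3.5

def annual_growth_factor(doubling_period_months: float) -> float:
    """Demand multiplier over 12 months for a given doubling period."""
    return 2 ** (12 / doubling_period_months)

print(f"~{annual_growth_factor(DOUBLING_PERIOD_MONTHS):.1f}x per year")
```

A roughly tenfold annual growth in demand is what makes per-generation CPU and NIC improvements look flat by comparison.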
Challenges include the widening gap between compute and network bandwidth, the lack of hardware acceleration in standard NICs, and the overhead of virtualized I/O that consumes significant CPU resources.
Increasing product variety and configuration complexity raise management costs and affect SLA stability, while diverse customer scenarios demand finer‑grained resource slicing, security isolation, and near‑local‑disk storage performance.
Complex container‑based workloads, high‑security requirements, and GPU‑intensive AI workloads further increase the difficulty of underlying product development.
Introducing DPUs addresses these issues by offloading data-processing tasks from the CPU, masking hardware and software differences behind a uniform device interface, and providing performance that can flex to match varied customer scenarios.
DPUs can handle compute, network, and storage virtualization, as well as security and management functions, freeing CPU cycles for actual user workloads, improving overall resource utilization, and reducing development and operational costs.
Specialized DPU hardware also accelerates encryption/decryption, allowing data security processing to occur before data leaves the host, thereby simplifying transmission and speeding up cryptographic operations.
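The encrypt-before-transmit pattern that the DPU accelerates in hardware can be pictured in software. The sketch below uses a toy XOR one-time pad as a stand-in for the dedicated AES engines a real DPU provides; all function names are illustrative, not Baidu's API:

```python
import secrets

def encrypt_before_send(payload: bytes) -> tuple[bytes, bytes]:
    """Toy stand-in for DPU-side encryption: a one-time-pad XOR.

    On a real DPU, AES runs in dedicated hardware so that ciphertext,
    not plaintext, is what leaves the host.
    """
    key = secrets.token_bytes(len(payload))  # per-message key material
    ciphertext = bytes(p ^ k for p, k in zip(payload, key))
    return ciphertext, key

def decrypt_on_receive(ciphertext: bytes, key: bytes) -> bytes:
    """Inverse operation on the receiving side."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

ct, key = encrypt_before_send(b"tenant data")
assert decrypt_on_receive(ct, key) == b"tenant data"
```

The point of the hardware path is that neither the XOR loop here nor a real AES round ever costs host CPU cycles.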
The "Baidu Taihang·Compute" brand represents Baidu’s DPU‑centric computing stack, integrating custom server designs, core compute components, and proprietary engines such as:
vQPE engine for resource utilization and device management.
BDMA engine for high‑performance, high‑availability I/O.
BOE engine that offloads flow‑table matching to FPGA.
BDR engine that moves the high‑performance network stack to FPGA, enabling ultra‑high‑bandwidth, low‑latency RDMA connections.
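The flow-table matching that the BOE engine offloads to FPGA can be pictured as a match-action lookup keyed on a packet's 5-tuple. The pure-software sketch below is illustrative only, not Baidu's implementation:

```python
from typing import NamedTuple

class FiveTuple(NamedTuple):
    """Connection identity used as the flow-table key."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str

# Match-action flow table: on the DPU this lookup runs in FPGA logic;
# in software it is a hash-table hit or a slow-path miss.
flow_table: dict[FiveTuple, str] = {
    FiveTuple("10.0.0.1", "10.0.0.2", 40000, 443, "tcp"): "forward:vport3",
}

def match(pkt: FiveTuple) -> str:
    # A miss punts to the slow path, which can install a new flow entry
    # so subsequent packets of the flow hit the fast path.
    return flow_table.get(pkt, "slow_path")
```

Offloading exactly this per-packet lookup is what removes the virtual-switch CPU cost from the host.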
On bare-metal servers, DPUs provide network and storage virtualization, including virtual NICs, cloud disks, hot-upgrade, and hot-plug capabilities, giving bare-metal instances feature parity with virtual machines while retaining native bare-metal performance.
Instance creation time for DPU‑enabled bare‑metal servers is reduced by 80%, achieving minute‑level provisioning, and overall availability improves with a >50% reduction in downtime.
For virtual machine (BCC) instances, moving control‑plane agents and data‑plane components to DPUs frees all CPU cores for user workloads, narrowing the performance gap between VMs and bare‑metal servers and delivering 10‑20% overall resource efficiency gains.
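The 10-20% figure is consistent with simple core accounting: if a few host cores were previously reserved for virtualization agents, offloading those agents to the DPU makes every core sellable. The numbers below are illustrative, not from the talk:

```python
def capacity_gain(total_cores: int, reserved_for_virt: int) -> float:
    """Fractional gain in sellable cores once virtualization moves to the DPU."""
    before = total_cores - reserved_for_virt  # cores sellable pre-offload
    after = total_cores                       # all cores sellable post-offload
    return after / before - 1

# e.g. a 32-core host that had reserved 4 cores for network/storage agents
gain = capacity_gain(32, 4)
print(f"{gain:.1%} more sellable capacity")  # ~14%, within the 10-20% range
```

Reserving anywhere from 3 to 6 cores on a 32-core host would put the recovered capacity squarely in the quoted 10-20% band.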
DPUs also enable tenant‑level storage and network bandwidth management, as well as compute‑ and security‑enhancement features.
Multiple product families now leverage DPUs:
Elastic RDMA Interface (ERI) instances with virtual NICs derived from DPU VF devices, offering configurable elastic and RDMA networking.
Dedicated RDMA NIC solutions combining a DPU with a Mellanox ConnectX NIC, delivering 100 Gbps-class VPC networking and high-performance RDMA, supporting both RoCEv2 and InfiniBand for AI training and HPC workloads.
AI‑focused instances provide up to 800 Gbps RDMA bandwidth and 200 Gbps VPC bandwidth, supporting large‑scale model training and high‑performance computing.
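ERI's virtual NICs are exposed to the instance as SR-IOV virtual functions (VFs) of the DPU. On Linux, the configured VF count of each NIC surfaces under sysfs; the sketch below enumerates it (the sysfs layout is standard Linux SR-IOV, not Baidu-specific):

```python
from pathlib import Path

def sriov_vf_counts(sysfs_net: Path = Path("/sys/class/net")) -> dict[str, int]:
    """Map each NIC that exposes SR-IOV to its configured VF count."""
    counts: dict[str, int] = {}
    if not sysfs_net.exists():
        return counts
    for dev in sorted(sysfs_net.iterdir()):
        # Only SR-IOV-capable devices expose device/sriov_numvfs.
        vf_file = dev / "device" / "sriov_numvfs"
        if vf_file.exists():
            counts[dev.name] = int(vf_file.read_text().strip())
    return counts
```

Each VF enumerated this way can be handed to a guest as an independent virtual NIC, which is the mechanism ERI builds its elastic and RDMA interfaces on.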
Future plans include deeper collaboration with NVIDIA to enhance DPU and ConnectX capabilities, further optimizing network throughput, programmable compute, and storage encryption offload, while advancing zero-trust security features such as TLS and IPsec termination within the DPU.
Baidu Intelligent Cloud Tech Hub
We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.