Unlock Cloud‑Level RDMA Performance with Volcengine’s vRDMA
Volcengine’s vRDMA brings high‑performance, low‑latency RDMA acceleration to cloud VPCs, combining self‑developed congestion control, elastic ENI integration, and compatibility with HPC, AI, and big‑data workloads to deliver up to 320 Gbps bandwidth and microsecond‑level latency.
Volcengine recently launched the vRDMA feature on selected cloud server instance types, offering large‑scale RDMA acceleration within VPCs that is compatible with traditional HPC, AI, and TCP/IP applications, thereby lowering the adaptation barrier for many workloads.
Traditional TCP/IP transmission involves multiple protocol layers, kernel processing, data copying, and context switches, which limits performance for demanding scenarios such as high‑performance computing, model inference, machine learning, and big‑data transfer.
RDMA eliminates these bottlenecks by allowing applications to directly access remote memory without kernel or protocol‑stack involvement, achieving memory‑to‑memory transfers with microsecond‑level latency.
Volcengine previously provided RDMA on GPU‑type instances and supported various RDMA use cases.
Because traditional RDMA requires lossless networks, incurs high operational costs, and is typically confined to a single cluster, Volcengine built vRDMA to run RDMA inside a cloud VPC while delivering performance close to that of physical RDMA networks.
What is vRDMA?
vRDMA is Volcengine’s self‑developed elastic RDMA network that runs over VPC elastic network interfaces (ENIs). Users only need to attach an ENI with vRDMA enabled to an instance to activate RDMA communication without additional hardware or fees.
Technical Advantages
High Performance: Low CPU overhead and network latency provide performance comparable to physical RDMA.
Shared VPC Network: vRDMA reuses the existing VPC network, eliminating the need for separate RDMA NICs.
Compatibility: Supports standard Verbs and most InfiniBand semantics, allowing existing applications to run with minimal changes.
Elastic Scaling: ENIs with vRDMA enabled can be created and attached on demand, enabling flexible scaling.
Isolation: Shares instance bandwidth with TCP/IP traffic while providing fine-grained traffic isolation via vQoS and multi-level meters.
Large-Scale Networking: Runs over lossy Ethernet using a custom congestion-control (CC) algorithm, supporting high-performance communication across clusters and over long distances.
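Because vRDMA keeps standard Verbs semantics, the usual RDMA tooling should work unchanged once a vRDMA-enabled ENI is attached. As a minimal sketch (assuming pyverbs, the Python bindings shipped with rdma-core, is installed on the instance), enumerating the RDMA devices visible to the Verbs stack looks like this:

```python
def list_rdma_devices():
    """Return the names of RDMA devices visible to the Verbs stack.

    Assumes pyverbs (rdma-core's Python bindings) is available; on a
    machine without pyverbs, or with no RDMA devices, this returns [].
    """
    try:
        import pyverbs.device as d
    except ImportError:
        # pyverbs not installed -- nothing to enumerate.
        return []
    return [dev.name.decode() for dev in d.get_device_list()]

# Device names depend on the instance; expect an empty list where no
# vRDMA-enabled ENI (or other RDMA NIC) is present.
print(list_rdma_devices())
```

An application that already talks Verbs (via libibverbs or pyverbs) would discover the vRDMA device through this same enumeration path, which is what "minimal changes" means in practice.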
Performance
Thanks to the custom CC algorithm and a high-performance vSwitch, Volcengine ECS instances with vRDMA achieve up to 320 Gbps bandwidth, latency as low as 5 µs, and up to 50 million messages per second.
Average latency is reduced to one‑fifth of kernel TCP/IP (80% lower), and tail latency drops by 99%. Single‑connection throughput can be up to 300% higher than kernel TCP/IP.
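The relative figures above are internally consistent and easy to sanity-check: reducing latency to one-fifth of the baseline is exactly an 80% reduction, and "up to 300% higher" throughput means 4x the kernel TCP/IP baseline. A small arithmetic check (the figures come from the text; this is not a benchmark):

```python
# Sanity-check the relationships between the performance claims above.
# Baseline values are arbitrary placeholders; only the ratios matter.

def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100

# "Average latency is reduced to one-fifth of kernel TCP/IP (80% lower)":
tcp_latency = 100.0              # placeholder baseline in microseconds
vrdma_latency = tcp_latency / 5  # one-fifth of the baseline
assert reduction_pct(tcp_latency, vrdma_latency) == 80.0

# "Up to 300% higher" single-connection throughput = 4x the baseline:
tcp_throughput = 1.0
vrdma_throughput = tcp_throughput * (1 + 300 / 100)
print(vrdma_throughput)  # 4.0
```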
Best Practices
Distributed Storage: vRDMA improves read bandwidth by 35% in typical scenarios and by up to 60% when reads are fully served from cache.
Large-Model Inference: In LLM inference, vRDMA cuts second-token and tail latency by about 50% compared with kernel TCP/IP.
High-Performance Computing: For HPC workloads such as LS-DYNA and STAR-CCM+, vRDMA delivers roughly 30% better linear scalability and supports larger node counts.
Conclusion
Research on Volcengine vRDMA and its custom congestion‑control algorithm has been accepted as a full paper at ACM SIGCOMM 2025 and at the CCF A‑class conference ATC 2025, underscoring its strong system contributions.
Future plans include expanding vRDMA to more instance types, enriching the ecosystem with custom transport optimizations and high‑performance collective communication libraries, and evolving RDMA on VPC from niche AI/HPC use cases to general‑purpose cloud computing.