Azure's FPGA-Based Network Acceleration: Design Goals, Hardware Comparisons, and Performance Evaluation
The article reviews Azure's shift from pure software and ASIC networking to an FPGA‑based data‑plane solution, outlining the design motivations, comparing ASIC, multicore NIC, CPU, and FPGA approaches, addressing common FPGA concerns, and presenting performance results that show significant latency and throughput improvements over traditional software stacks.
Background: Network interface card (NIC) speeds have outpaced CPU performance. Azure's NIC bandwidth grew from 1 Gbps in 2009 to 50 Gbps in 2017, and modern NICs reach 200 Gbps. Handling packets in software therefore becomes a CPU bottleneck and ties up cores that could otherwise be sold to customers, which directly impacts Azure's profitability.
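To see why line rate overwhelms a core, it helps to work out the per-packet cycle budget. The sketch below uses illustrative numbers (a 3 GHz core and common packet sizes are assumptions, not Azure measurements): at 50 Gbps with full-size packets a core gets only a few hundred cycles per packet, and at 100 Gbps with minimum-size packets the budget collapses to roughly a dozen cycles.

```python
# Back-of-envelope estimate of the CPU cycle budget per packet at a
# given line rate. All numbers are illustrative assumptions.

def cycle_budget(line_rate_gbps: float, packet_bytes: int, cpu_ghz: float) -> float:
    """Cycles a single core may spend on each packet while keeping up with line rate."""
    packets_per_sec = line_rate_gbps * 1e9 / (packet_bytes * 8)
    return cpu_ghz * 1e9 / packets_per_sec

# 1 Gbps leaves tens of thousands of cycles per packet; 100 Gbps with
# 64-byte packets leaves about 15 cycles on an assumed 3 GHz core.
for gbps, size in [(1, 1500), (50, 1500), (100, 64)]:
    print(f"{gbps:>4} Gbps, {size:>5}-byte packets: "
          f"{cycle_budget(gbps, size, 3.0):9.1f} cycles/packet")
```

A budget of ~15 cycles is far below the cost of even a single cache miss, which is why software packet processing cannot scale to these rates without consuming many cores.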
Azure's design goals include achieving near‑hardware throughput and latency, supporting programmable SDN data‑plane iteration, targeting 100 Gbps bandwidth, and minimizing CPU usage, leading the company to explore hardware‑assisted solutions.
Hardware options compared:
ASIC: Offers the highest performance with SR-IOV but suffers from poor programmability, long development cycles (1-2 years), and limited flexibility.
Multicore SoC-based NIC: Provides programmable cores at the cost of reduced performance at higher bandwidths (e.g., struggles beyond 40 Gbps).
CPU (software): Highly flexible using DPDK and poll-mode optimizations, but consumes valuable host CPU cycles, leading to latency jitter and higher cost.
FPGA: Balances performance and flexibility by leveraging massive parallelism and pipelining; although individual clock speeds are lower than CPUs', the parallel resources enable throughput far beyond CPU-only solutions.
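The FPGA trade-off above can be made concrete with a simple throughput model (the clock rates and per-packet cycle count below are assumed, illustrative values): a fully pipelined datapath accepts a new packet every cycle, so even at a modest clock it outpaces a much faster core that must execute hundreds of instructions sequentially per packet.

```python
# Illustrative model of pipelined FPGA throughput vs. a sequential CPU core.
# Clock rates and cycles-per-packet are assumptions, not vendor figures.

def packets_per_sec_fpga(clock_hz: float, packets_per_cycle: float = 1.0) -> float:
    # A full pipeline starts a new packet every clock cycle.
    return clock_hz * packets_per_cycle

def packets_per_sec_cpu(clock_hz: float, cycles_per_packet: float) -> float:
    # A core spends many sequential cycles of work on each packet.
    return clock_hz / cycles_per_packet

fpga = packets_per_sec_fpga(200e6)    # assumed 200 MHz pipeline -> 200 Mpps
cpu = packets_per_sec_cpu(3e9, 500)   # assumed 3 GHz, ~500 cycles/packet -> 6 Mpps
print(f"FPGA: {fpga/1e6:.0f} Mpps, CPU core: {cpu/1e6:.0f} Mpps, ratio ~{fpga/cpu:.0f}x")
```

The point is structural, not the exact ratio: pipelining converts clock rate directly into packet rate, while a CPU's higher clock is divided by its per-packet instruction count.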
Common FPGA questions addressed:
FPGA boards are roughly 2-3× larger than ASIC equivalents due to additional transceivers and memory, though the logical core area can be 10-20× larger.
Cost is justified by silicon‑level savings from reduced CPU, flash, and DRAM usage; exact pricing is proprietary.
Programming requires Verilog and hardware expertise, but Azure maintains a dedicated hardware team and applies software engineering practices to accelerate development.
Performance data: The FPGA‑accelerated path achieves an average inter‑VM latency of 17 µs (P99 = 25 µs, P99.9 = 80 µs), substantially better than the 50‑300 µs range of the software baseline. Throughput reaches 30 Gbps with zero additional CPU consumption, compared to 8 Gbps for the CPU‑only solution.
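The gains implied by these figures can be derived directly from the numbers quoted above:

```python
# Speedups computed from the throughput and latency figures quoted in the text.
sw_gbps, fpga_gbps = 8.0, 30.0        # CPU-only vs. FPGA-accelerated throughput
sw_latency_us = (50.0, 300.0)         # software baseline latency range
fpga_avg_us = 17.0                    # FPGA-path average inter-VM latency

throughput_gain = fpga_gbps / sw_gbps
latency_gain = tuple(round(l / fpga_avg_us, 1) for l in sw_latency_us)
print(f"Throughput: {throughput_gain:.2f}x; "
      f"average latency improves {latency_gain[0]}x-{latency_gain[1]}x")
```

That is a 3.75× throughput gain with no extra CPU consumption, and roughly a 3-18× reduction in average latency depending on where the software baseline falls in its range.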
Comparison with GCP and AWS shows Azure's FPGA solution outperforming competitors, especially after CPU‑related security patches (e.g., Meltdown/Spectre) reduced CPU performance, further highlighting the resilience of the FPGA approach.
Conclusion: Azure chose programmable hardware to meet the scaling, performance, and cost challenges of modern cloud networking; while the upfront investment is high, the long‑term benefits in latency, throughput, and CPU savings make it a viable strategy for large‑scale cloud providers.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.