NVIDIA Grace CPU Superchip: Architecture, Performance, and Key Features
The article provides a detailed overview of NVIDIA's Grace CPU Superchip, describing its Arm‑based architecture, NVLink‑C2C interconnect, scalable coherency fabric, high‑bandwidth LPDDR5X memory, extensive I/O options, and software ecosystem, highlighting its suitability for HPC and AI workloads.
NVIDIA Grace is NVIDIA's first data‑center CPU. It was built from the ground up, combining Arm processor cores with NVIDIA's expertise in system‑on‑chip design and in high‑bandwidth, low‑power memory technology.
The Grace CPU Superchip links two Grace CPUs with NVLink‑C2C, a chip‑to‑chip interconnect offering 900 GB/s of bidirectional bandwidth. The same interconnect can instead pair a Grace CPU with an NVIDIA Hopper GPU. Either way, the result is a true "superchip" for HPC and large‑scale AI workloads.
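To put 900 GB/s in perspective, the sketch below compares it with a PCIe Gen 5 x16 link (128 GB/s bidirectional) and computes the ideal time to move a hypothetical 100 GB buffer in one direction. It assumes each direction gets half of the bidirectional figure and ignores protocol overhead, so the results are upper bounds, not measurements.

```python
NVLINK_C2C_BIDIR_GBPS = 900.0  # NVLink-C2C, both directions combined (from the article)
PCIE5_X16_BIDIR_GBPS = 128.0   # PCIe Gen 5 x16, both directions combined

def one_way_seconds(size_gb: float, bidir_gbps: float) -> float:
    """Ideal one-direction transfer time, assuming half the bidirectional peak."""
    return size_gb / (bidir_gbps / 2)

ratio = NVLINK_C2C_BIDIR_GBPS / PCIE5_X16_BIDIR_GBPS   # about 7.0x
t_c2c = one_way_seconds(100.0, NVLINK_C2C_BIDIR_GBPS)  # about 0.22 s at 450 GB/s
t_pcie = one_way_seconds(100.0, PCIE5_X16_BIDIR_GBPS)  # about 1.56 s at 64 GB/s
```

The roughly 7x ratio is why a coherent chip‑to‑chip link, rather than PCIe, is used to join the two dies.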
Over this direct, high‑bandwidth connection, the two Grace CPUs together provide 144 Arm Neoverse V2 cores and up to 1 TB/s of LPDDR5X memory bandwidth.
The Scalable Coherency Fabric (SCF), a mesh fabric with a distributed cache, scales core count and bandwidth. It delivers over 3.2 TB/s of total bandwidth across the CPU cores, NVLink‑C2C, memory, and system I/O, and the two chips together carry 234 MB of distributed L3 cache.
The Grace CPU Superchip uses up to 960 GB of server‑grade LPDDR5X memory with ECC. Compared with a conventional eight‑channel DDR5 design, this memory subsystem delivers up to 53% more bandwidth at one‑eighth the power per GB/s, at comparable cost, while also offering higher density.
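To express the headline figures in per‑die and per‑core terms, the sketch below divides the article's totals (144 cores, 234 MB of cache, up to 960 GB and up to 1 TB/s of LPDDR5X) across two dies and 144 cores. The even split between dies is an assumption, 1 TB/s is treated as 1000 GB/s, and `SuperchipSpec` is a hypothetical helper; all values are peak capacities, not measured results.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class SuperchipSpec:
    dies: int = 2
    cores: int = 144
    l3_mb: float = 234.0         # total cache across both dies
    mem_gb: float = 960.0        # maximum LPDDR5X capacity
    mem_bw_gbps: float = 1000.0  # "up to 1 TB/s", taken as 1000 GB/s

    def per_die(self) -> Dict[str, float]:
        """Resources per die, assuming an even split between the two dies."""
        return {
            "cores": self.cores // self.dies,
            "l3_mb": self.l3_mb / self.dies,
            "mem_gb": self.mem_gb / self.dies,
            "mem_bw_gbps": self.mem_bw_gbps / self.dies,
        }

    def per_core(self) -> Dict[str, float]:
        """Resources per core if every core shares the totals equally."""
        return {
            "l3_mb": self.l3_mb / self.cores,
            "mem_gb": self.mem_gb / self.cores,
            "mem_bw_gbps": self.mem_bw_gbps / self.cores,
        }
```

For example, `SuperchipSpec().per_die()` gives 72 cores, 117 MB of cache, and 480 GB of memory per die, and `per_core()` works out to about 1.6 MB of cache and about 6.9 GB/s of peak memory bandwidth per core when all cores are busy.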
For I/O, the superchip supports up to 128 PCIe Gen 5 lanes (eight x16 links). Each x16 link provides up to 128 GB/s of bidirectional bandwidth and can be bifurcated into two x8 links, so the lanes can be allocated to GPUs, DPUs, SmartNICs, NVMe devices, and modular BMC options.
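The 128 GB/s per x16 link follows from the PCIe Gen 5 signaling rate of 32 GT/s per lane. A minimal sketch of that arithmetic (the function name is hypothetical), with an option to apply the 128b/130b line encoding; packet‑level overhead is ignored, so real throughput is lower still:

```python
PCIE_GEN5_GT_PER_LANE = 32       # gigatransfers/s per lane, per direction
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line code

def pcie_gen5_gbps(lanes: int, bidirectional: bool = True,
                   after_encoding: bool = False) -> float:
    """Peak PCIe Gen 5 bandwidth in GB/s for a link with `lanes` lanes."""
    # Each transfer moves one bit per lane, so divide by 8 to get bytes.
    per_direction = PCIE_GEN5_GT_PER_LANE * lanes / 8
    if after_encoding:
        per_direction *= ENCODING_EFFICIENCY
    return per_direction * 2 if bidirectional else per_direction

x16 = pcie_gen5_gbps(16)                           # 128.0 GB/s raw, both directions
x16_eff = pcie_gen5_gbps(16, after_encoding=True)  # about 126.0 GB/s
total = 8 * x16                                    # 1024.0 GB/s across eight x16 links
x8 = pcie_gen5_gbps(8)                             # 64.0 GB/s for one half of a 2 x8 split
```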
The cores are Arm Neoverse V2, which deliver leading per‑thread performance and energy efficiency. Neoverse V2 implements the Armv9‑A ISA and runs existing Armv8 binaries (up to Armv8.5‑A) without recompilation.
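Armv8 and Armv9 binaries share the AArch64 ELF format, so one quick way to check that an existing Linux binary targets Grace's instruction set is to read the `e_machine` field of its ELF header. The helper names below are hypothetical. Note that the header identifies only the architecture (AArch64), not which Armv8.x or Armv9 extensions the code actually uses.

```python
import struct
from typing import Optional

EM_X86_64 = 62    # ELF e_machine value for x86-64
EM_AARCH64 = 183  # ELF e_machine value for AArch64 (Armv8-A and Armv9-A)

def elf_machine(header: bytes) -> Optional[int]:
    """Return the e_machine field of an ELF header, or None if it is not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    order = "<" if header[5] == 1 else ">"
    # e_machine sits at bytes 18-19 in both 32- and 64-bit ELF headers.
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return machine

def is_aarch64_binary(path: str) -> bool:
    """True if the file at `path` is an AArch64 ELF binary."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20)) == EM_AARCH64
```

On an arm64 Linux distribution running on Grace, `is_aarch64_binary("/bin/ls")` should return `True`.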
Neoverse V2 implements the SVE2 and NEON SIMD extensions and supports the Large System Extensions (LSE), which provide low‑cost atomic operations that speed up communication and synchronization between cores, such as locks and mutexes.
Grace also implements key Armv9 features, including cryptographic acceleration, scalable profiling extensions, virtualization extensions, full memory encryption, and secure boot.
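On Linux, the arm64 kernel lists the CPU extensions it exposes to user space in the `Features` line of `/proc/cpuinfo`: `asimd` is NEON, `sve2` is SVE2, `atomics` is LSE, and `aes`, `sha2`, and `sha512` are cryptographic instructions. A minimal sketch (helper names are hypothetical) that checks for the extensions named above. Features such as memory encryption and secure boot are not reported this way.

```python
from typing import Dict, Set

# Linux arm64 /proc/cpuinfo feature names for the extensions discussed above
EXTENSIONS = {
    "asimd": "NEON (Advanced SIMD)",
    "sve2": "SVE2",
    "atomics": "LSE atomics",
    "aes": "AES instructions",
    "sha2": "SHA-256 instructions",
    "sha512": "SHA-512 instructions",
}

def parse_features(cpuinfo_text: str) -> Set[str]:
    """Collect every token from the 'Features' lines of /proc/cpuinfo."""
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features

def extension_report(cpuinfo_text: str) -> Dict[str, bool]:
    """Map each extension description to whether the kernel reports it."""
    found = parse_features(cpuinfo_text)
    return {desc: name in found for name, desc in EXTENSIONS.items()}
```

Usage on an arm64 Linux host: `extension_report(open("/proc/cpuinfo").read())`. Because `/proc/cpuinfo` repeats the `Features` line once per core, collecting the tokens into a set keeps the result compact.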
On the software side, Grace conforms to the Arm Server Base System Architecture (SBSA) and Server Base Boot Requirements (SBBR), so all major Linux distributions, along with the NVIDIA HPC SDK, CUDA, and NGC containers, run natively without modification.
Architects' Tech Alliance