
Venado Supercomputer: Architecture, Performance, and Design Insights

The Venado supercomputer, built for Los Alamos National Laboratory, combines Nvidia Grace CPUs with Hopper GPUs, leverages high‑bandwidth memory and Slingshot interconnects, and targets a roughly 80/20 GPU‑to‑CPU compute split to support demanding AI and HPC applications.

Architects' Tech Alliance

Today marks the ribbon‑cutting ceremony of the "Venado" supercomputer, a project hinted at when Nvidia announced its first data‑center‑class Arm server CPU in April 2021. The system’s name comes from a mountain peak in New Mexico’s Sangre de Cristo range.

Los Alamos, founded in 1943, has a long history of complex computations, from early punched‑card machines to a succession of supercomputers from IBM, Cray, and others. Its recent installations include the 2015 "Trinity" system (2 PB memory, Intel Xeon CPUs, 100 Gb/s Omni‑Path) and the 2023 "Crossroads" system (Intel Sapphire Rapids Xeon SP CPUs, HBM2e memory, HPE Slingshot interconnect).

Seeking to push Arm‑based server clusters, Los Alamos influenced Intel to create a Sapphire Rapids variant with HBM and persuaded Nvidia to develop the Grace CG100 Arm server chip, pairing it with Hopper and future Blackwell GPU accelerators.

The Venado design emphasizes a better balance between memory bandwidth and compute, opting for HPE’s Slingshot 11 (200 Gb/s) over Nvidia’s Quantum‑2 InfiniBand due to cost considerations.

Venado is an experimental machine, not the primary system in the Los Alamos fleet, built on a limited budget for hardware and software research. Its architecture splits FP64 compute cycles roughly 80% GPU and 20% CPU, a heavier CPU share than in typical GPU‑centric systems.

With 2,560 Grace‑Hopper nodes and 920 Grace‑Grace nodes, Venado's Grace CPUs deliver 15.62 PFLOPS of FP64 performance, 2 PB of main memory, and a total LPDDR5 bandwidth of 2.1 PB/s. The Hopper GPUs contribute 85.76 PFLOPS (vector cores) and 171.52 PFLOPS (tensor cores) of FP64 performance, with 240 TB of HBM3 memory and 9.75 PB/s of aggregate bandwidth.
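The published aggregates can be sanity‑checked with simple division. The sketch below uses only the totals quoted above; the per‑GPU figures it derives, and the assumption of one Hopper GPU per Grace‑Hopper node, are illustrative back‑of‑the‑envelope inferences, not official specifications.

```python
# Back-of-the-envelope check of Venado's published FP64 aggregates.
# Totals come from the article; per-device numbers are derived by
# division under the assumption of one Hopper GPU per Grace-Hopper node.

GH_NODES = 2560           # Grace-Hopper nodes (assumed: one GPU each)
CPU_FP64_PF = 15.62       # aggregate Grace CPU FP64, petaflops
GPU_VECTOR_PF = 85.76     # aggregate Hopper vector-core FP64, petaflops
GPU_TENSOR_PF = 171.52    # aggregate Hopper tensor-core FP64, petaflops

# Implied per-GPU rates (petaflops -> teraflops via * 1000).
per_gpu_vector_tf = GPU_VECTOR_PF * 1000 / GH_NODES
per_gpu_tensor_tf = GPU_TENSOR_PF * 1000 / GH_NODES

# GPU share of total FP64 throughput, counting only vector rates,
# which is the apples-to-apples comparison with the CPU figure.
gpu_share = GPU_VECTOR_PF / (GPU_VECTOR_PF + CPU_FP64_PF)

print(f"per-GPU vector FP64: {per_gpu_vector_tf:.1f} TFLOPS")
print(f"per-GPU tensor FP64: {per_gpu_tensor_tf:.1f} TFLOPS")
print(f"GPU share of FP64 throughput: {gpu_share:.1%}")
```

The implied 33.5 TFLOPS vector and 67 TFLOPS tensor rates are consistent with Hopper's published FP64 figures, and the resulting GPU share of roughly 85 percent lines up with the design target of an approximately 80/20 GPU‑to‑CPU split.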

Venado also includes a Lustre parallel storage cluster on the Slingshot network, with plans to test DeltaFS and other file systems. According to HPC manager Gary Grider, the system is now installed and expected to be accepted within two months, after which many applications will run on this experimental platform.

Tags: HPC, NVLink, Grace CPU, Los Alamos, memory bandwidth, Supercomputer
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
