What Makes HPE Cray’s New EX Supercomputers a Game‑Changer for AI and HPC?
The article provides an in‑depth analysis of HPE’s latest Cray EX supercomputing platforms, detailing their GPU density, performance benchmarks, liquid‑cooling architecture, Slingshot 400 interconnect, upcoming storage solutions, and alternative ProLiant Compute XD servers for AI workloads.
HPE Cray EX Platforms
The EX154n accelerator blade is designed for extreme GPU density. Each rack can host up to 224 Nvidia Blackwell GPUs and 8,064 Grace CPU cores, delivering more than 10 petaFLOPS of FP64 for HPC work or 4.4 exaFLOPS of FP4 for sparse AI and machine‑learning workloads.
Each EX154n blade contains two Grace‑Blackwell superchips (GB200). Each superchip pairs two Blackwell GPUs with a 72‑core Arm‑based Grace CPU, and the two superchips on a blade are linked using Nvidia's NVL4 reference design.
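The rack-level numbers follow directly from the blade layout. The short Python sketch below reproduces them from the figures given here (two GB200 superchips per blade, two GPUs and 72 Grace cores per superchip, 224 GPUs per rack). The per-GPU throughput it reports is simply the rack total divided by 224, an implied average under those assumptions rather than a published per-GPU specification.

```python
# Rack-level figures for the EX154n, derived from the blade composition in the article.
# The per-GPU results are implied averages (rack total / GPU count), not vendor specs.

SUPERCHIPS_PER_BLADE = 2        # two GB200 superchips per EX154n blade
GPUS_PER_SUPERCHIP = 2          # two Blackwell GPUs per superchip
GRACE_CORES_PER_SUPERCHIP = 72  # one 72-core Grace CPU per superchip

RACK_GPUS = 224
RACK_FP64_PFLOPS = 10.0         # "more than 10 petaFLOPS", used as a lower bound
RACK_FP4_SPARSE_EFLOPS = 4.4


def rack_breakdown() -> dict:
    gpus_per_blade = SUPERCHIPS_PER_BLADE * GPUS_PER_SUPERCHIP
    blades = RACK_GPUS // gpus_per_blade
    grace_cores = blades * SUPERCHIPS_PER_BLADE * GRACE_CORES_PER_SUPERCHIP
    return {
        "gpus_per_blade": gpus_per_blade,
        "blades_per_rack": blades,
        "grace_cores_per_rack": grace_cores,
        # Implied per-GPU throughput: teraFLOPS FP64 and petaFLOPS FP4 (sparse).
        "fp64_tflops_per_gpu": RACK_FP64_PFLOPS * 1000 / RACK_GPUS,
        "fp4_sparse_pflops_per_gpu": RACK_FP4_SPARSE_EFLOPS * 1000 / RACK_GPUS,
    }


if __name__ == "__main__":
    for key, value in rack_breakdown().items():
        print(f"{key}: {value:,.1f}" if isinstance(value, float) else f"{key}: {value:,}")
```

Working backwards from the GPU count confirms the quoted core count: 56 blades of two 72‑core Grace CPUs each give exactly 8,064 cores.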
Power consumption exceeds 300 kW per rack, so the system requires liquid cooling and is entirely fanless. It uses the next‑generation Slingshot 400 Ethernet NICs, cables, and switches, which provide 400 Gbps per link, double the 200 Gbps of the previous generation.
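To put the power and bandwidth figures in perspective, the Python sketch below does two pieces of back-of-envelope arithmetic. It spreads the 300 kW rack figure across the 224 GPUs (a floor, since the article says power "exceeds" 300 kW, and the share includes CPUs, NICs, and every other rack component), and it compares ideal single-link transfer times at 200 and 400 Gbps. The 1 TB payload is a hypothetical value chosen only for illustration, and protocol overhead is ignored.

```python
# Back-of-envelope arithmetic for one EX154n rack.
# RACK_POWER_W is the article's 300 kW figure, treated as a floor.
# CHECKPOINT_BYTES is a hypothetical payload used only for illustration.

RACK_POWER_W = 300_000
RACK_GPUS = 224
CHECKPOINT_BYTES = 1e12  # hypothetical 1 TB transfer


def watts_per_gpu(rack_power_w: float = RACK_POWER_W, gpus: int = RACK_GPUS) -> float:
    """Rack power amortized per GPU (includes CPUs, NICs, and other rack components)."""
    return rack_power_w / gpus


def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal time to move num_bytes over one link at line rate, ignoring overhead."""
    return num_bytes * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    print(f"~{watts_per_gpu():,.0f} W of rack power per GPU")
    for gbps in (200, 400):
        print(f"{gbps} Gbps: {transfer_seconds(CHECKPOINT_BYTES, gbps):.0f} s for 1 TB")
```

Doubling the link rate halves the ideal transfer time (40 s to 20 s for the hypothetical 1 TB), while each GPU slot accounts for roughly 1.3 kW of the rack's power budget.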
Timeline
EX154n accelerator blades and Slingshot 400 interconnect: shipping by the end of 2025.
EX4252 Gen 2 CPU‑centric blade: eight 192‑core Turin‑C processors per blade (1,536 cores), for up to 98,304 cores per rack; slated for spring 2025.
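The EX4252 Gen 2 core figures only reconcile at rack level: eight 192‑core CPUs give 1,536 cores per blade, so a 98,304‑core total corresponds to a full rack of blades. The Python sketch below checks that arithmetic. The blade count it derives is implied by the numbers here, not a figure stated in the source.

```python
# Reconcile the EX4252 Gen 2 per-blade core count with the 98,304-core total.
CPUS_PER_BLADE = 8        # eight Turin-C processors per blade (from the article)
CORES_PER_CPU = 192       # 192 cores each (from the article)
CORES_PER_RACK = 98_304   # headline total, interpreted here as a per-rack figure


def blades_per_rack(cores_per_rack: int = CORES_PER_RACK) -> int:
    """Implied number of blades needed to reach the rack-level core count."""
    cores_per_blade = CPUS_PER_BLADE * CORES_PER_CPU  # 1,536 cores per blade
    blades, remainder = divmod(cores_per_rack, cores_per_blade)
    if remainder:
        raise ValueError("core total is not a whole number of blades")
    return blades


if __name__ == "__main__":
    print(f"{CPUS_PER_BLADE * CORES_PER_CPU:,} cores per blade")
    print(f"{blades_per_rack()} blades per rack")
```

The division is exact, which is consistent with reading 98,304 as the capacity of 64 blades rather than of a single blade.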
Storage and Networking Enhancements
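The upgraded E2000 storage system uses PCIe 5.0 NVMe devices and delivers more than twice the I/O performance of the previous generation. It is built on the open‑source Lustre parallel file system, and the faster I/O reduces the time compute nodes sit idle waiting for reads and writes to complete.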
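Why doubling storage throughput cuts idle time can be seen with a toy model of a job that alternates between computing and writing a checkpoint. The Python sketch below assumes synchronous I/O (compute stalls while the checkpoint is written). Every workload number in it (a one‑hour compute phase, a 50 TB checkpoint, a 100 GB/s baseline) is hypothetical; only the 2x factor comes from the article, used as a lower bound for "more than twice".

```python
# Toy model: share of wall-clock time a job spends blocked on checkpoint I/O.
# Assumes synchronous I/O. All workload numbers are hypothetical; only the
# 2x speedup reflects the article ("more than twice", used as a lower bound).

COMPUTE_SECONDS = 3_600      # hypothetical compute phase between checkpoints
CHECKPOINT_BYTES = 50e12     # hypothetical 50 TB checkpoint
OLD_BANDWIDTH = 100e9        # hypothetical 100 GB/s on the previous generation
SPEEDUP = 2.0


def io_idle_fraction(compute_s: float, io_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Share of each compute + checkpoint cycle spent waiting on I/O."""
    io_s = io_bytes / bandwidth_bytes_per_s
    return io_s / (compute_s + io_s)


if __name__ == "__main__":
    before = io_idle_fraction(COMPUTE_SECONDS, CHECKPOINT_BYTES, OLD_BANDWIDTH)
    after = io_idle_fraction(COMPUTE_SECONDS, CHECKPOINT_BYTES, OLD_BANDWIDTH * SPEEDUP)
    print(f"idle share: {before:.1%} -> {after:.1%}")
```

In this made-up scenario the checkpoint shrinks from 500 s to 250 s, and the idle share of each cycle drops from about 12% to about 6.5%.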
The new Slingshot Interconnect 400 offers line‑rate 400 Gbps links, automatic congestion management, and adaptive routing, all aimed at keeping latency low across workloads.
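The general idea behind adaptive routing is that a switch picks among several candidate paths using live congestion information instead of always taking the shortest one. The Python sketch below is a toy illustration of that idea only. It is not Slingshot's actual routing algorithm, and the cost model, the Path type, and the hop_penalty weight are all hypothetical.

```python
# Toy illustration of adaptive routing: choose among candidate paths using
# live congestion information instead of always taking the shortest path.
# NOT Slingshot's actual algorithm; the cost model and hop_penalty are hypothetical.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    hops: int
    queue_depth: int  # packets waiting at this path's first-hop output port


def choose_path(paths: list[Path], hop_penalty: int = 4) -> Path:
    """Return the candidate with the lowest estimated cost.

    Cost = queued packets + hop_penalty * hops, so a longer detour wins only
    when the shorter path is congested enough to outweigh the extra hops.
    """
    return min(paths, key=lambda p: p.queue_depth + hop_penalty * p.hops)


if __name__ == "__main__":
    quiet = [Path("minimal", hops=2, queue_depth=3), Path("detour", hops=4, queue_depth=5)]
    busy = [Path("minimal", hops=2, queue_depth=30), Path("detour", hops=4, queue_depth=5)]
    print(choose_path(quiet).name)  # minimal (cost 11 vs 21)
    print(choose_path(busy).name)   # detour (cost 38 vs 21)
```

When the network is quiet, traffic takes the shortest path; once queues build on that path, the same rule diverts traffic to a longer but less congested route, which is how adaptive routing trades extra hops for lower queuing delay.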
ProLiant Compute XD Servers
These servers target AI training and inference with a range of accelerator options.
XD688: liquid‑cooled; supports either eight Nvidia H200 SXM Tensor Core GPUs or eight Blackwell GPUs; shipping early 2025.
XD685: configurable with eight AMD Instinct MI325X accelerators and two AMD EPYC CPUs; release announced for Q4 2024, with broader availability in Q1 2025.
XD680: cost‑effective option with eight Intel Gaudi 3 AI accelerators (1 TB of HBM2e in total); scheduled for release in December 2024.
All XD models include Integrated Lights‑Out (iLO) for remote management.
Accelerator Performance Details
Intel Gaudi 3: each accelerator delivers ~1.8 petaFLOPS of BF16 compute, providing high compute density for data‑intensive workloads.
Nvidia H200: eight GPUs provide ~1.1 TB of HBM3e memory in total.
AMD Instinct MI325X: up to 2 TB of HBM3e memory per eight‑accelerator node.
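The node-level memory totals are simply per-device capacity times eight accelerators. The Python sketch below shows that arithmetic. The per-device capacities (128 GB HBM2e for Gaudi 3, 141 GB HBM3e for H200, 256 GB HBM3e for MI325X) are public vendor specifications rather than figures from this article, so treat them as assumptions to check against current datasheets.

```python
# Per-node HBM totals for eight-accelerator XD configurations.
# Per-device capacities are public vendor specs (assumed here), not article figures.

ACCELERATORS_PER_NODE = 8

HBM_GB_PER_DEVICE = {
    "Intel Gaudi 3 (HBM2e)": 128,
    "Nvidia H200 (HBM3e)": 141,
    "AMD Instinct MI325X (HBM3e)": 256,
}

GAUDI3_BF16_PFLOPS = 1.8  # per accelerator, from the article


def node_hbm_gb(per_device_gb: int, devices: int = ACCELERATORS_PER_NODE) -> int:
    """Total HBM across all accelerators in one node, in GB."""
    return per_device_gb * devices


if __name__ == "__main__":
    for name, gb in HBM_GB_PER_DEVICE.items():
        print(f"{name}: {node_hbm_gb(gb):,} GB per node")
    print(f"Gaudi 3 node BF16: ~{GAUDI3_BF16_PFLOPS * ACCELERATORS_PER_NODE:.1f} PFLOPS")
```

The results (1,024 GB, 1,128 GB, and 2,048 GB) match the article's ~1 TB, ~1.1 TB, and 2 TB figures, and an eight‑way Gaudi 3 node works out to roughly 14.4 petaFLOPS of BF16.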
Deployment Considerations
The high power draw and liquid‑cooling requirements of the Cray EX systems limit them to specialized data‑center environments. The ProLiant Compute XD line, in contrast, consists of conventional servers with a range of accelerator and cooling options (the XD688, for instance, is liquid‑cooled), which makes it suitable for a broader range of enterprises.
Reference: https://www.theregister.com/2024/11/13/hpe_cray_ex/
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.