Tag

Hopper GPU


DataFunTalk
Feb 26, 2025 · Artificial Intelligence

DeepGEMM: An Open‑Source FP8 GEMM Library for Efficient AI Model Training and Inference

DeepGEMM is an open‑source FP8 GEMM library that delivers up to 1350 TFLOPS on NVIDIA Hopper GPUs. Its lightweight, JIT‑compiled code (about 300 lines) handles both dense and MoE matrix multiplication, is easy to deploy, can be configured through environment variables, and offers performance advantages over CUTLASS for large AI models.

AI acceleration · DeepGEMM · FP8