DataFunSummit
Oct 5, 2024 · Artificial Intelligence
Optimizing TorchRec for Large‑Scale Recommendation Systems on PyTorch
This article details the performance‑focused optimizations applied to TorchRec, PyTorch's large‑scale recommendation system library, including CUDA graph capture, multithreaded kernel launches, pinned memory copies, and input‑distribution refinements that together achieve a 2.25× speedup on MLPerf DLRM‑DCNv2 across 16 DGX H100 nodes.
CUDA Graph · GPU optimization · PyTorch