Tagged articles
2 articles
DataFunSummit
Oct 5, 2024 · Artificial Intelligence

Optimizing TorchRec for Large‑Scale Recommendation Systems on PyTorch

This article details the performance‑focused optimizations applied to TorchRec, PyTorch's large‑scale recommendation system library, including CUDA graph capture, multithreaded kernel launches, pinned memory copies, and input‑distribution refinements that together achieve a 2.25× speedup on MLPerf DLRM‑DCNv2 across 16 DGX H100 nodes.

CUDA Graph · Distributed Training · GPU Optimization
11 min read
DataFunTalk
Apr 3, 2023 · Artificial Intelligence

Large‑Scale Recommendation System Training with TorchRec and Dynamic Embedding

This article explains how Tencent’s AI team leverages the PyTorch‑based TorchRec library and a custom dynamic embedding solution to train billion‑scale recommendation models efficiently, detailing the benefits of TorchRec, GPU embedding, optimized kernels, embedding partition strategies, experimental results, and practical deployment guidance.

GPU Embedding · Large-Scale Training · PyTorch
15 min read