Oct 17, 2019 · Artificial Intelligence

GPU-Accelerated Model Training Optimizations for Snowball Feed Recommendation System

This article describes the challenges of large-scale model training for Snowball's feed recommendation system and details a series of engineering optimizations (GPU acceleration, multi-threaded data preparation, TFRecord conversion, compression, and batch-map reordering) that increased training throughput from 6k to over 20k samples per second while reducing CPU and I/O bottlenecks.

GPU · Model Training · TFRecord

15 min read