Aug 22, 2024 · Artificial Intelligence

How RECom Accelerates Recommendation Model Inference on GPUs

The RECom compiler combines a subgraph-parallel fusion technique with symbolic shape handling to speed up GPU inference of deep recommendation models that contain a massive number of embedding columns. It also eliminates redundant computations, achieving up to 6.61× lower latency and 1.91× higher throughput than TensorFlow baselines.

GPU Optimization · Recommendation Systems · Compiler

10 min read
