Alibaba Cloud Big Data AI Platform
Aug 22, 2024 · Artificial Intelligence
How RECom Accelerates Recommendation Model Inference on GPUs
The RECom compiler introduces a subgraph‑parallel fusion technique and symbolic shape handling to dramatically speed up GPU inference of deep recommendation models with massive embedding columns, achieving up to 6.61× lower latency and 1.91× higher throughput than TensorFlow baselines, while eliminating redundant computations.
GPU OptimizationRecommendation Systemscompiler
0 likes · 10 min read
