Tagged articles
Alibaba Cloud Big Data AI Platform
Aug 22, 2024 · Artificial Intelligence

How RECom Accelerates Recommendation Model Inference on GPUs

The RECom compiler combines a subgraph‑parallel fusion technique with symbolic shape handling to speed up GPU inference of deep recommendation models that have massive embedding columns. It also eliminates redundant computations, and achieves up to 6.61× lower latency and 1.91× higher throughput than TensorFlow baselines.

GPU Optimization · Recommendation Systems · compiler
10 min read