RecFlow Breaks DLRM Inference Bottleneck with Fine-Grained GPU Parallelism

RecFlow is a new inference engine from Beijing University of Posts and Telecommunications and Meituan. It targets the resource mismatch in DLRM inference, where memory-bound embedding lookups and compute-bound DNN layers stress different GPU resources. RecFlow coordinates embedding and DNN operators at the intra-SM level and adds interference-aware adaptive scheduling and incremental batching. Together these achieve up to 9.34× higher throughput on an RTX 3090.

DLRM · Fine-grained parallelism · GPU Acceleration