Network Intelligence Research Center (NIRC)
Jan 25, 2026 · Artificial Intelligence
RecFlow Breaks DLRM Inference Bottleneck with Fine-Grained GPU Parallelism
RecFlow, a new inference engine from Beijing University of Posts and Telecommunications and Meituan, targets the resource mismatch in DLRM inference between memory-bound embedding operators and compute-bound DNN operators. It co-schedules the two operator types within each streaming multiprocessor (SM) and adds interference-aware adaptive scheduling and incremental batching. The result is up to 9.34× higher throughput on an NVIDIA RTX 3090.
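The "resource mismatch" can be made concrete with a roofline-style estimate of arithmetic intensity (FLOPs per byte of memory traffic). The sketch below is an illustration only, not RecFlow's code. It assumes sum-pooled fp32 embedding lookups and a single fp32 MLP layer treated as a GEMM. The shapes are hypothetical, and the RTX 3090 peak figures (about 35.6 TFLOPS FP32 and 936 GB/s) are approximate published values used here as assumptions.

```python
# Illustrative roofline estimate: why embedding and DNN operators stress
# different GPU resources. Shapes and hardware peaks are assumptions.

PEAK_FP32_FLOPS = 35.6e12  # approx. RTX 3090 FP32 peak (assumed)
PEAK_BW = 936e9            # approx. RTX 3090 memory bandwidth in bytes/s (assumed)


def embedding_intensity(batch, lookups, dim, bytes_per_elem=4):
    """FLOPs/byte for sum-pooled embedding lookups.

    Each sample gathers `lookups` rows of width `dim` and sums them,
    which costs (lookups - 1) * dim additions. Index reads and the
    pooled-output write are ignored for simplicity.
    """
    flops = batch * (lookups - 1) * dim
    bytes_moved = batch * lookups * dim * bytes_per_elem
    return flops / bytes_moved


def gemm_intensity(m, k, n, bytes_per_elem=4):
    """FLOPs/byte for an (m x k) @ (k x n) matrix multiply, e.g. one MLP layer.

    Counts 2*m*k*n FLOPs (multiply + add) and assumes each operand and the
    output cross DRAM exactly once.
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def bound(intensity, peak_flops=PEAK_FP32_FLOPS, peak_bw=PEAK_BW):
    """Classify a kernel against the roofline ridge point (peak_flops / peak_bw)."""
    ridge = peak_flops / peak_bw
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    # Hypothetical shapes: batch 256, 20 lookups of width 64, a 512x512 MLP layer.
    emb = embedding_intensity(batch=256, lookups=20, dim=64)
    mlp = gemm_intensity(m=256, k=512, n=512)
    print(f"embedding: {emb:.2f} FLOPs/byte -> {bound(emb)}")
    print(f"MLP GEMM:  {mlp:.2f} FLOPs/byte -> {bound(mlp)}")
```

Because one operator type saturates memory bandwidth while the other saturates compute, running them side by side on the same SMs can, in principle, use hardware that either would leave idle on its own. That is the intuition behind fine-grained co-scheduling.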
DLRM · Fine-grained parallelism · GPU acceleration
