Tag

Kernel Fusion


iQIYI Technical Product Team
Mar 15, 2024 · Artificial Intelligence

Optimizing GPU Inference for CTR Models: Kernel Fusion, Multi‑Stream Execution, and Batch Merging

iQIYI overcame the launch‑overhead bottleneck in GPU inference for CTR models by fusing sparse‑feature operators, enabling multi‑stream execution, consolidating data copies, and merging inference batches. These changes brought GPU latency down to CPU level, raised throughput more than sixfold, and cut operating costs by more than 40%.

GPU · Kernel Fusion · TensorFlow
10 min read