Network Intelligence Research Center (NIRC)
Jun 27, 2023 · Artificial Intelligence
Microsecond-Scale GPU Preemption Enables Concurrent Real-Time DNN Inference
REEF introduces reset‑based preemption and dynamic kernel padding to achieve microsecond‑scale GPU kernel preemption, allowing real‑time and best‑effort DNN inference to run concurrently. On the DISB benchmark, real‑time tasks incur only about 2% added latency while overall throughput improves by up to 7.7×.
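To build intuition for reset‑based preemption, here is a minimal discrete‑time simulation sketch (not REEF's actual GPU implementation; the `simulate` function, its arguments, and the timing model are illustrative assumptions). The key idea it models: a best‑effort kernel in flight when a real‑time task arrives is simply killed and re‑executed later from scratch, which is safe because stateless DNN inference kernels are idempotent.

```python
from collections import deque

def simulate(rt_arrivals, be_kernels):
    """Toy model of reset-based preemption (hypothetical, for illustration).

    rt_arrivals: {arrival_time: duration} for real-time inference kernels.
    be_kernels:  iterable of best-effort kernel durations.

    A best-effort kernel running when a real-time task arrives is reset:
    its partial progress is discarded and the whole kernel is re-queued.
    (REEF's dynamic kernel padding, which fills idle GPU compute units
    with best-effort thread blocks, is not modeled here.)
    Returns (real-time latencies, total completion time).
    """
    t = 0
    rt_latencies = []
    be = deque(be_kernels)
    pending_rt = dict(rt_arrivals)
    while be or pending_rt:
        due = [a for a in pending_rt if a <= t]
        if due:                       # real-time work always runs first
            a = min(due)
            d = pending_rt.pop(a)
            rt_latencies.append(t - a + d)   # queueing delay + run time
            t += d
        elif be:
            d = be.popleft()
            arrivals = [a for a in pending_rt if t < a < t + d]
            if arrivals:              # RT task lands mid-kernel: reset it
                t = min(arrivals)     # preemption is near-instant here
                be.appendleft(d)      # discard progress, re-run later
            else:
                t += d
        else:
            t = min(pending_rt)       # idle until the next RT arrival
    return rt_latencies, t
```

With an RT task of duration 2 arriving at t=5 amid three best‑effort kernels of duration 4, the RT latency equals its bare run time (2), whereas a non‑preemptive scheduler would make it wait for the in‑flight kernel to finish.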
DNN inference · GPU scheduling · REEF
