Network Intelligence Research Center (NIRC)
Jun 27, 2023 · Artificial Intelligence
Microsecond-Scale GPU Preemption Enables Concurrent Real-Time DNN Inference
REEF introduces reset‑based preemption and dynamic kernel padding to achieve microsecond‑scale GPU kernel preemption, allowing real‑time and best‑effort DNN inference to run concurrently. On the DISB benchmark, real‑time tasks incur only about 2% added latency while overall throughput improves by up to 7.7×.
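To build intuition for reset‑based preemption, here is a minimal discrete‑time simulation sketch (not REEF's actual GPU implementation; the `simulate` function, its arguments, and the timing model are illustrative assumptions). The key idea it models: a best‑effort kernel in flight when a real‑time task arrives is simply killed and re‑executed later from scratch, which is safe because stateless DNN inference kernels are idempotent.

```python
from collections import deque

def simulate(rt_arrivals, be_kernels):
    """Toy model of reset-based preemption (hypothetical, for illustration).

    rt_arrivals: {arrival_time: duration} for real-time inference kernels.
    be_kernels:  iterable of best-effort kernel durations.

    A best-effort kernel running when a real-time task arrives is reset:
    its partial progress is discarded and the whole kernel is re-queued.
    (REEF's dynamic kernel padding, which fills idle GPU compute units
    with best-effort thread blocks, is not modeled here.)
    Returns (real-time latencies, total completion time).
    """
    t = 0
    rt_latencies = []
    be = deque(be_kernels)
    pending_rt = dict(rt_arrivals)
    while be or pending_rt:
        due = [a for a in pending_rt if a <= t]
        if due:                       # real-time work always runs first
            a = min(due)
            d = pending_rt.pop(a)
            rt_latencies.append(t - a + d)   # queueing delay + run time
            t += d
        elif be:
            d = be.popleft()
            arrivals = [a for a in pending_rt if t < a < t + d]
            if arrivals:              # RT task lands mid-kernel: reset it
                t = min(arrivals)     # preemption is near-instant here
                be.appendleft(d)      # discard progress, re-run later
            else:
                t += d
        else:
            t = min(pending_rt)       # idle until the next RT arrival
    return rt_latencies, t
```

With an RT task of duration 2 arriving at t=5 amid three best‑effort kernels of duration 4, the RT latency equals its bare run time (2), whereas a non‑preemptive scheduler would make it wait for the in‑flight kernel to finish.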
DNN inference · GPU scheduling · REEF
