
Efficient General‑Purpose Frame Extraction for AI Video Inference Services

The article presents a unified, high‑performance frame‑extraction framework that dynamically selects CPU or GPU decoding, leverages multithreaded and CUDA‑accelerated pipelines, keeps frames in memory, and achieves up to ten‑fold latency reductions for diverse AI video‑inference tasks.

iQIYI Technical Product Team

AI algorithms are widely used in video‑entertainment services, but frame‑extraction latency often dominates the overall processing time. Different AI tasks have diverse frame‑extraction requirements, such as varying frame rates, format support, and hardware platforms (CPU or GPU).

The article first outlines the main challenges: (1) overall service latency and low hardware utilization, illustrated by a 1‑hour, 1080p video requiring 9 × 10⁴ frames and taking 760 s to extract frames with a 4‑core CPU; (2) heterogeneous algorithm demands and hardware deployment, where some models run on GPU and others on CPU, requiring a frame‑extraction tool that works efficiently on both.
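The reported figures are easy to sanity‑check. A short back‑of‑the‑envelope calculation (assuming a 25 FPS source, a common frame rate for 1080p streaming content) shows the CPU baseline decodes only about 118 frames per second:

```python
# Back-of-the-envelope check of the figures above (assuming a 25 FPS source,
# which is a common frame rate for 1080p streaming content).
duration_s = 3600          # 1-hour video
fps = 25                   # assumed source frame rate
total_frames = duration_s * fps
extraction_time_s = 760    # reported 4-core CPU extraction time

throughput_fps = total_frames / extraction_time_s
print(total_frames)           # 90000 frames, i.e. 9 x 10^4
print(round(throughput_fps))  # ~118 frames decoded per second
```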

Two general scenarios are discussed. Short‑video moderation needs ultra‑low latency and support for many codecs, while long‑video tasks (e.g., script generation, scene transition detection) require precise timestamps and high‑throughput processing.

CPU‑based solutions:

1. Traditional disk‑based extraction using FFmpeg to decode video, save JPEG frames, and let AI models read the images. This approach suffers from (a) blocking pipelines and (b) huge storage consumption (≈450 GB for a 1‑hour 1080p video).

2. In‑memory extraction where FFmpeg decodes to YUV, converts to RGB, and keeps the data in memory for direct AI consumption. A multithreaded design splits the video into segments, preserving frame order via timestamps.

CPU extraction can be accelerated by parallelizing both disk‑based and in‑memory paths, but latency remains high for high‑resolution videos.
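The multithreaded in‑memory design described above can be sketched as follows: the video is split into equal time segments, each decoded by its own worker, with results merged back into presentation order by timestamp (PTS). The decoder here is a stub; a real implementation would invoke FFmpeg per segment (e.g. via a rawvideo pipe with `-ss`/`-t` bounds).

```python
# Sketch of segment-parallel in-memory extraction. The decode step is a
# stand-in; a real pipeline would shell out to FFmpeg per segment.
from concurrent.futures import ThreadPoolExecutor

def split_segments(duration_s: float, workers: int):
    """Return (start, end) time bounds, one per worker."""
    step = duration_s / workers
    return [(i * step, (i + 1) * step) for i in range(workers)]

def decode_segment(bounds, fps=1.0):
    """Stub decoder: yields (pts_seconds, frame_payload) pairs."""
    start, end = bounds
    frames, t = [], start
    while t < end:
        frames.append((t, f"frame@{t:.1f}s"))  # placeholder for RGB data
        t += 1.0 / fps
    return frames

def extract_in_memory(duration_s, workers=4):
    segments = split_segments(duration_s, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(decode_segment, segments)
    # Concatenate and sort by PTS so frame order is preserved regardless
    # of which worker finished first.
    frames = [f for part in parts for f in part]
    frames.sort(key=lambda pair: pair[0])
    return frames

frames = extract_in_memory(8.0, workers=4)
```

Sorting by PTS after the parallel map is what lets the workers finish in any order while still delivering frames to the model in playback order.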

GPU‑based solutions:

NVIDIA’s GPU frame extraction can reach >500 FPS on V100 and >1000 FPS on T4 for H.264/1080p streams, dramatically reducing latency. Limitations include fewer features than FFmpeg (no variable‑rate extraction, limited codec support), the need for CPU‑side preprocessing after GPU decoding, and data transfer overhead between GPU and CPU.

To address these gaps, the article proposes a unified framework:

Enhance GPU extraction with custom CUDA kernels to support n‑frame‑per‑second extraction, key‑frame only extraction, JPEG encoding directly on GPU (up to 3000 FPS), and accurate PTS handling.
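The "n frames per second" selection reduces to a bucketing rule over timestamps: keep the first frame that lands in each 1/n‑second slot. The sketch below shows that logic in plain Python for illustration (the article performs the equivalent on GPU in its custom CUDA path):

```python
# Hedged sketch of n-frames-per-second selection: given decoded frame
# timestamps (PTS, in stream time_base units), keep the first frame in
# each 1/n-second bucket.
from fractions import Fraction

def select_n_per_second(pts_list, time_base: Fraction, n: int):
    """Return indices of frames to keep, at most n per second of video."""
    keep, seen_buckets = [], set()
    for i, pts in enumerate(pts_list):
        seconds = pts * time_base   # PTS -> seconds
        bucket = int(seconds * n)   # which 1/n-second slot this frame hits
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            keep.append(i)
    return keep

# A 25 FPS stream with time_base 1/25: PTS 0..49 covers 2 seconds, so
# n=2 keeps 4 frames.
pts = list(range(50))
kept = select_n_per_second(pts, Fraction(1, 25), n=2)
```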

Expose the GPU extraction functionality to Python via pybind11, enabling existing Python‑based AI pipelines to call C++/CUDA code without extensive refactoring.

Implement a pipeline where decoded YUV data is transformed to RGB and pre‑processed entirely on GPU, minimizing CPU‑GPU data copies.
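The YUV→RGB step that this pipeline runs on GPU is the standard color‑space conversion; the equivalent math is shown below in pure Python (BT.601, limited/"studio" range, the common default for H.264 content; actual streams may signal BT.709 or full range instead):

```python
# Per-pixel BT.601 limited-range YUV -> RGB conversion, the same math the
# GPU kernel applies across the whole frame.
def yuv_to_rgb(y, u, v):
    yf = 1.164 * (y - 16)
    uf = u - 128
    vf = v - 128
    r = yf + 1.596 * vf
    g = yf - 0.392 * uf - 0.813 * vf
    b = yf + 2.017 * uf
    clamp = lambda x: max(0, min(255, round(x)))  # clip to valid 8-bit range
    return clamp(r), clamp(g), clamp(b)

# "White" video level (Y=235, U=V=128) maps to RGB (255, 255, 255);
# "black" level (Y=16) maps to (0, 0, 0).
white = yuv_to_rgb(235, 128, 128)
black = yuv_to_rgb(16, 128, 128)
```

Doing this conversion (and any resizing/normalization) on GPU right after decoding is what lets the framework hand the model a ready tensor without a round trip through host memory.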

Use ffprobe to detect video codec and dynamically select CPU or GPU extraction.
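A minimal sketch of that dispatch, assuming an illustrative set of NVDEC‑decodable codecs (the real set depends on the GPU generation; VP9 and AV1 support vary by architecture):

```python
# Codec-based dispatch: ffprobe reports the stream's codec, and the
# framework routes to GPU decoding only for codecs the hardware decoder
# handles, falling back to CPU otherwise.
import json
import subprocess

GPU_DECODABLE = {"h264", "hevc", "vp9"}  # illustrative, not exhaustive

def probe_codec(path: str) -> str:
    """Ask ffprobe for the first video stream's codec name."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name", "-of", "json", path],
        capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]["codec_name"]

def choose_backend(codec_name: str) -> str:
    """Route to GPU when the hardware decoder supports the codec."""
    return "gpu" if codec_name in GPU_DECODABLE else "cpu"

# backend = choose_backend(probe_codec("input.mp4"))  # needs ffprobe on PATH
```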

The implementation includes detailed optimizations for both CPU and GPU paths, such as multithreaded segment processing, precise PTS extraction, and asynchronous stage execution. Reported performance gains reach 10× for short‑video moderation and exceed 10× for long‑video script generation, with the best results achieved when frames remain in GPU memory and are consumed directly by AI models.
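The asynchronous stage execution mentioned above can be sketched with queues connecting independent worker stages, so a slow stage only back‑pressures its predecessor through a bounded buffer instead of serializing the whole pipeline. The stage bodies here are stand‑ins for the real decode/preprocess/inference work:

```python
# Minimal two-stage asynchronous pipeline: stages run in their own threads
# and pass items through bounded queues; a sentinel marks end-of-stream.
import queue
import threading

SENTINEL = object()  # signals end-of-stream between stages

def stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(fn(item))

decode_q, rgb_q, result_q = queue.Queue(4), queue.Queue(4), queue.Queue()
stages = [
    threading.Thread(target=stage, args=(lambda f: ("rgb", f), decode_q, rgb_q)),
    threading.Thread(target=stage, args=(lambda p: ("scored", p[1]), rgb_q, result_q)),
]
for t in stages:
    t.start()
for frame_id in range(5):   # "decoded" frames enter the pipeline
    decode_q.put(frame_id)
decode_q.put(SENTINEL)
for t in stages:
    t.join()

results = []
while True:
    item = result_q.get()
    if item is SENTINEL:
        break
    results.append(item)
```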

In conclusion, a flexible, high‑performance frame‑extraction tool that adapts to diverse AI video‑inference workloads can significantly reduce service latency, improve hardware utilization, and support a wide range of algorithmic requirements.

GPU Acceleration · CPU optimization · video processing · FFmpeg · AI video inference · frame extraction