Video Stutter Detection via Frame Difference Analysis Using FFmpeg
This article explains a method for detecting video stutter by converting uploaded videos into frame sequences with ffmpeg, calculating pixel differences between consecutive frames, aggregating motion metrics, removing scene‑change effects, computing a dynamic factor, and outputting a binary result indicating the presence or absence of stutter.
In video quality assessment, stutter detection is a crucial task. This guide describes how to determine whether a video contains stutter by converting it into a sequence of frames and analyzing the differences between consecutive frames.
The overall solution is divided into six parts: image processing, adjacent‑frame pixel calculation, motion amount aggregation, elimination of scene‑change influence, dynamic factor computation, and result output.
Image processing: The uploaded video is processed with ffmpeg to extract a grayscale image sequence resized to 360×640, focusing on a predefined region of interest to reduce noise.
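A minimal sketch of this extraction step, assuming an ffmpeg binary on the PATH; the function name, file paths, and output naming pattern are illustrative, not taken from the article:

```python
import subprocess  # used when actually invoking ffmpeg


def build_ffmpeg_cmd(video_path, out_dir):
    """Build the ffmpeg argument list that converts a video into a
    360x640 grayscale PNG frame sequence (paths are illustrative)."""
    return [
        "ffmpeg", "-i", video_path,
        # resize and convert to grayscale in one filter chain
        "-vf", "scale=360:640,format=gray",
        f"{out_dir}/frame_%05d.png",
    ]


# To actually run it:
# subprocess.run(build_ffmpeg_cmd("input.mp4", "frames"), check=True)
```

A `crop` filter could be prepended to the filter chain to restrict processing to the region of interest mentioned above.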
Adjacent‑frame calculation: For each pair of consecutive frames (t and t+1), pixel differences are computed. A constant threshold of 30 distinguishes motion pixels (difference > 30) from static pixels (difference ≤ 30).
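The thresholding step could look like the following sketch (function and constant names are my own, assuming frames are loaded as 8-bit grayscale NumPy arrays):

```python
import numpy as np

MOTION_THRESHOLD = 30  # fixed threshold from the article


def motion_mask(frame_t, frame_t1):
    """True where the absolute pixel difference between consecutive
    grayscale frames exceeds the threshold (a "motion" pixel)."""
    # widen to int16 first so uint8 subtraction cannot wrap around
    diff = np.abs(frame_t1.astype(np.int16) - frame_t.astype(np.int16))
    return diff > MOTION_THRESHOLD
```

Casting to a signed type before subtracting is important: subtracting `uint8` arrays directly would wrap around instead of producing negative differences.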
Motion amount calculation: The differences are squared to convert amplitude into energy, and the average energy per frame is computed, yielding a TI2 value for each frame.
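As described, the per-frame TI2 is the mean of the squared pixel differences. A sketch under that reading (the function name is mine):

```python
import numpy as np


def ti2(frame_t, frame_t1):
    """Per-frame motion energy: the mean of squared pixel differences
    between two consecutive grayscale frames (the article's TI2)."""
    diff = frame_t1.astype(np.float64) - frame_t.astype(np.float64)
    return float(np.mean(diff ** 2))
```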
Scene‑change elimination: Before averaging, the TI2 values are sorted and the largest 2% (a scene‑change proportion of 0.02) are discarded as scene‑change outliers, producing a stable average TI2 (TI2_AVG).
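One way to realize this trimmed average, assuming the top 2% of sorted TI2 values are the ones discarded (the function name and exact trimming rule are my interpretation of the text):

```python
def stable_ti2_avg(ti2_values, scene_change_ratio=0.02):
    """Sort per-frame TI2 values and drop the largest fraction
    (assumed to be scene changes) before averaging."""
    vals = sorted(ti2_values)
    drop = int(len(vals) * scene_change_ratio)
    trimmed = vals[:len(vals) - drop] if drop else vals
    return sum(trimmed) / len(trimmed)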
Dynamic factor: The dynamic factor Dfact is computed as Dfact = a + b × log(TI2_AVG) with constants a = 2.5 and b = 1.25, and is clamped by the constant c = 0.1 to the range [0, 0.1].
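Taking the formula and constants at face value (and assuming a natural logarithm, which the article does not specify), the computation can be sketched as:

```python
import math

A, B, C = 2.5, 1.25, 0.1  # constants a, b, c from the article


def dynamic_factor(ti2_avg):
    """Dfact = a + b * log(TI2_AVG), clamped to the range [0, c]."""
    d = A + B * math.log(ti2_avg)
    return max(0.0, min(C, d))
```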
Result output: Each frame’s TI2 is compared with Dfact × Mdrop (Mdrop = 0.015). If TI2 ≤ Dfact × Mdrop, the frame is marked as stutter (1); otherwise it is marked as normal (0). The final output is a binary list indicating stutter presence across the entire video.
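The final per-frame decision reduces to one comparison; a sketch (names are mine):

```python
MDROP = 0.015  # constant from the article


def stutter_flags(ti2_values, dfact):
    """Return 1 for stutter frames (TI2 at or below the dynamic
    threshold Dfact * Mdrop) and 0 for normal frames."""
    threshold = dfact * MDROP
    return [1 if t <= threshold else 0 for t in ti2_values]
```

The intuition: a stuttering video repeats frames, so consecutive frames are nearly identical and their TI2 collapses toward zero, falling below the motion-adaptive threshold.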
Advantages of this approach include: no need for large training datasets, robustness against varied dynamic/static scenes, and low computational overhead with high precision.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.