Bilibili's AI-Powered Video Frame Interpolation: Techniques, Challenges, and Deployment
Bilibili’s AI‑driven frame‑interpolation pipeline upgrades low‑frame‑rate videos to smooth high‑frame‑rate 1080p playback by hardening optical‑flow models against large‑motion, repeating‑texture, and text artifacts, pruning them for speed, and deploying them via the BVT SDK across on‑demand and live streams.
1. Introduction
Viewers of historical dramas on Bilibili have noticed that the classic series Yongzheng Dynasty has been upgraded from 540p@25fps to 1080p@50fps. The visual improvement is driven by Bilibili's self‑developed video frame‑interpolation algorithm.
2. Background
With the rapid growth of Bilibili’s user base and the increasing performance of playback devices, low‑frame‑rate videos no longer meet user expectations for smooth motion. Bilibili therefore built a cloud‑based video frame‑interpolation pipeline that converts low‑frame‑rate content to high‑frame‑rate using AI models, aiming to enhance playback smoothness for on‑demand videos.
3. Existing Techniques
Frame interpolation can be achieved by three common methods:
Copying the previous or next frame (duplicate frame).
Blending the two neighboring frames (mixed frame).
Estimating optical flow between the two frames and warping to synthesize intermediate frames (optical‑flow‑based interpolation).
Optical‑flow‑based methods, such as SuperSlomo, DAIN, RIFE, and IFRNet, produce the most visually pleasing results but are computationally intensive.
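The core idea behind the optical‑flow family can be sketched in a few lines. The following is a deliberately simplified 1‑D analogue (integer flow, nearest‑neighbour warping, single flow field shared by both directions); real models such as RIFE operate on 2‑D images with sub‑pixel bilinear warping and learned refinement, so treat this only as an illustration of "warp both neighbours to time t, then blend":

```python
def warp(frame, flow, t):
    """Backward-warp a 1-D `frame` by t * flow, nearest-neighbour sampling."""
    n = len(frame)
    out = []
    for x in range(n):
        src = x - round(t * flow[x])   # follow the flow back to the source pixel
        src = min(max(src, 0), n - 1)  # clamp at the borders
        out.append(frame[src])
    return out

def interpolate(frame0, frame1, flow01, t=0.5):
    """Synthesize the frame at time t in (0, 1) between frame0 and frame1.

    flow01[x] approximates the displacement of pixel x from frame0 to frame1;
    the flow toward frame1 is approximated as its negation.
    """
    w0 = warp(frame0, flow01, t)                    # frame0 advanced to time t
    w1 = warp(frame1, [-f for f in flow01], 1 - t)  # frame1 pulled back to time t
    return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

# A bright pixel moving 2 positions between frames lands halfway at t=0.5:
frame0 = [0, 0, 10, 0, 0]
frame1 = [0, 0, 0, 0, 10]
mid = interpolate(frame0, frame1, [2] * 5, t=0.5)
```

Duplicate and mixed frames correspond to the degenerate cases `t rounded to 0 or 1` and `flow = 0`, which is why they look juddery or ghosted respectively.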
4. Technical Challenges and Optimizations
The following defects are typical in optical‑flow‑based interpolation and were addressed in Bilibili’s solution.
4.1 Large‑motion artifacts
Small models with limited receptive fields struggle to model large motions, leading to “broken limbs” in fast‑moving subjects. The solution is to adapt the interpolation timestamp (t) based on motion magnitude, inserting frames closer to the source frames (e.g., t=0.2) when large motion is detected, using motion vectors from the decoder as a cue.
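A minimal sketch of this motion‑adaptive timestamp selection, assuming decoder motion vectors arrive as (dx, dy) pairs; the `large_motion_px` threshold and the fallback value t=0.2 from the text are illustrative, not Bilibili's actual tuning:

```python
def mean_motion(mvs):
    """Average motion-vector magnitude (pixels/frame) from the decoder."""
    if not mvs:
        return 0.0
    return sum((dx * dx + dy * dy) ** 0.5 for dx, dy in mvs) / len(mvs)

def pick_timestamp(mvs, large_motion_px=32.0):
    """Interpolate at the midpoint (t=0.5) normally; shift toward the source
    frame (t=0.2) when motion is large, where a small receptive field would
    otherwise produce broken-limb artifacts."""
    return 0.2 if mean_motion(mvs) > large_motion_px else 0.5
```

Interpolating nearer a source frame keeps the synthesized content closer to real pixels, trading a little temporal evenness for far fewer visible defects.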
4.2 Repeating/periodic texture artifacts
Periodic textures cause erroneous optical flow, producing blocky shifts. Instead of enlarging the model, Bilibili enriched the training set with synthetic repeating‑texture sequences and later dropped the explicit optical‑flow supervision term, eliminating the artifact without extra compute.
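A hypothetical 1‑D generator for such synthetic training pairs shows why periodic textures are hard: shifting a tiled pattern by one period yields a frame identical to the original, so the true displacement is ambiguous to a flow estimator. (This helper is an illustration of the data‑synthesis idea, not Bilibili's actual pipeline.)

```python
def make_repeating_texture_pair(patch, reps, shift):
    """Build a 1-D training pair: a periodic texture (patch tiled `reps`
    times) translated by `shift` pixels between the two frames."""
    row = patch * reps
    frame0 = row
    frame1 = row[shift:] + row[:shift]  # texture cyclically shifted
    return frame0, frame1

# Shifting a period-2 texture by a full period reproduces the input exactly,
# so a flow model cannot distinguish displacement 0 from displacement 2:
f0, f1 = make_repeating_texture_pair([0, 255], 3, 2)
```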
4.3 Text distortion
Static foreground text over moving backgrounds is poorly handled by open‑source models. By adding artificially generated text‑over‑motion data and extensive augmentation, the model learns to preserve text shape, removing distortion.
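The text‑over‑motion training data can be sketched the same way: a background that translates between frames while an overlaid "text" mask stays fixed. Again a hypothetical 1‑D analogue of the augmentation described above, not the production generator:

```python
def make_text_over_motion_pair(bg, text_mask, text_val, shift):
    """Synthesize a frame pair where the background moves by `shift` pixels
    but the overlaid text pixels (where text_mask is True) stay static."""
    def overlay(frame):
        return [text_val if m else p for p, m in zip(frame, text_mask)]
    bg_shifted = bg[shift:] + bg[:shift]
    return overlay(bg), overlay(bg_shifted)

# The background pixels move, the text pixels do not -- the exact
# conflict that distorts text in models trained without such data:
f0, f1 = make_text_over_motion_pair([1, 2, 3, 4], [False, True, True, False], 9, 1)
```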
4.4 Inference speed
Model pruning, low‑resolution optical‑flow estimation, and other optimizations cut the cost to roughly 13 GFLOPs per frame for 1080×1920 video with flow computed at 0.5× resolution, reaching up to 165 fps on an RTX 3090.
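The low‑resolution‑flow trick relies on one detail worth making explicit: flow estimated at half resolution must be both spatially upsampled and scaled, because displacements are measured in pixels and double when the grid doubles. A 1‑D nearest‑neighbour sketch (real pipelines would use bilinear upsampling):

```python
def upsample_flow(flow_half, scale=2):
    """Bring flow estimated at 1/scale resolution back to full resolution:
    repeat each value `scale` times spatially AND multiply it by `scale`,
    since pixel displacements grow with the grid."""
    out = []
    for f in flow_half:
        out.extend([f * scale] * scale)
    return out
```

Estimating flow on a quarter of the pixels is where most of the FLOPs savings comes from; only the cheap warping and blending run at full resolution.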
5. Engineering and Deployment
Bilibili built the BVT (Bilibili Vision Toolkit) SDK, which streamlines model validation, provides efficient memory management, and supports multi‑process, multi‑GPU inference. BVT fully supports the frame‑interpolation pipeline.
6. Demonstrations
The algorithm can double the frame rate (2×, e.g., 25→50 fps) and also achieve higher multipliers (e.g., 4×, 25→100 fps), delivering smooth high‑frame‑rate playback.
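For an N× upgrade the model is simply evaluated at evenly spaced timestamps between each source pair (before any motion‑adaptive adjustment), as this small helper illustrates:

```python
def timestamps(multiplier):
    """Evenly spaced interpolation timestamps for an N-times frame-rate
    upgrade: 2x inserts one frame at t=0.5, 4x inserts three frames."""
    return [i / multiplier for i in range(1, multiplier)]

# 2x: one synthesized frame per source pair; 4x: three per pair.
```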
7. Live‑Streaming Applications
Beyond on‑demand content, the technology is explored for live streaming where many broadcasters cannot stream at 60 fps. Real‑time interpolation on GPUs such as RTX 3060 enables 1080p@30 fps streams to be upgraded to 1080p@60 fps without noticeable latency.
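The real‑time budget behind that claim can be worked out directly. In a simplified model where every output frame beyond the source rate is synthesized, a 30→60 fps upgrade requires 30 interpolations per second, i.e., about 33 ms per synthesized frame of model throughput:

```python
def interpolation_budget(src_fps, dst_fps):
    """Frames the model must synthesize per second, and the per-frame time
    budget in milliseconds, for a real-time src -> dst fps upgrade.
    Simplified: ignores decode/encode overhead and pipelining."""
    synth_fps = dst_fps - src_fps
    return synth_fps, 1000.0 / synth_fps

rate, budget_ms = interpolation_budget(30, 60)
```

Against this budget, a model running at well over 100 fps on data‑center GPUs leaves headroom for a mid‑range card like the RTX 3060 to stay real time.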
8. Conclusion and Outlook
The frame‑interpolation algorithm is a key component of Bilibili’s video quality matrix, improving user experience across VOD, live, and short‑form video. Ongoing work focuses on further speed improvements, subjective quality enhancements, and broader deployment across various multimedia enhancement modules.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.