Bilibili's AI-Powered Video Frame Interpolation: Techniques, Challenges, and Deployment
Bilibili’s AI‑driven frame‑interpolation pipeline upgrades low‑frame‑rate videos to smooth high‑frame‑rate 1080p playback by hardening optical‑flow models against large‑motion, repeating‑texture, and text artifacts, pruning them for speed, and deploying them via the BVT SDK across on‑demand and live streams.
1. Introduction
Viewers of historical dramas on Bilibili have noticed that the classic series Yongzheng Dynasty has been upgraded from 540p@25fps to 1080p@50fps. The visual improvement is driven by Bilibili's self‑developed video frame‑interpolation algorithm.
2. Background
With the rapid growth of Bilibili’s user base and the increasing performance of playback devices, low‑frame‑rate videos no longer meet user expectations for smooth motion. Bilibili therefore built a cloud‑based video frame‑interpolation pipeline that converts low‑frame‑rate content to high‑frame‑rate using AI models, aiming to enhance playback smoothness for on‑demand videos.
3. Existing Techniques
Frame interpolation can be achieved by three common methods:
Copying the previous or next frame (duplicate frame).
Blending the two neighboring frames (mixed frame).
Estimating optical flow between the two frames and warping to synthesize intermediate frames (optical‑flow‑based interpolation).
Optical‑flow‑based methods, such as SuperSlomo, DAIN, RIFE, and IFRNet, produce the most visually pleasing results but are computationally intensive.
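The core idea behind the optical‑flow family can be sketched in a few lines. The following is a deliberately simplified 1‑D analogue (integer flow, nearest‑neighbour warping, single flow field shared by both directions); real models such as RIFE operate on 2‑D images with sub‑pixel bilinear warping and learned refinement, so treat this only as an illustration of "warp both neighbours to time t, then blend":

```python
def warp(frame, flow, t):
    """Backward-warp a 1-D `frame` by t * flow, nearest-neighbour sampling."""
    n = len(frame)
    out = []
    for x in range(n):
        src = x - round(t * flow[x])   # follow the flow back to the source pixel
        src = min(max(src, 0), n - 1)  # clamp at the borders
        out.append(frame[src])
    return out

def interpolate(frame0, frame1, flow01, t=0.5):
    """Synthesize the frame at time t in (0, 1) between frame0 and frame1.

    flow01[x] approximates the displacement of pixel x from frame0 to frame1;
    the flow toward frame1 is approximated as its negation.
    """
    w0 = warp(frame0, flow01, t)                    # frame0 advanced to time t
    w1 = warp(frame1, [-f for f in flow01], 1 - t)  # frame1 pulled back to time t
    return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

# A bright pixel moving 2 positions between frames lands halfway at t=0.5:
frame0 = [0, 0, 10, 0, 0]
frame1 = [0, 0, 0, 0, 10]
mid = interpolate(frame0, frame1, [2] * 5, t=0.5)
```

Duplicate and mixed frames correspond to the degenerate cases `t rounded to 0 or 1` and `flow = 0`, which is why they look juddery or ghosted respectively.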
4. Technical Challenges and Optimizations
The following defects are typical in optical‑flow‑based interpolation and were addressed in Bilibili’s solution.
4.1 Large‑motion artifacts
Small models with limited receptive fields struggle to model large motions, leading to “broken limbs” in fast‑moving subjects. The solution is to adapt the interpolation timestamp (t) based on motion magnitude, inserting frames closer to the source frames (e.g., t=0.2) when large motion is detected, using motion vectors from the decoder as a cue.
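A minimal sketch of this motion‑adaptive timestamp selection, assuming decoder motion vectors arrive as (dx, dy) pairs; the `large_motion_px` threshold and the fallback value t=0.2 from the text are illustrative, not Bilibili's actual tuning:

```python
def mean_motion(mvs):
    """Average motion-vector magnitude (pixels/frame) from the decoder."""
    if not mvs:
        return 0.0
    return sum((dx * dx + dy * dy) ** 0.5 for dx, dy in mvs) / len(mvs)

def pick_timestamp(mvs, large_motion_px=32.0):
    """Interpolate at the midpoint (t=0.5) normally; shift toward the source
    frame (t=0.2) when motion is large, where a small receptive field would
    otherwise produce broken-limb artifacts."""
    return 0.2 if mean_motion(mvs) > large_motion_px else 0.5
```

Interpolating nearer a source frame keeps the synthesized content closer to real pixels, trading a little temporal evenness for far fewer visible defects.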
4.2 Repeating/periodic texture artifacts
Periodic textures cause erroneous optical flow, producing blocky shifts. Instead of enlarging the model, Bilibili enriched the training set with synthetic repeating‑texture sequences and later dropped the explicit optical‑flow supervision term, eliminating the artifact without extra compute.
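A hypothetical 1‑D generator for such synthetic training pairs shows why periodic textures are hard: shifting a tiled pattern by one period yields a frame identical to the original, so the true displacement is ambiguous to a flow estimator. (This helper is an illustration of the data‑synthesis idea, not Bilibili's actual pipeline.)

```python
def make_repeating_texture_pair(patch, reps, shift):
    """Build a 1-D training pair: a periodic texture (patch tiled `reps`
    times) translated by `shift` pixels between the two frames."""
    row = patch * reps
    frame0 = row
    frame1 = row[shift:] + row[:shift]  # texture cyclically shifted
    return frame0, frame1

# Shifting a period-2 texture by a full period reproduces the input exactly,
# so a flow model cannot distinguish displacement 0 from displacement 2:
f0, f1 = make_repeating_texture_pair([0, 255], 3, 2)
```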
4.3 Text distortion
Static foreground text over moving backgrounds is poorly handled by open‑source models. By adding artificially generated text‑over‑motion data and extensive augmentation, the model learns to preserve text shape, removing distortion.
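The text‑over‑motion training data can be sketched the same way: a background that translates between frames while an overlaid "text" mask stays fixed. Again a hypothetical 1‑D analogue of the augmentation described above, not the production generator:

```python
def make_text_over_motion_pair(bg, text_mask, text_val, shift):
    """Synthesize a frame pair where the background moves by `shift` pixels
    but the overlaid text pixels (where text_mask is True) stay static."""
    def overlay(frame):
        return [text_val if m else p for p, m in zip(frame, text_mask)]
    bg_shifted = bg[shift:] + bg[:shift]
    return overlay(bg), overlay(bg_shifted)

# The background pixels move, the text pixels do not -- the exact
# conflict that distorts text in models trained without such data:
f0, f1 = make_text_over_motion_pair([1, 2, 3, 4], [False, True, True, False], 9, 1)
```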
4.4 Inference speed
Model pruning, low‑resolution optical‑flow estimation, and other optimizations cut the cost to roughly 13 GFLOPs per frame for 1080×1920 video with flow computed at 0.5× resolution, reaching up to 165 fps on an RTX 3090.
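The low‑resolution‑flow trick relies on one detail worth making explicit: flow estimated at half resolution must be both spatially upsampled and scaled, because displacements are measured in pixels and double when the grid doubles. A 1‑D nearest‑neighbour sketch (real pipelines would use bilinear upsampling):

```python
def upsample_flow(flow_half, scale=2):
    """Bring flow estimated at 1/scale resolution back to full resolution:
    repeat each value `scale` times spatially AND multiply it by `scale`,
    since pixel displacements grow with the grid."""
    out = []
    for f in flow_half:
        out.extend([f * scale] * scale)
    return out
```

Estimating flow on a quarter of the pixels is where most of the FLOPs savings comes from; only the cheap warping and blending run at full resolution.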
5. Engineering and Deployment
Bilibili built the BVT (Bilibili Vision Toolkit) SDK, which streamlines model validation, provides efficient memory management, and supports multi‑process, multi‑GPU inference. BVT fully supports the frame‑interpolation pipeline.
6. Demonstrations
The algorithm can double the frame rate (2×, e.g., 25→50 fps) and also achieve higher multipliers (e.g., 4×, 25→100 fps), delivering smooth high‑frame‑rate playback.
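For an N× upgrade the model is simply evaluated at evenly spaced timestamps between each source pair (before any motion‑adaptive adjustment), as this small helper illustrates:

```python
def timestamps(multiplier):
    """Evenly spaced interpolation timestamps for an N-times frame-rate
    upgrade: 2x inserts one frame at t=0.5, 4x inserts three frames."""
    return [i / multiplier for i in range(1, multiplier)]

# 2x: one synthesized frame per source pair; 4x: three per pair.
```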
7. Live‑Streaming Applications
Beyond on‑demand content, the technology is explored for live streaming where many broadcasters cannot stream at 60 fps. Real‑time interpolation on GPUs such as RTX 3060 enables 1080p@30 fps streams to be upgraded to 1080p@60 fps without noticeable latency.
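The real‑time budget behind that claim can be worked out directly. In a simplified model where every output frame beyond the source rate is synthesized, a 30→60 fps upgrade requires 30 interpolations per second, i.e., about 33 ms per synthesized frame of model throughput:

```python
def interpolation_budget(src_fps, dst_fps):
    """Frames the model must synthesize per second, and the per-frame time
    budget in milliseconds, for a real-time src -> dst fps upgrade.
    Simplified: ignores decode/encode overhead and pipelining."""
    synth_fps = dst_fps - src_fps
    return synth_fps, 1000.0 / synth_fps

rate, budget_ms = interpolation_budget(30, 60)
```

Against this budget, a model running at well over 100 fps on data‑center GPUs leaves headroom for a mid‑range card like the RTX 3060 to stay real time.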
8. Conclusion and Outlook
The frame‑interpolation algorithm is a key component of Bilibili’s video quality matrix, improving user experience across VOD, live, and short‑form video. Ongoing work focuses on further speed improvements, subjective quality enhancements, and broader deployment across various multimedia enhancement modules.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.