
Efficient Video Frame Extraction Using WebCodecs on Bilibili's Web Upload Page

Bilibili’s web upload page now decodes video frames client‑side with the browser‑native WebCodecs API, replacing the slower Canvas and Wasm + FFmpeg pipelines. Cover‑frame extraction is 2.5–5× faster and uses less memory, MP4 and WebM containers are supported, and users see cover recommendations sooner.

Bilibili Tech

Author: Zhang Feng, senior development engineer at Bilibili.

Business introduction: The web upload page of Bilibili is a major source of video submissions. The cover selection step is time‑consuming, so an automatic cover capture and recommendation feature was introduced. The feature relies on extracting video frames on the client side to avoid server‑side latency.

Typical frame‑extraction scenarios on the web upload page:

Cover recommendation – capture multiple low‑resolution frames, score them with AI, and present up to ten high‑resolution candidates.

Cover frame selection – manually pick a precise timestamp.

Category & topic recommendation – capture several frames and upload them for backend analysis.

Previous solutions:

Two parallel approaches were used: a Canvas‑based fallback and a WebAssembly (Wasm) + FFmpeg pipeline.

Canvas method: load the file into a <video> element, set currentTime to the target time, wait for the seek to finish, draw the current frame onto a 2‑D canvas with drawImage(), and export the image.

Wasm + FFmpeg method: demux the video file, read key‑frame data, transfer it through several layers, and render with WebGL or Canvas. This approach supports almost all video formats but suffers from high CPU/memory usage, large binary size, and a steep development curve.

Performance of these existing approaches: frame‑extraction success rate ≈ 97 %. Average latency ≈ 8.4 s, 50th percentile 16 s, 90th percentile 19 s.

What is WebCodecs?

WebCodecs, released in September 2021, provides low‑level APIs for audio/video encoding and decoding directly in the browser. It targets developers with a solid background in media processing and has a higher entry barrier for typical front‑end engineers.

MP4 basics covered:

Encoding vs. decoding – encoding compresses raw images into formats such as H.264, H.265, or VP9; decoding reverses the process to recover displayable frames.

Container (muxing) – aggregation of audio and video streams into boxes (e.g., moov for metadata, mdat for media data).

Intra‑frame (I‑frame) compression illustrated with JPEG‑like steps.

Inter‑frame compression – motion compensation and frame‑difference (P/B frames).
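The box layout mentioned above can be illustrated with a small sketch. It assumes the standard ISO BMFF header layout (a 4‑byte big‑endian size followed by a 4‑byte type, where size 1 signals a 64‑bit size and size 0 means "to end of file"). The names BoxInfo, listTopLevelBoxes, and makeBox are hypothetical and not taken from Bilibili's implementation, and the demo bytes are synthetic.

```typescript
// One top-level box: its four-character type and its byte range.
interface BoxInfo {
  type: string;   // e.g. "ftyp", "moov", "mdat"
  offset: number; // byte offset of the box header
  size: number;   // total size in bytes, header included
}

// Walk the top-level boxes by reading each header and skipping by its size.
function listTopLevelBoxes(bytes: Uint8Array): BoxInfo[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes: BoxInfo[] = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    let size = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      view.getUint8(offset + 4),
      view.getUint8(offset + 5),
      view.getUint8(offset + 6),
      view.getUint8(offset + 7)
    );
    let headerSize = 8;
    if (size === 1) {
      // A 64-bit size follows the type field.
      if (offset + 16 > view.byteLength) break;
      size = view.getUint32(offset + 8) * 4294967296 + view.getUint32(offset + 12);
      headerSize = 16;
    } else if (size === 0) {
      // The box extends to the end of the data.
      size = view.byteLength - offset;
    }
    if (size < headerSize) break; // malformed header, stop walking
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// Build a synthetic box with a zero-filled payload (demo only).
function makeBox(type: string, payloadLength: number): Uint8Array {
  const box = new Uint8Array(8 + payloadLength);
  new DataView(box.buffer).setUint32(0, box.length);
  for (let i = 0; i < 4; i++) box[4 + i] = type.charCodeAt(i);
  return box;
}

// Synthetic "file": ftyp (16 bytes), mdat (12 bytes), moov (20 bytes).
const demoParts = [makeBox("ftyp", 8), makeBox("mdat", 4), makeBox("moov", 12)];
const demoFile = new Uint8Array(48);
let writePos = 0;
for (const part of demoParts) {
  demoFile.set(part, writePos);
  writePos += part.length;
}
const demoBoxes = listTopLevelBoxes(demoFile);
const moovBox = demoBoxes.filter((box) => box.type === "moov")[0];
```

Because every header carries the box size, a reader can jump over a large mdat box without loading its contents and fetch only the moov box.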

WebCodecs frame‑extraction workflow (four steps):

Metadata reading & parsing: Every box starts with an 8‑byte header (size and type). Starting from the first 8 bytes of the file, step from box to box until the moov box is found, then parse it to obtain the codec parameters and the frame index.

Seeking: Frames sit at discrete timestamps, so a requested time rarely matches a frame exactly. Find the key‑frame or non‑key‑frame nearest to the requested time.

Decoding: Feed the selected file chunk to VideoDecoder. Providing only a key‑frame yields faster results for low‑precision use‑cases (cover recommendation); also providing the non‑key‑frames, starting from the preceding key‑frame they depend on, gives higher precision (manual frame selection).

Rendering: Render the decoded frame with Canvas or WebGL. The heavy demuxing, decoding, and rendering work is off‑loaded to a Web Worker to keep the UI responsive.
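The seeking and decoding steps above can be sketched as pure functions over a sample table. This is a minimal sketch, not the author's code: SampleEntry, nearestSampleIndex, precedingKeyIndex, and planDecode are hypothetical names, the table is assumed to be sorted by presentation timestamp with decode order equal to presentation order (no B‑frame reordering), and the fast path uses the preceding key‑frame rather than the closest one in either direction.

```typescript
// Hypothetical sample-table entry, as a demuxer could derive it from moov.
interface SampleEntry {
  timestampUs: number; // presentation time in microseconds
  isKey: boolean;      // true for key frames
}

// Binary search for the sample closest to targetUs.
// Assumes `samples` is sorted by timestampUs.
function nearestSampleIndex(samples: SampleEntry[], targetUs: number): number {
  if (samples.length === 0) return -1;
  let lo = 0;
  let hi = samples.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (samples[mid]!.timestampUs < targetUs) lo = mid + 1;
    else hi = mid;
  }
  // lo is now the first sample at or after targetUs (or the last sample).
  if (lo > 0) {
    const before = targetUs - samples[lo - 1]!.timestampUs;
    const after = samples[lo]!.timestampUs - targetUs;
    if (before <= after) return lo - 1;
  }
  return lo;
}

// Index of the closest key frame at or before `index`, or -1 if there is none.
function precedingKeyIndex(samples: SampleEntry[], index: number): number {
  for (let i = index; i >= 0; i--) {
    if (samples[i]!.isKey) return i;
  }
  return -1;
}

// Samples to feed to the decoder, in order.
// precise = false: only the key frame (fast, approximate).
// precise = true: key frame through the target frame (slower, exact).
function planDecode(samples: SampleEntry[], targetUs: number, precise: boolean): SampleEntry[] {
  const target = nearestSampleIndex(samples, targetUs);
  const key = target < 0 ? -1 : precedingKeyIndex(samples, target);
  if (key < 0) return [];
  return samples.slice(key, precise ? target + 1 : key + 1);
}

// Demo: 10 frames, 40 ms apart, with a key frame every 5 frames.
const demoSamples: SampleEntry[] = [];
for (let i = 0; i < 10; i++) {
  demoSamples.push({ timestampUs: i * 40000, isKey: i % 5 === 0 });
}
const fastPlan = planDecode(demoSamples, 290000, false);
const precisePlan = planDecode(demoSamples, 290000, true);
```

In a browser, each planned sample would then be wrapped in an EncodedVideoChunk (type "key" or "delta") and passed to VideoDecoder.decode(), followed by flush(). That part is left out here because it only runs where WebCodecs is available.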

Performance analysis:

Local tests on a 2020 M1 MacBook Pro and a Windows i5‑1135G7 machine compared three methods (WebCodecs, Wasm + FFmpeg, Canvas) across videos of various resolutions (720p, 1080p, 2K, 4K). Results show that WebCodecs is 2.5–5× faster than the other methods, reduces memory consumption, and completes the frame‑extraction task 3–13 seconds earlier, allowing users to see cover recommendations sooner.

Advantages of the WebCodecs solution:

High speed, less affected by video resolution.

Reduced file reads.

Lower and more stable memory usage.

Disadvantages:

Relies on the demuxer implementation (currently mp4box.js covers ~95 % of videos).

Full support only for MP4 and WebM containers.

Browser support for WebCodecs is around 85 %.
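Given these limits, a page would need to pick an extraction method per file and per browser. The following is one possible selection sketch, not the logic Bilibili ships: the names Extractor, Capabilities, detectCapabilities, and chooseExtractor are hypothetical, the fallback order (WebCodecs, then Wasm + FFmpeg, then Canvas) is an assumption, and guessing the container from the file extension is a simplification.

```typescript
type Extractor = "webcodecs" | "wasm-ffmpeg" | "canvas";

interface Capabilities {
  hasVideoDecoder: boolean; // WebCodecs VideoDecoder is exposed
  hasWebAssembly: boolean;  // WebAssembly is exposed
}

// Feature-detect by checking for the global constructors.
function detectCapabilities(): Capabilities {
  return {
    hasVideoDecoder: "VideoDecoder" in globalThis,
    hasWebAssembly: "WebAssembly" in globalThis,
  };
}

// Pick a method for one file. The container is guessed from the
// file extension, which is a simplification for this sketch.
function chooseExtractor(fileName: string, caps: Capabilities): Extractor {
  const ext = fileName.slice(fileName.lastIndexOf(".") + 1).toLowerCase();
  const containerSupported = ext === "mp4" || ext === "webm";
  if (caps.hasVideoDecoder && containerSupported) return "webcodecs";
  if (caps.hasWebAssembly) return "wasm-ffmpeg";
  return "canvas";
}

const chosenForThisRuntime = chooseExtractor("clip.mp4", detectCapabilities());
```

Passing the capabilities in as a plain object keeps the decision testable outside a browser.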

Future plans:

Make WebCodecs the default frame‑extraction method on the web upload page and continue to optimize based on online metrics.

Extend support to additional container formats (e.g., WebM, with an open‑source MKV demuxer).

Open‑source the implementation.

Appendix: links to JPEG compression, video codec tutorials, frame‑type explanations, codec string specifications, MP4 container details, online MP4 parsers, official WebCodecs documentation, and sample code repositories.
