Design and Implementation of a High‑Performance Matroska Demuxer for Web Uploads
The new mkv-demuxer SDK replaces the slow FFmpeg-Wasm solution on Bilibili's upload page. It reads Matroska files in slice-sized ArrayBuffers, parses the EBML header and the SeekHead index, and exposes getMeta, getData, and seekFrame APIs. Compared with the previous solution, it cuts memory use by 98% and parsing time by 97%, which speeds up cover generation and recommendation processing.
Matroska is an open, flexible multimedia container format that can hold multiple video, audio, and subtitle streams. On Bilibili's web upload page, Matroska videos account for over 2% of uploads, making efficient parsing essential.
The original solution used FFmpeg compiled to WebAssembly via Emscripten, which supports many formats but suffers from slow parsing speed, high memory consumption, and lack of hardware acceleration.
For MP4, the upload page has already switched to mp4box for demuxing and WebCodecs for decoding, achieving a 70% efficiency gain. To similarly improve Matroska handling, a new demuxing approach based on WebCodecs is required.
Matroska is built on EBML (Extensible Binary Meta Language). A file consists of an EBML Header followed by a Segment, which contains up to eight kinds of top-level elements, including SeekHead, Info, Tracks, Cues, and Cluster. The SeekHead indexes the positions of the other top-level elements, which enables fast random access.
Existing open-source projects include jswebm, which can parse WebM (a Matroska-based format). However, it loads the entire file into memory, which leads to high memory usage, and it offers no APIs for extracting selected metadata or individual frames.
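Every EBML element starts with a variable-length element ID followed by a variable-length data size, so a demuxer has to decode both before it can do anything else. The following is a minimal sketch of that step based on the encoding described in RFC 8794; the function names are illustrative and are not taken from mkv-demuxer.

```javascript
// Number of bytes in an EBML variable-length integer (VINT):
// the count of leading zero bits in the first byte, plus one.
function vintLength(firstByte) {
  if (firstByte === undefined || firstByte === 0) {
    throw new RangeError('Missing byte or VINT longer than 8 bytes')
  }
  return Math.clz32(firstByte) - 24 + 1
}

// Element IDs keep their length-marker bits (e.g. 0x1A45DFA3 for the EBML Header).
function readElementId(bytes, offset) {
  const length = vintLength(bytes[offset])
  if (length > 4) throw new RangeError('Matroska element IDs are at most 4 bytes')
  if (offset + length > bytes.length) throw new RangeError('Truncated element ID')
  let value = 0
  for (let i = 0; i < length; i++) {
    value = value * 256 + bytes[offset + i] // multiplication avoids 32-bit sign issues
  }
  return { value, length }
}

// Data sizes drop the length-marker bits. A size whose value bits are all 1
// means "unknown size" and is reported here as -1.
// Assumption: sizes fit in a JavaScript Number (below 2^53).
function readDataSize(bytes, offset) {
  const length = vintLength(bytes[offset])
  if (offset + length > bytes.length) throw new RangeError('Truncated data size')
  const mask = 0xff >> length
  let value = bytes[offset] & mask
  let allOnes = value === mask
  for (let i = 1; i < length; i++) {
    const b = bytes[offset + i]
    allOnes = allOnes && b === 0xff
    value = value * 256 + b
  }
  return { value: allOnes ? -1 : value, length }
}

// Decode one element header and report where its payload begins.
function readElementHeader(bytes, offset) {
  const id = readElementId(bytes, offset)
  const size = readDataSize(bytes, offset + id.length)
  return { id: id.value, size: size.value, dataOffset: offset + id.length + size.length }
}

// Example input: the EBML Header ID followed by a one-byte data size.
const sampleHeader = readElementHeader(Uint8Array.of(0x1a, 0x45, 0xdf, 0xa3, 0x9f), 0)
console.log(sampleHeader.id.toString(16), sampleHeader.size, sampleHeader.dataOffset)
```

Because only a handful of bytes are needed to decode a header, a parser built this way can decide whether to fetch or skip an element's payload before reading it.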
Therefore a new SDK, mkv-demuxer, was created. Its design focuses on:
Reading files by reference and fetching only needed ArrayBuffer slices, avoiding full file loading.
Parsing EBML Header, then Segment, prioritizing SeekHead to record positions of top elements.
Providing APIs to obtain video metadata (getMeta), all packet data (getData), and seek to a specific frame (seekFrame).
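The first two design points can be sketched as follows. This is not the SDK's source: `SliceReader` and `resolveSeekPositions` are hypothetical names, and the sketch assumes the SeekHead entries have already been decoded into `{ id, position }` objects. It relies only on the standard `Blob.slice()` and `Blob.arrayBuffer()` methods, which `File` inherits, and on the Matroska rule that a SeekPosition is counted from the first byte of the Segment's data.

```javascript
// Reads byte ranges from a File/Blob on demand instead of loading the whole file.
class SliceReader {
  constructor(blob, pieceSize = 1024 * 1024) {
    if (!(pieceSize > 0)) throw new RangeError('pieceSize must be positive')
    this.blob = blob
    this.pieceSize = pieceSize
  }

  // Returns up to `length` bytes starting at `offset`, clamped to the end of the file.
  async read(offset, length = this.pieceSize) {
    const end = Math.min(offset + length, this.blob.size)
    const buffer = await this.blob.slice(offset, end).arrayBuffer()
    return new Uint8Array(buffer)
  }
}

// Well-known Matroska top-level element IDs.
const ELEMENT_ID = {
  SeekHead: 0x114d9b74,
  Info: 0x1549a966,
  Tracks: 0x1654ae6b,
  Cluster: 0x1f43b675,
  Cues: 0x1c53bb6b,
}

// Converts SeekHead entries into absolute file offsets.
// `segmentDataStart` is the offset of the first byte after the Segment's ID and size.
// Only the first entry per element ID is kept.
function resolveSeekPositions(segmentDataStart, entries) {
  const offsets = new Map()
  for (const { id, position } of entries) {
    if (!offsets.has(id)) offsets.set(id, segmentDataStart + position)
  }
  return offsets
}

// Example: fetch only the slice that should contain the Tracks element.
async function readTracksSlice(file, segmentDataStart, seekEntries) {
  const reader = new SliceReader(file)
  const offsets = resolveSeekPositions(segmentDataStart, seekEntries)
  const tracksOffset = offsets.get(ELEMENT_ID.Tracks)
  return tracksOffset === undefined ? null : reader.read(tracksOffset)
}
```

With this shape, memory use is bounded by the piece size rather than by the file size, which is the property the design above is after.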
Example usage:
import MkvDemuxer from 'mkv-demuxer'

// `file` is the File object selected by the user, e.g. from an <input type="file">
const demuxer = new MkvDemuxer()
const filePieceSize = 1 * 1024 * 1024 // read the file in 1 MiB slices
await demuxer.initFile(file, filePieceSize)
const meta = await demuxer.getMeta() // container and track metadata
const data = await demuxer.getData() // video and audio packets
const frame = await demuxer.seekFrame(10) // nearest keyframe packet for timestamp 10

The getMeta API returns an object containing container info, the video track codec, resolution, and so on. getData returns arrays of video and audio packets with timestamps. seekFrame returns the nearest keyframe packet for a given timestamp.
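To make the seekFrame behaviour more concrete, here is a minimal sketch of a keyframe lookup over already demuxed packets. It is an illustration under stated assumptions, not the SDK's implementation: the packet shape `{ timestamp, isKeyframe }` is hypothetical, packets are assumed to be sorted by timestamp, and "nearest" is interpreted as the last keyframe at or before the target (the usual choice when decoding must start from a keyframe), falling back to the first keyframe when the target precedes it.

```javascript
// Returns the last keyframe whose timestamp is <= targetTimestamp,
// the first keyframe if the target precedes all keyframes,
// or null if there are no keyframes at all.
// Assumption: `packets` is sorted by ascending timestamp.
function findKeyframeAtOrBefore(packets, targetTimestamp) {
  const keyframes = packets.filter((p) => p.isKeyframe)
  if (keyframes.length === 0) return null

  // Binary search for the rightmost keyframe at or before the target.
  let lo = 0
  let hi = keyframes.length - 1
  let best = 0
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    if (keyframes[mid].timestamp <= targetTimestamp) {
      best = mid
      lo = mid + 1
    } else {
      hi = mid - 1
    }
  }
  return keyframes[best]
}

// Hypothetical packet list: keyframes at 0 ms, 2000 ms, and 4000 ms.
const samplePackets = [
  { timestamp: 0, isKeyframe: true },
  { timestamp: 33, isKeyframe: false },
  { timestamp: 2000, isKeyframe: true },
  { timestamp: 2033, isKeyframe: false },
  { timestamp: 4000, isKeyframe: true },
]
console.log(findKeyframeAtOrBefore(samplePackets, 2500).timestamp) // 2000
```

In a real Matroska file the Cues element can serve the same purpose without scanning packets, since it maps timestamps to Cluster positions.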
Performance tests on a 4K VP9 video (1.61 GB) show that the new SDK reduces memory usage by 98.34% and parsing time by 97.21% compared with the FFmpeg+Wasm solution.
In the web upload workflow, faster metadata extraction and frame sampling improve AI‑driven cover and category recommendations, shortening total processing time by up to 21% for high‑resolution videos.
Future work includes extending mkv-demuxer to parse Matroska tags, attachments, and EBML streams, and integrating the SDK into the edge‑transcoding pipeline to provide early bitrate and size calculations.
References: EBML RFC 8794, Matroska specifications, WebM project, and related npm packages (mp4box, jswebm, mkv-demuxer).
Bilibili Tech