Technical Overview of Bilibili Vision Toolkit (BVT): Architecture, Features, and FFmpeg Filter Integration
The Bilibili Vision Toolkit (BVT) is a C++ SDK that unifies multimedia AI algorithms through a low‑coupling core, modular dynamic libraries, and a multi‑engine backend, enabling configurable DAG pipelines, asynchronous parallel execution, and seamless FFmpeg filter integration for high‑performance, cross‑platform video processing.
The article introduces Bilibili Vision Toolkit (BVT), a C++‑based SDK that consolidates various multimedia AI algorithms (e.g., super‑resolution, face enhancement, video frame interpolation, narrow‑band HD) and provides a unified API for backend integration. BVT serves as an engineering "base" for AI inference and video processing pipelines, enabling high performance, heterogeneous computing, and multi‑platform support.
2 BVT Technical Analysis
BVT is organized into a low‑coupling core layer and a modular layer, with a backend engine layer that abstracts multiple inference engines. The core layer handles task scheduling and provides C API entry points, while the modular layer implements concrete AI algorithms as dynamic libraries loaded at runtime. This design promotes code reuse, extensibility, and easy configuration of custom task graphs.
2.1 Overall Architecture and Workflow
The system consists of an application layer (e.g., an FFmpeg filter), the core layer, the modular layer, and the engine layer. The application invokes BVT APIs to request tasks such as super‑resolution or face enhancement. The core layer schedules these tasks, loads the appropriate modules, and delegates inference to the engine layer (TensorRT, LibTorch, OpenVINO, etc.).
2.2 Custom Task Flow
BVT allows users to define custom pipelines via configuration files. The pipeline is represented as a Directed Acyclic Graph (DAG) and is executed by a built‑in graph engine combined with a thread pool for parallel processing. An example pipeline processes an input image by detecting a ROI, running face‑enhancement and super‑resolution in parallel, and finally applying color correction.
2.3 Data Representation
Data exchanged between modules is encapsulated in a Packet whose payload is a Tensor. Tensors are abstracted to support various tensor libraries (LibTorch, Eigen) and device buffers (CPU, CUDA). Memory management uses pooling and reference counting to reduce allocation overhead and avoid unnecessary copies.
2.4 Multi‑Inference Engine Support
BVT abstracts inference through a unified interface, supporting engines such as TensorRT, LibTorch, OpenVINO, OnnxRuntime, and TensorFlow. Model files are packaged with a model.json descriptor that specifies the required engine, version, and I/O signatures, enabling runtime engine selection and dynamic plugin loading.
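A descriptor of this kind might look like the sketch below; the field names and values are illustrative assumptions, since the article does not publish the actual model.json schema.

```json
{
  "engine": "tensorrt",
  "engine_version": "8.5",
  "model_file": "sr_x2.engine",
  "inputs":  [{ "name": "frame", "dtype": "float32", "shape": [1, 3, 1080, 1920] }],
  "outputs": [{ "name": "frame", "dtype": "float32", "shape": [1, 3, 2160, 3840] }]
}
```

At load time the core can read such a descriptor, pick the matching engine plugin, and validate the I/O signatures before any inference runs.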
2.5 Module Decoupling
Modules are built as independent dynamic libraries, allowing applications to link only the lightweight core static library. At runtime, the core loads the needed modules and corresponding inference engines, which is especially useful for differentiating VOD (large module set) and live (small, low‑latency module set) scenarios.
2.6 Asynchronous Parallel Execution
BVT adopts an asynchronous, non‑blocking API. Requests are submitted without waiting for completion; the caller polls request status. The framework also includes a device scheduler that distributes work across multiple GPUs, achieving multi‑device parallelism.
2.7 API Design
The C API revolves around three concepts: session, task, and request. A session can host multiple tasks; each task can submit multiple requests. The API maintains state across these contexts, which is essential for streaming video processing.
2.8 FFmpeg Filter Implementation
The article provides a concrete example of integrating BVT as an FFmpeg filter using the activate() callback for asynchronous processing. The typical workflow includes:
init(): call bvt_env_create() to set up the environment and load module libraries.
query_formats(): negotiate pixel formats.
config_props(): parse filter parameters, create a BVT session and task via bvt_session_create() and bvt_task_create().
activate(): for each input frame, wrap buffers with bvt_buffer_create(), submit a request with bvt_request_create(), poll with bvt_request_poll(), and forward completed frames.
uninit(): clean up with bvt_task_destroy(), bvt_session_destroy(), and bvt_env_destroy().
Filter parameter example:
bvt=module={module_path}:task={task_name}:model={model_dir}:gpus={gpu_list}:cuda_in={bool}:cuda_out={bool}:task_opt='{task_specific_params}'

The article concludes with a summary of BVT's impact on Bilibili's VOD and live streaming services, emphasizing improved development efficiency, runtime performance, and extensibility across devices and inference engines.
Bilibili Tech