How to Build an AI‑Powered Danmaku “Barrage‑Through‑People” SDK for Video Apps
This article explains the design and implementation of a flexible danmaku SDK that uses AI‑based image segmentation to let comments flow around people in videos, covering architecture, algorithmic processing, server deployment, client rendering options, performance challenges, and future extensions.
Introduction
Bilibili and many other video platforms now support danmaku (bullet‑screen comments) as a key interaction method, creating a demand for a flexible, feature‑rich danmaku component that can handle high traffic, interactive features, colored comments, and advanced effects such as "barrage‑through‑people".
Overall Architecture
The system is divided into three logical layers: data management, rendering, and the "barrage‑through‑people" module.
Danmaku Data Management Module: Provides real‑time danmaku data to the renderer, first checking a local cache and falling back to a network request. A pre‑fetch strategy reduces latency.
Danmaku Rendering Module: Includes a time engine for speed control and a track‑allocation algorithm that prevents collisions. It supports custom styles, interactive likes, role‑based comments, VIP overlays, etc.
Barrage‑Through‑People Module: Generates per‑frame mask files using AI image segmentation, processes them according to the video player's cropping, and renders the masks on the danmaku view at the correct timestamps.
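The article does not spell out the track‑allocation algorithm, so the following is only a minimal Python 3 sketch of one common approach, assuming right‑to‑left scrolling comments that all stay on screen for the same duration. The names `Track` and `allocate_track` and all parameters are hypothetical and are not part of the SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    # Time (s) when the tail of the last comment has fully entered the screen.
    tail_entered_at: float = 0.0
    # Time (s) when the last comment has fully left the screen.
    cleared_at: float = 0.0


def allocate_track(tracks: List[Track], now: float, text_width: float,
                   screen_width: float, duration: float) -> Optional[int]:
    """Return the first track where a new comment cannot collide, else None."""
    # A comment travels its own width plus the screen width in `duration` seconds,
    # so longer comments move faster (an assumption of this sketch).
    speed = (screen_width + text_width) / duration
    # Moment the new comment's head would reach the left edge of the screen.
    head_arrival = now + screen_width / speed
    for i, track in enumerate(tracks):
        previous_has_entered = now >= track.tail_entered_at
        never_catches_up = head_arrival >= track.cleared_at
        if previous_has_entered and never_catches_up:
            track.tail_entered_at = now + text_width / speed
            track.cleared_at = now + duration
            return i
    return None  # every track is busy; the caller may delay or drop the comment
```

Checking only the two moments (entry and exit) is enough because both comments move at constant speed, so the gap between them changes monotonically.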
"Barrage‑Through‑People" Technique
The technique consists of three coordinated parts: algorithm side, server side, and client SDK.
Algorithm Side
Video frame extraction at 32 fps (configurable). Higher frame rates improve mask smoothness but increase CPU load.
Model training using multi‑angle character images to build a face library that boosts detection accuracy.
Face detection on each frame, outputting contour data.
Similarity filtering: frames with >95% similarity are de‑duplicated to reduce data volume.
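The similarity metric behind the 95% threshold is not specified. As an illustration only, the sketch below assumes binary masks flattened into sequences of 0/1 and uses intersection‑over‑union (IoU) as the similarity measure; `iou` and `dedupe` are hypothetical helper names.

```python
from typing import List, Sequence

Mask = Sequence[int]  # flattened binary mask, 1 = pixel belongs to a person


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty masks are treated as identical.
    return 1.0 if union == 0 else intersection / union


def dedupe(frames: List[Mask], threshold: float = 0.95) -> List[int]:
    """Return indices of frames to keep.

    A frame is dropped when it is more than `threshold` similar to the
    most recently kept frame.
    """
    kept: List[int] = []
    for i, mask in enumerate(frames):
        if not kept or iou(frames[kept[-1]], mask) <= threshold:
            kept.append(i)
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow drift from accumulating unnoticed across many near‑identical frames.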
Server Side
Metadata management: aggregates frame‑level data, partitions it by video and time segment, and builds an index for SDK retrieval.
Merge & deduplication: combines per‑frame metadata into continuous face‑group data for efficient client consumption.
Danmaku service: supplies basic danmaku streams.
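To make the partitioning and indexing step concrete, here is a rough sketch that groups frame timestamps into fixed‑length time segments and emits one index entry per segment. The 10‑second segment length, the `seg_NNNN.zip` naming, and the field names are assumptions for illustration; the actual index schema is not described in the article.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(frame_times_ms: List[int], segment_ms: int = 10_000) -> List[Dict]:
    """Group frame timestamps by time segment and describe each segment."""
    segments = defaultdict(list)
    for t in sorted(frame_times_ms):
        segments[t // segment_ms].append(t)
    return [
        {
            "start_ms": n * segment_ms,
            "end_ms": (n + 1) * segment_ms,
            "package": f"seg_{n:04d}.zip",  # hypothetical package name
            "frames_ms": times,
        }
        for n, times in sorted(segments.items())
    ]
```

With an index of this shape, the SDK only needs to fetch the package covering the current playback position, which fits the progressive‑download approach described later.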
Client SDK
Rendering: Two implementations were evaluated: (1) Canvas drawing with Paint.setXfermode blending to composite the mask layer with the danmaku layer, and (2) OpenGL fragment‑shader rendering using the mask as a texture. The Canvas approach was chosen for simplicity.
Face‑data cache: Stores an index of mask packages for quick lookup based on playback position.
Control APIs: Exposes basic danmaku control and configuration interfaces.
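The lookup by playback position can be sketched with a binary search over sorted mask timestamps. This is an illustrative Python version rather than the SDK's Android code; `MaskIndex` and `frame_at` are hypothetical names, and the sketch assumes that a mask remains valid until the next stored mask, which follows from near‑duplicate frames having been removed upstream.

```python
import bisect
from typing import Iterable, Optional


class MaskIndex:
    """Sorted mask timestamps with lookup by playback position."""

    def __init__(self, frame_times_ms: Iterable[int]) -> None:
        self._times = sorted(frame_times_ms)

    def frame_at(self, position_ms: int) -> Optional[int]:
        """Return the timestamp of the latest mask at or before position_ms."""
        i = bisect.bisect_right(self._times, position_ms) - 1
        return self._times[i] if i >= 0 else None
```

Each lookup costs O(log n), so it stays cheap enough to run on every rendered frame.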
Service Deployment
Required environment: FFmpeg, Python 2.7, OpenCV, NumPy.
Face detection service: 2 QPS
Human‑segmentation service: 10 QPS
Offline data is stored per video in a directory named {vid}_{media_id} containing:
frame (JPEG)
humanseg (base64 JSON)
contour_png (PNG)
contour_svg (SVG)
zip (final package)
mapping (JSON index)
log (script logs)
Challenges and Solutions
Mask file size: A 2‑minute video at 32 fps produces ~3,840 masks (120 s × 32 fps). At ~100 KB each, that totals ~375 MB, which is larger than the video itself. The solution compresses binary masks into SVG files (reducing a mask to a few hundred bytes) and stores them in segmented zip archives, enabling progressive download and early rendering.
Mobile memory consumption: Each minute of masks consumes ~3 MB. On Android, repeatedly allocating and releasing this data caused noticeable stutter. The fix allocates a fixed‑size circular buffer for mask data, limiting memory churn and avoiding frequent garbage collection.
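The circular‑buffer idea can be illustrated as follows. This Python sketch only shows the structure (preallocated slots that are overwritten in place); the real fix targets Android's garbage collector, and the class name, slot count, and slot size here are assumptions.

```python
from typing import List, Optional


class MaskRingBuffer:
    """Fixed number of preallocated slots, reused in round-robin order."""

    def __init__(self, slots: int, slot_bytes: int) -> None:
        # All memory is allocated once, up front.
        self._slots = [bytearray(slot_bytes) for _ in range(slots)]
        self._lengths = [0] * slots
        self._keys: List[Optional[int]] = [None] * slots
        self._next = 0

    def put(self, key: int, data: bytes) -> None:
        """Copy mask data into the next slot, evicting its previous content."""
        if len(data) > len(self._slots[0]):
            raise ValueError("mask does not fit into a slot")
        i = self._next
        self._slots[i][: len(data)] = data  # same-size overwrite, no resize
        self._lengths[i] = len(data)
        self._keys[i] = key
        self._next = (i + 1) % len(self._slots)

    def get(self, key: int) -> Optional[memoryview]:
        """Return a view of the stored mask for `key`, or None if evicted."""
        for i, stored in enumerate(self._keys):
            if stored == key:
                return memoryview(self._slots[i])[: self._lengths[i]]
        return None
```

Because the oldest slot is always overwritten, memory use stays bounded no matter how long playback runs.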
Future Outlook
Generating face and body data is costly (a 5‑minute video requires ~2 hours of processing). The enriched data can support advanced interactions such as person‑specific danmaku, mask‑following effects, and even real‑time face swapping. Further research will explore leveraging these datasets for new user‑engagement features.