AI-Powered Masked Danmaku: Design and Implementation

This article describes the design and implementation of an AI‑driven masked danmaku system that keeps comments from covering people in the video. It covers the background, technology selection, instance segmentation, distributed task scheduling, mask generation, client rendering, and performance optimizations.

HomeTech

Background

Danmaku (bullet comments) scroll over the video and make viewing more interactive, but dense comments obscure the picture. To keep the fun of danmaku while preserving the viewing experience, a mask‑based solution is proposed that lets comments avoid the human regions in the video.

Technical Research and Selection

Video Frame Extraction

Frames are extracted with FFmpeg through PyAV, its Python binding, which gives fine‑grained control over decoding and can return frames in common data formats such as NumPy arrays.
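
As a rough illustration of this step, the sketch below decodes the frames of one time window with PyAV and yields them as NumPy arrays. The function name iter_frames and its parameters are hypothetical, and the article does not show the authors' actual decoding code.

```python
def iter_frames(path, start_s, end_s):
    """Yield (timestamp_seconds, rgb_array) for frames in [start_s, end_s)."""
    # Imported lazily so this sketch can be loaded without PyAV installed.
    import av

    with av.open(path) as container:
        stream = container.streams.video[0]
        # seek() takes an offset in stream.time_base units and, by default,
        # lands on a keyframe before the target, so earlier frames are
        # decoded and skipped in the loop below.
        container.seek(int(start_s / stream.time_base), stream=stream)
        for frame in container.decode(stream):
            if frame.time is None or frame.time < start_s:
                continue
            if frame.time >= end_s:
                break
            # to_ndarray returns a height x width x 3 uint8 array for rgb24.
            yield frame.time, frame.to_ndarray(format="rgb24")
```

A caller would iterate over iter_frames("segment.mp4", 0, 600) and pass each array to the segmentation model; the file name here is only a placeholder.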

Human Region Detection

Instance segmentation, a computer‑vision technique that labels each detected object pixel by pixel, is used to identify the human regions in each frame.

AI Framework Choice

Between TensorFlow and PyTorch, PyTorch was selected for its gentler learning curve, ease of use, and rich ecosystem of pre‑trained models.

Instance Segmentation Algorithm

After comparing Mask R‑CNN, YOLACT, and BlendMask, BlendMask was chosen for its higher accuracy and its 20% speed advantage over Mask R‑CNN.

Open‑Source Detection Projects

Several open‑source detection libraries were evaluated: Detectron, maskrcnn‑benchmark, Detectron2, MMDetection, SimpleDet, and the TensorFlow Object Detection API. Detectron2 was adopted because it is PyTorch‑based, maintained by FAIR, modular, and already has a BlendMask implementation available through AdelaiDet, a toolbox built on top of Detectron2.

Mask Storage Format

Human‑region masks are stored as SVG vector graphics to retain quality at any scale and keep file size minimal. Multiple SVGs are compressed and packaged for storage in the FastDFS file system.
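
To illustrate why vector masks stay compact, the sketch below turns outlines that have already been traced into a single‑path SVG document. It is not a bitmap tracer (the article uses potrace and later pypotrace for that step), and the function name polygons_to_svg and the polygon input format are assumptions made for this example.

```python
def polygons_to_svg(polygons, width, height):
    """Build a one-path SVG from closed outlines given as lists of (x, y)."""
    subpaths = []
    for points in polygons:
        if len(points) < 3:
            continue  # fewer than three points cannot enclose a region
        head, *tail = points
        # M = move to the first point, L = line to each next point, Z = close.
        d = f"M{head[0]} {head[1]}"
        d += "".join(f"L{x} {y}" for x, y in tail)
        subpaths.append(d + "Z")
    path = " ".join(subpaths)
    # viewBox keeps the coordinates valid at any rendered size.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
        f'<path d="{path}" fill="#000"/></svg>'
    )
```

Because the viewBox carries the coordinate system, a mask traced on a small inference image can be stretched to the player size without losing edge quality.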

Client Rendering

The front end applies each mask to the danmaku layer with the CSS3 mask-image property, so comments are hidden wherever a person appears, achieving the “mask‑danmaku” effect.

Mask Generation Design

A distributed task system is built to handle large video volumes efficiently. Videos are split into time‑based segments, each becoming a mask‑generation task processed in parallel across multiple machines.

Task Production

The producer reads the video length, divides the video into fixed‑length segments (e.g., 0‑10 min, 10‑20 min), and stores each task's metadata (video URL, start and end times) in a database.
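
A minimal sketch of the splitting logic, assuming half‑open intervals so adjacent segments share a boundary and no part of the video is skipped. The function name, the field names, and the 600‑second default are illustrative and do not come from the article's schema.

```python
def split_into_tasks(video_url, duration_s, segment_s=600):
    """Return one task record per segment covering [0, duration_s)."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    tasks = []
    start = 0
    while start < duration_s:
        # The last segment is shortened to end exactly at the video's end.
        end = min(start + segment_s, duration_s)
        tasks.append({"video_url": video_url, "start_s": start, "end_s": end})
        start = end
    return tasks
```

For a 25‑minute video, split_into_tasks(url, 1500) produces three tasks, with the last one covering only the final five minutes.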

Task Scheduling

The scheduler dispatches tasks, monitors execution, recovers stalled tasks, and sends SMS alerts when anomalies occur.
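
One way the stalled‑task recovery could look is sketched below, assuming consumers report a heartbeat timestamp while they work. The Task fields, the timeout, and the retry limit are all hypothetical; the function only returns the ids that would warrant an alert and leaves the SMS delivery out.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    status: str = "pending"   # "pending", "running", "done", or "failed"
    heartbeat: float = 0.0    # last time a consumer reported progress
    retries: int = 0


def recover_stalled(tasks, now, timeout_s=300.0, max_retries=3):
    """Requeue running tasks with a stale heartbeat; return ids to alert on."""
    alerts = []
    for task in tasks:
        if task.status != "running" or now - task.heartbeat <= timeout_s:
            continue
        if task.retries < max_retries:
            # Put the task back in the queue so another consumer can take it.
            task.status = "pending"
            task.retries += 1
        else:
            # Give up after repeated stalls and flag the task for a human.
            task.status = "failed"
            alerts.append(task.task_id)
    return alerts
```

Passing the current time in as an argument keeps the check deterministic, which makes the recovery rule easy to test.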

Task Consumption

Consumers retrieve tasks, extract the relevant video segment with FFmpeg, read frames via PyAV, run Detectron2 instance segmentation, and filter out low‑confidence or unsuitable masks. They then generate PNG masks, convert them to SVG using potrace (later replaced by pypotrace), pack the SVGs with their timestamps into a binary file, upload it to FastDFS, and report the file URL back to the scheduler.
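
The article does not specify the binary layout, so the sketch below assumes a simple one: each record is a big‑endian header (timestamp in milliseconds, payload length) followed by a zlib‑compressed SVG. The names pack_masks and unpack_masks are hypothetical.

```python
import struct
import zlib

# Assumed record header: uint32 timestamp (ms) + uint32 payload length.
RECORD_HEADER = struct.Struct(">II")


def pack_masks(frames):
    """Pack (timestamp_ms, svg_text) pairs into one bytes object."""
    chunks = []
    for timestamp_ms, svg in frames:
        payload = zlib.compress(svg.encode("utf-8"))
        chunks.append(RECORD_HEADER.pack(timestamp_ms, len(payload)))
        chunks.append(payload)
    return b"".join(chunks)


def unpack_masks(blob):
    """Inverse of pack_masks: return the list of (timestamp_ms, svg_text)."""
    frames, offset = [], 0
    while offset < len(blob):
        timestamp_ms, size = RECORD_HEADER.unpack_from(blob, offset)
        offset += RECORD_HEADER.size
        svg = zlib.decompress(blob[offset:offset + size]).decode("utf-8")
        frames.append((timestamp_ms, svg))
        offset += size
    return frames
```

Storing the length in front of every payload lets a reader walk the file record by record without parsing the SVG content.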

Optimization

CUDA Memory Management

PyTorch’s caching allocator holds on to GPU memory it has already reserved instead of returning it to the driver. Invoking torch.cuda.empty_cache() after roughly every 100 processed images releases the unused cached memory, reducing usage from ~15 GB to ~900 MB.
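
A minimal sketch of the periodic release, assuming a predict callable that returns CPU‑side results (tensors still held on the GPU cannot be freed by emptying the cache). The function name and the release_cache hook are hypothetical; the hook exists only so the loop can be exercised without a GPU.

```python
def predict_all(images, predict, release_every=100, release_cache=None):
    """Run predict() on every image, releasing cached GPU memory periodically."""
    if release_cache is None:
        # Imported lazily so the sketch can be exercised without PyTorch.
        import torch
        release_cache = torch.cuda.empty_cache
    results = []
    for count, image in enumerate(images, start=1):
        results.append(predict(image))
        if count % release_every == 0:
            # Hand unused cached blocks back to the CUDA driver.
            release_cache()
    return results
```

Emptying the cache on every image would slow inference because the allocator would have to request memory again, which is presumably why a batch interval is used.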

Prediction Speed

Switching from Mask R‑CNN to BlendMask, resizing input images to ≤320 px, and upgrading the hardware from Nvidia K80 to V100 cut per‑frame inference from >200 ms to ~35 ms.
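
The resize can be sketched as follows, assuming the 320 px limit applies to the longer side and that the aspect ratio is preserved; the article does not say which side is constrained. The function name fit_within is hypothetical.

```python
def fit_within(width, height, max_side=320):
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale small inputs
    # Multiply before dividing to keep the result exact for integer ratios.
    new_w = max(1, round(width * max_side / longest))
    new_h = max(1, round(height * max_side / longest))
    return new_w, new_h
```

In Detectron2 the test‑time resize is normally driven by the INPUT.MIN_SIZE_TEST and INPUT.MAX_SIZE_TEST config keys; whether the authors used those keys or resized the frames beforehand is not stated.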

PNG Generation

By using only the pred_masks field from Detectron2 results and skipping unnecessary visualisation steps, PNG creation time dropped from >130 ms to ~1 ms.
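
As an illustration of working directly from the raw prediction fields, the sketch below merges the per‑instance masks of confident person detections into one 8‑bit image array. It assumes the Detectron2 fields pred_masks, pred_classes, and scores have already been copied to NumPy arrays, that class index 0 means "person" (as in the COCO class list), and that 0.5 is an acceptable score threshold; none of these details are given in the article.

```python
import numpy as np

PERSON_CLASS = 0  # assumption: "person" is class 0, as in COCO-trained models


def merge_person_masks(pred_masks, pred_classes, scores, min_score=0.5):
    """Union the masks of confident person instances into a uint8 image."""
    pred_masks = np.asarray(pred_masks, dtype=bool)   # shape (N, H, W)
    keep = (np.asarray(pred_classes) == PERSON_CLASS) & (
        np.asarray(scores) >= min_score
    )
    if not keep.any():
        # No usable detection: return an all-black mask of the frame size.
        return np.zeros(pred_masks.shape[1:], dtype=np.uint8)
    # A pixel is set if any kept instance covers it.
    merged = pred_masks[keep].any(axis=0)
    return merged.astype(np.uint8) * 255
```

The resulting array can be written out as a grayscale PNG with any image library, without going through a drawing or visualisation layer.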

SVG Conversion

Replacing the original potrace pipeline with a custom pypotrace implementation reduced SVG conversion from ~80 ms per image to ~1 ms.

m3u8 Seek Issue

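Timestamps in an HLS stream can jump or restart at each EXT-X-DISCONTINUITY tag, which makes seeking in long m3u8 streams unreliable. Handling these tags required tracking the last PTS before each discontinuity and adjusting the timestamps that follow, ensuring reliable seeking.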
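
The timestamp adjustment can be sketched as below, assuming each segment is given as a (discontinuity, pts_values) pair and that the first frame after a discontinuity should follow the previous frame by one frame duration. The function name rebase_pts and this input format are hypothetical.

```python
def rebase_pts(segments, frame_duration):
    """Map per-segment PTS values onto one continuous, increasing timeline."""
    offset = 0
    last_out = None   # last PTS emitted on the continuous timeline
    out = []
    for discontinuity, pts_values in segments:
        if discontinuity and last_out is not None and pts_values:
            # Shift this run so its first frame directly follows the
            # last frame seen before the discontinuity.
            offset = last_out + frame_duration - pts_values[0]
        for pts in pts_values:
            last_out = pts + offset
            out.append(last_out)
    return out
```

The offset stays in effect for later segments until the next discontinuity, so segments that continue the restarted clock remain aligned.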

Conclusion

The article presents an end‑to‑end AI‑driven masked danmaku solution, covering background, technology evaluation, system architecture, mask generation workflow, client rendering, and multiple performance optimizations. The authors hope the experience and lessons learned benefit others building similar video‑overlay systems.

Tags: distributed systems, computer vision, AI, video processing, instance segmentation, Mask Danmaku
Written by

HomeTech