
AI‑Powered Danmu Occlusion Prevention Using Human Portrait Segmentation

This article presents an AI solution for video danmu (bullet‑screen) occlusion prevention built on human portrait semantic segmentation. It describes dataset construction combining fully and semi‑supervised labeling, details an encoder‑decoder segmentation model with contextual attention, outlines post‑processing for spatial and temporal stability, and explains cloud and edge deployment using YKit, KwaiNN, and TensorRT.

Kuaishou Tech

Danmu occlusion prevention aims to hide on‑screen comments that cover regions of interest (e.g., a person’s face) while preserving comments elsewhere. The Y‑tech team implements this by generating a human portrait mask for each video frame and rendering danmu only on non‑masked areas.
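The rendering idea can be sketched with a few lines of array math. This is an illustrative compositing step, not the actual client renderer; the function name and array layout are assumptions:

```python
import numpy as np

def render_danmu(frame, danmu_layer, danmu_alpha, portrait_mask):
    """Composite a danmu overlay onto a frame, but only outside the
    human-portrait mask (1 = person, 0 = background).

    frame, danmu_layer: (H, W, 3) float arrays in [0, 1]
    danmu_alpha:        (H, W)    float array in [0, 1]
    portrait_mask:      (H, W)    binary array
    """
    # Zero the danmu alpha wherever the mask marks a person, so
    # comments appear to pass behind the portrait.
    visible_alpha = danmu_alpha * (1.0 - portrait_mask)
    return (frame * (1.0 - visible_alpha[..., None])
            + danmu_layer * visible_alpha[..., None])

# Tiny demo: a 2x2 black frame with white danmu; the left column is "person".
frame = np.zeros((2, 2, 3))
danmu = np.ones((2, 2, 3))
alpha = np.ones((2, 2))
mask = np.array([[1, 0], [1, 0]])
out = render_danmu(frame, danmu, alpha, mask)
print(out[0, 0, 0], out[0, 1, 0])  # 0.0 1.0 — hidden over the person, shown elsewhere
```

Because the mask only modulates alpha, partially transparent danmu and soft mask edges both fall out of the same formula.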

The pipeline consists of two stages: offline generation of a per‑frame human mask using a semantic segmentation model, followed by real‑time rendering on the client side that combines the original video with the mask.

To build a robust dataset, the team combines fully supervised pixel‑level annotations with semi‑supervised pseudo‑labels. A large labeled set (≈100 k images) is complemented by millions of pseudo‑labeled frames, improving generalization across diverse video scenes such as dance, lifestyle, and entertainment.
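One common way to harvest pseudo-labels is to keep only frames where a teacher model is confident everywhere. The article does not specify its filtering criterion, so the thresholds and acceptance rule below are purely illustrative:

```python
import numpy as np

def select_pseudo_labels(prob_maps, conf_thresh=0.9, coverage=0.95):
    """Keep only pseudo-labeled frames where the teacher is confident.

    prob_maps: (N, H, W) per-pixel foreground probabilities.
    A frame is accepted when at least `coverage` of its pixels have a
    confident prediction (probability near 0 or near 1); its hard mask
    is then used as a training label.
    """
    accepted = []
    for i, p in enumerate(prob_maps):
        confident = (p >= conf_thresh) | (p <= 1.0 - conf_thresh)
        if confident.mean() >= coverage:
            accepted.append((i, (p > 0.5).astype(np.uint8)))
    return accepted

probs = np.stack([
    np.full((4, 4), 0.99),  # teacher very confident -> accepted
    np.full((4, 4), 0.55),  # ambiguous everywhere   -> rejected
])
kept = select_pseudo_labels(probs)
print([i for i, _ in kept])  # [0]
```

Filtering like this trades quantity for quality: millions of unlabeled frames yield a smaller but cleaner pool of pseudo-labels to mix with the ≈100 k manually annotated images.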

The segmentation model adopts an encoder‑decoder architecture with a ResNet‑18 backbone. Features are down‑sampled 16×, then up‑sampled to the original resolution. Skip connections, channel‑attention fusion, and multi‑scale supervision enhance accuracy. Contextual information is further enriched by integrating an OCRNet‑style object‑contextual representation module at the 8× feature level.
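The channel-attention fusion of a decoder feature with its encoder skip connection can be sketched in a squeeze-and-excitation style. The article does not give the exact attention design, so treat this as one plausible realization, not the team's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(decoder_feat, skip_feat):
    """Fuse an up-sampled decoder feature map with an encoder skip
    connection via channel attention: global-average-pool the
    concatenated features, derive per-channel gates, reweight, and sum.

    decoder_feat, skip_feat: (C, H, W) arrays with the same shape.
    """
    concat = np.concatenate([decoder_feat, skip_feat], axis=0)  # (2C, H, W)
    squeeze = concat.mean(axis=(1, 2))                          # (2C,) global context
    weights = sigmoid(squeeze)                                  # per-channel gate
    gated = concat * weights[:, None, None]
    c = decoder_feat.shape[0]
    return gated[:c] + gated[c:]                                # fused (C, H, W)

fused = channel_attention_fuse(np.ones((2, 3, 3)), np.zeros((2, 3, 3)))
print(fused.shape)  # (2, 3, 3)
```

The gate lets the decoder suppress low-level skip channels (e.g., texture noise) when the global context says they are unhelpful, which is the usual motivation for attention at fusion points.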

After inference, morphological post‑processing (opening then closing) cleans the mask to preserve whole‑body integrity. Temporal stability is enforced through consistency learning and frame‑wise smoothing, using Gaussian noise, affine transforms, and illumination augmentations during training, and a similarity‑based smoothing filter during inference.
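The post-processing steps can be sketched with hand-rolled binary morphology (matching scipy's default border handling) plus a simple similarity gate. The IoU threshold and the union-based smoothing rule are assumptions; the article only says a similarity-based filter is used:

```python
import numpy as np

def dilate(mask, k=1):
    """Binary dilation with a (2k+1)x(2k+1) square structuring element."""
    padded = np.pad(mask, k)  # pads with False
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask, k=1):
    """Binary erosion with the same square structuring element."""
    padded = np.pad(mask, k)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def clean_mask(mask, k=1):
    # Opening (erode then dilate) removes isolated false positives;
    # closing (dilate then erode) fills small holes so the body stays whole.
    opened = dilate(erode(mask, k), k)
    return erode(dilate(opened, k), k)

def temporal_smooth(prev_mask, cur_mask, iou_thresh=0.85):
    """Suppress single-frame flicker: when consecutive masks are highly
    similar, merge them instead of trusting the new frame alone."""
    inter = (prev_mask & cur_mask).sum()
    union = (prev_mask | cur_mask).sum()
    iou = inter / union if union else 1.0
    return (prev_mask | cur_mask) if iou >= iou_thresh else cur_mask

# Demo: a 3x3 body plus a one-pixel speck; opening removes the speck.
mask = np.zeros((7, 7), dtype=bool)
mask[1:4, 1:4] = True
mask[5, 5] = True
cleaned = clean_mask(mask)
print(cleaned.sum())  # 9 — the 3x3 body survives, the speck is gone
```

In production one would reach for `scipy.ndimage.binary_opening`/`binary_closing` rather than explicit loops; the loops here just make the structuring-element logic visible.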

For deployment, the model is exported to the YKit AI SDK, which supports iOS, Android, macOS, Windows, and Linux via native C++. The KwaiNN inference engine and NVIDIA TensorRT accelerate inference, achieving roughly 30 % speed‑up while maintaining high segmentation quality.

Finally, the mask is converted to SVG and optionally gzip‑compressed to reduce network bandwidth. The solution is already live on AcFun (“A站”) and Kuaishou, and the team plans further improvements to segmentation accuracy and efficiency.
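A rough sketch of the delivery format shows why gzip pays off: the SVG for a mask is highly repetitive. The run-length encoding below is a simplified stand-in for the real contour-based SVG the article implies:

```python
import gzip

def mask_runs_to_svg(runs, width, height):
    """Encode a per-row run-length mask as an SVG of one-pixel-tall rects.

    `runs` is a list of (row, start_col, end_col) foreground spans —
    a simplified, hypothetical stand-in for a contour-path encoding.
    """
    rects = "".join(
        f'<rect x="{x0}" y="{y}" width="{x1 - x0}" height="1"/>'
        for y, x0, x1 in runs
    )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">{rects}</svg>')

runs = [(y, 10, 100) for y in range(50)]  # a 90x50 solid region
svg = mask_runs_to_svg(runs, 160, 90)
raw = svg.encode()
packed = gzip.compress(raw)
print(len(packed) < len(raw))  # True — repetitive markup compresses well
```

Since the client already decompresses gzip for ordinary HTTP responses, shipping the mask this way adds essentially no decoding cost on the rendering side.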

AI · video processing · semantic segmentation · human segmentation · danmu · cloud inference
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
