
Mobile Real-Time Portrait Segmentation for Youku Bullet Comment Passthrough

To enable real‑time bullet‑comment passthrough on Youku’s mobile app, the team built a million‑scale portrait dataset and designed the AirSegNet series—CPU, GPU, and server variants—using VGG‑style nets, edge‑aware losses, and hybrid CPU‑GPU inference, achieving 0.98 IoU and sub‑15 ms latency on most devices.

Youku Technology

As video platforms introduce bullet comment passthrough, a feature that renders on-screen comments behind the people in a video so they do not cover them, Youku (Alibaba's video platform) needed mobile-side portrait segmentation to complement its server-side solution. This article presents the real-time portrait segmentation solution deployed in Youku's mobile app for bullet comment passthrough.

1. Business Background

While server-side segmentation offers stable quality and high accuracy, it incurs storage and bandwidth costs and cannot meet real-time requirements for trending videos. This drove the need for mobile-side portrait segmentation with both high accuracy and real-time performance.

2. Salient Portrait Segmentation

The solution focuses on segmenting salient human figures in videos—areas in focus—rather than background or non-focused regions. The team built a million-scale dataset covering modern urban dramas, historical dramas, and military content, with various poses including half-body, full-body, single-person, and multi-person scenarios. Special edge cases like backlight, low-light, and reaching gestures were also collected.

3. Model Design - AirSegNet Series

Through extensive experiments on Alibaba's MNN mobile inference framework, the team found that plain, VGG-style straight-through network designs performed best. They developed the AirSegNet series with three variants: AirSegNet-CPU, AirSegNet-GPU, and AirSegNet-Server. Key design choices include using 1x1 convolutions in the decoder to reduce computation, fusing multi-scale low-level features (x2, x4, x8), and setting align_corners=False when resizing feature maps for more accurate edge segmentation.
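To make the align_corners setting concrete, the sketch below implements 1-D linear upsampling under both coordinate conventions. It follows the index mapping used by common frameworks (for example PyTorch's interpolate); the function names are illustrative and are not taken from AirSegNet.

```python
def source_coord(dst_index, in_size, out_size, align_corners):
    """Map an output pixel index to a fractional input coordinate."""
    if align_corners:
        # Corner pixels of input and output are pinned to each other.
        if out_size == 1:
            return 0.0
        return dst_index * (in_size - 1) / (out_size - 1)
    # Pixels are treated as unit cells and their centres are aligned.
    return (dst_index + 0.5) * in_size / out_size - 0.5


def upsample_1d(values, out_size, align_corners=False):
    """Linearly resample a list of floats to out_size samples."""
    in_size = len(values)
    out = []
    for i in range(out_size):
        x = source_coord(i, in_size, out_size, align_corners)
        x = min(max(x, 0.0), in_size - 1)  # clamp to the valid range
        lo = int(x)
        hi = min(lo + 1, in_size - 1)
        frac = x - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


print(upsample_1d([0.0, 1.0], 4, align_corners=False))
print(upsample_1d([0.0, 1.0], 4, align_corners=True))
```

With align_corners=False the sampling grid depends only on the scale ratio, so x2, x4, and x8 features stay spatially registered when fused. This is one plausible reason the setting helps at edges, not a claim made in the source.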

4. Training Optimization

Several training strategies were employed: (1) Background weight calculation to reduce misclassification of background pixels; (2) Edge weighting, in which the label mask is dilated and eroded with a 5x5 kernel to locate edge pixels, which then receive a 5x loss weight; (3) A novel clustering loss to improve segmentation accuracy; (4) TopK loss for handling hard negative samples. The final portrait IoU reached 0.98.
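The edge weighting and TopK ideas above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's training code: the edge band is taken as the pixels where the dilated and eroded masks differ, the base loss is binary cross-entropy, and keep_ratio is a hypothetical parameter that selects the highest-loss pixels.

```python
import math


def _morph(mask, radius, op):
    """Square min/max filter: op=max gives dilation, op=min gives erosion."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Windows are truncated at the image border.
            window = [
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = op(window)
    return out


def edge_weight_map(mask, kernel=5, edge_weight=5.0):
    """Return per-pixel loss weights: edge_weight on the edge band, 1.0 elsewhere."""
    radius = kernel // 2
    dilated = _morph(mask, radius, max)
    eroded = _morph(mask, radius, min)
    return [
        [edge_weight if d != e else 1.0 for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(dilated, eroded)
    ]


def topk_weighted_bce(probs, mask, weights, keep_ratio=0.25):
    """Average the weighted BCE over only the hardest keep_ratio of pixels."""
    eps = 1e-7
    losses = []
    for p_row, m_row, w_row in zip(probs, mask, weights):
        for p, m, w in zip(p_row, m_row, w_row):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            bce = -(m * math.log(p) + (1 - m) * math.log(1 - p))
            losses.append(w * bce)
    losses.sort(reverse=True)
    k = max(1, int(len(losses) * keep_ratio))
    return sum(losses[:k]) / k


mask = [[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]]
weights = edge_weight_map(mask)
print(weights)
```

In a real training loop the same logic would run on tensors, with the morphology typically expressed as max pooling.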

5. Post-Processing Optimization

Post-processing targets aliasing at different resolutions, frame-to-frame jitter, and computation cost: (1) Edge optimization using a 3x3 Gaussian blur fused into the network, plus a curve transformation to narrow transition areas; (2) Momentum-based frame-to-frame jitter suppression with adaptive thresholds; (3) Frame skipping for stable scenes to reduce computation.
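A rough sketch of the curve transformation and the momentum-based jitter suppression follows. The source does not give the exact curve or threshold rule, so everything here is an assumption: a smoothstep curve stands in for the curve transformation, and the threshold shrinks as average frame-to-frame change grows so that real motion is not smeared. All names and default values are hypothetical.

```python
def narrow_transition(alpha, low=0.3, high=0.7):
    """Smoothstep curve: squeezes soft mask values toward 0 or 1."""
    t = min(max((alpha - low) / (high - low), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


class TemporalSmoother:
    """Blend small per-pixel changes with the previous mask; pass large ones."""

    def __init__(self, momentum=0.8, base_threshold=0.1, motion_scale=0.2):
        self.momentum = momentum
        self.base_threshold = base_threshold
        self.motion_scale = motion_scale
        self.state = None  # previous smoothed mask (flat list of floats)

    def update(self, mask):
        if self.state is None:
            self.state = list(mask)
            return list(self.state)
        diffs = [abs(m - s) for m, s in zip(mask, self.state)]
        mean_diff = sum(diffs) / len(diffs)
        # Assumed adaptive rule: less smoothing when the whole frame is moving.
        threshold = self.base_threshold * max(0.0, 1.0 - mean_diff / self.motion_scale)
        smoothed = []
        for m, s, d in zip(mask, self.state, diffs):
            if d < threshold:
                # Small change: treat as jitter and keep most of the old value.
                smoothed.append(self.momentum * s + (1.0 - self.momentum) * m)
            else:
                # Large change: treat as real motion and accept the new value.
                smoothed.append(m)
        self.state = smoothed
        return list(smoothed)


smoother = TemporalSmoother()
smoother.update([0.0, 1.0])
print(smoother.update([0.02, 1.0]))
```

The mean difference computed in update could also drive frame skipping, for instance by reusing the previous mask while it stays below a small bound.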

6. Engineering Deployment

Integrated with Alibaba's PixelAI SDK, the solution was tested across Android and iOS devices. Key optimizations included: (1) Dual CPU+GPU model initialization to hide GPU initialization latency; (2) CPU+GPU hybrid model distribution based on device capability. The solution achieves inference times under 15 ms on more than 90% of devices, enabling a seamless bullet comment passthrough experience.
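The dual initialization idea can be illustrated with a small threading sketch: serve frames from a quickly loaded CPU model while the GPU model initializes in the background, then switch over. This is a language-agnostic illustration written in Python, not PixelAI or MNN code; DualBackendSegmenter and the factory callables are hypothetical.

```python
import threading


class DualBackendSegmenter:
    """Start on a CPU model and swap to a GPU model once it is ready."""

    def __init__(self, make_cpu_model, make_gpu_model):
        self._lock = threading.Lock()
        self._active = make_cpu_model()  # assumed fast, usable immediately
        self.backend = "cpu"
        self._thread = threading.Thread(
            target=self._init_gpu, args=(make_gpu_model,), daemon=True
        )
        self._thread.start()

    def _init_gpu(self, make_gpu_model):
        try:
            model = make_gpu_model()  # assumed slow (shader compile, warm-up)
        except Exception:
            return  # GPU unavailable: keep serving from the CPU model
        with self._lock:
            self._active = model
            self.backend = "gpu"

    def wait_for_gpu(self, timeout=None):
        """Block until GPU initialization finishes or the timeout expires."""
        self._thread.join(timeout)

    def segment(self, frame):
        with self._lock:
            model = self._active
        return model(frame)


# Usage with stand-in models; the Event simulates slow GPU start-up.
gate = threading.Event()


def make_cpu():
    return lambda frame: ("cpu", frame)


def make_gpu():
    gate.wait()
    return lambda frame: ("gpu", frame)


segmenter = DualBackendSegmenter(make_cpu, make_gpu)
first = segmenter.segment(1)   # served by the CPU model
gate.set()
segmenter.wait_for_gpu(timeout=5)
second = segmenter.segment(2)  # served by the GPU model
print(first, second)
```

Falling back silently to the CPU model when GPU setup fails also fits the per-device hybrid distribution described above, since weaker devices can simply never switch.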
