
How Youku Achieved Real‑Time Danmaku‑Through‑Human Rendering on Mobile Devices

This article details Youku's on‑device solution for real‑time danmaku‑through‑human rendering, covering its background, value, client‑side advantages, technical challenges, OPR architecture, processing pipeline, engineering optimizations, test results, and future outlook for mobile intelligent rendering.

Youku Technology

Background

Danmaku (live comments) are overlaid on video streams. The “danmaku‑through‑human” technique lets comments flow around detected human figures, preserving key visual content while still displaying comments.

Technical Requirements and Challenges

Accurate rendering: Human masks must align precisely with the subject to avoid visual artifacts.

Real‑time performance: The entire pipeline (frame capture, segmentation, blending, rendering) must complete within ~20 ms per frame.

Limited compute on mobile: CPU/GPU resources and memory are constrained compared with cloud servers.

Model size: Only lightweight segmentation models can run on‑device, which makes high accuracy harder to achieve.
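The ~20 ms frame budget above is the constraint every later optimization serves. A minimal sketch of how such a budget can be checked per stage; the stage names and costs here are hypothetical placeholders, not Youku's measured numbers:

```python
# Illustrative per-frame budget check for the ~20 ms real-time target.
# Stage costs are hypothetical, for demonstration only.
FRAME_BUDGET_MS = 20.0

def within_budget(stage_costs_ms: dict) -> bool:
    """Return True if the summed pipeline stage costs fit the frame budget."""
    return sum(stage_costs_ms.values()) <= FRAME_BUDGET_MS

stages = {"capture": 2.0, "segmentation": 10.0, "blur": 2.0, "blend": 3.0}
```

If any stage overruns, the whole frame misses its deadline, which is why the later sections trade accuracy and freshness for predictable latency.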

OPR Engine Architecture

Access Layer: Cross‑platform interfaces for Android, iOS, and Windows.

Engine Layer: Separate Video, Audio, and Danmaku engines.

Render Layer: Basic rendering plus post‑processing effects (color correction, zoom, special danmaku effects).

Intelligent Edge Layer: On‑device AI modules (face detection, speech, super‑resolution, human segmentation).

Platform Layer: Wrappers for platform‑specific graphics/audio APIs (OpenGL ES, Metal, DirectX, OpenSL, etc.).
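The layers above can be thought of as narrow interfaces, so a module like the segmenter is swappable behind a stable contract. A toy Python sketch of that idea; all names here are illustrative assumptions (the real engine is native code behind the platform wrappers listed above):

```python
# Illustrative layer contracts for the OPR engine, sketched as Python
# protocols. Names are hypothetical; real modules live in native code.
from typing import Protocol, runtime_checkable

@runtime_checkable
class EdgeAILayer(Protocol):
    """Intelligent Edge Layer: on-device AI such as human segmentation."""
    def segment_human(self, frame: bytes) -> bytes: ...

@runtime_checkable
class RenderLayer(Protocol):
    """Render Layer: draws video plus danmaku post-processing effects."""
    def draw(self, texture_id: int) -> None: ...

class StubSegmenter:
    """Toy stand-in for a segmentation module: all-zero mask, same size."""
    def segment_human(self, frame: bytes) -> bytes:
        return bytes(len(frame))
```

Keeping the Intelligent Edge Layer behind an interface like this is what lets the engine benchmark and swap segmentation backends, as described in Technology Selection below.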

Processing Flow

During normal video decoding, capture the current video frame.

Pass the frame to the on‑device segmentation module (PixelAI) to obtain a binary human mask.

Upload the mask as a single‑channel texture to the GPU.

Apply a 5×5 Gaussian blur on the GPU to smooth mask edges.

Use the blurred mask as an alpha channel to blend the danmaku texture.

Composite the blended danmaku surface over the video surface, producing the danmaku‑through‑human effect.
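Steps 3–6 of the flow above can be sketched on the CPU with plain lists. In production these run as GPU shaders (and a 5×5 Gaussian blur is usually done as two separable 1‑D passes); this version is only illustrative, with the 5‑tap kernel and edge clamping as assumptions:

```python
# Minimal CPU sketch of the mask-blur and alpha-blend steps: blur the
# binary human mask, then use it as alpha so danmaku pixels are hidden
# where a human is detected. Real code does this in GPU shaders.

def gaussian_blur_1d(mask, kernel=(1, 4, 6, 4, 1)):
    """Blur one row of mask values (0.0-1.0) with a normalized 5-tap kernel."""
    norm = sum(kernel)
    half = len(kernel) // 2
    out = []
    for i in range(len(mask)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), len(mask) - 1)  # clamp at edges
            acc += w * mask[j]
        out.append(acc / norm)
    return out

def blend(danmaku, video, mask_alpha):
    """Per pixel: show danmaku where mask is 0 (no human), video where 1."""
    return [d * (1.0 - a) + v * a
            for d, v, a in zip(danmaku, video, mask_alpha)]
```

The blur is what makes the danmaku fade out softly at the human silhouette instead of cutting off on a hard, aliased edge.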

Technology Selection

OpenCV and the internal MNN framework were benchmarked but failed to meet the latency and visual‑quality targets. PixelAI, an in‑house engine already deployed in Taobao Live, delivered the required speed (≈10 ms per frame) and segmentation accuracy (IoU > 0.90) and was therefore adopted as the primary on‑device segmentation engine.
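The IoU (intersection over union) figure quoted above measures how well the predicted mask overlaps the ground‑truth human region. A small sketch of the metric over flattened binary masks (the flat-list representation is a simplification):

```python
# IoU between a predicted and a ground-truth binary mask, both given as
# flat sequences of 0/1 pixel values.
def iou(pred, truth):
    """Intersection over union; defined as 1.0 when both masks are empty."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0
```

An IoU above 0.90 means the predicted mask and the true silhouette share more than 90% of their combined area, which is tight enough that danmaku appear to flow around the actual person rather than a rough blob.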

Performance Optimizations

Efficient frame capture : Integrated a lightweight snapshot filter at the end of the rendering pipeline, allowing parallel rendering and frame capture without creating or destroying filters each frame.

GPU‑based scaling : Fixed input dimensions for PixelAI models; scaling is performed on the GPU during the snapshot step, eliminating costly CPU‑side OpenCV scaling.

Asynchronous model invocation : Implemented a timeout (≈15 ms). If the segmentation result is not ready, the current mask is discarded and the previous mask is reused for a limited number of frames; after a threshold the effect is disabled to avoid visual glitches.
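The fallback policy described above can be sketched as a small state machine; the class name and the reuse threshold are illustrative assumptions, not Youku's actual values:

```python
# Hypothetical sketch of the timeout fallback: reuse the last good mask
# for up to MAX_REUSE frames when segmentation misses its deadline, then
# disable the effect rather than render a stale, glitchy mask.
MAX_REUSE = 5  # illustrative threshold

class MaskFallback:
    def __init__(self):
        self.last_mask = None
        self.reuse_count = 0

    def on_frame(self, fresh_mask):
        """Return the mask to render with, or None to disable the effect."""
        if fresh_mask is not None:        # segmentation finished in time
            self.last_mask = fresh_mask
            self.reuse_count = 0
            return fresh_mask
        if self.last_mask is not None and self.reuse_count < MAX_REUSE:
            self.reuse_count += 1         # timed out: reuse previous mask
            return self.last_mask
        return None                        # too stale: turn the effect off
```

Disabling the effect is the safe failure mode: danmaku simply render over the video as usual, which viewers barely notice, whereas a badly outdated mask produces obvious ghosting around moving subjects.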

Effect Optimizations

Additional rendering‑level tweaks—such as multi‑pass blending, edge feathering, and dynamic mask smoothing—further improve visual smoothness and reduce flicker (see the companion article “Youku Danmaku‑Through‑Human Rendering Technology Revealed”).
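One common way to implement the kind of dynamic mask smoothing mentioned above is an exponential moving average over successive mask frames, which damps flicker when the segmenter toggles a pixel between 0 and 1 from frame to frame. A sketch under that assumption (the smoothing factor is illustrative):

```python
# Temporal mask smoothing via an exponential moving average: each new
# mask is blended with the running average, damping frame-to-frame
# flicker at the cost of slight lag on fast motion.
def smooth_masks(frames, alpha=0.5):
    """Yield a smoothed mask for each incoming mask (flat lists of floats)."""
    avg = None
    for mask in frames:
        if avg is None:
            avg = list(mask)  # first frame passes through unchanged
        else:
            avg = [alpha * m + (1 - alpha) * a for m, a in zip(mask, avg)]
        yield list(avg)
```

Raising `alpha` makes the mask track motion more quickly but flicker more; lowering it gives smoother edges with more lag, so the value is typically tuned per device class.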

Test Results

Algorithmic benchmarks of PixelAI (mobile real‑time human segmentation) report ≈30 FPS on flagship Android devices with memory usage below 30 MB. End‑to‑end tests on the Youku client show an average per‑frame overhead of 1.8 ms, confirming that the client‑side solution adds negligible performance impact while delivering the intended visual effect.

Future Outlook

On‑device AI is a growing trend. Extending the OPR engine with additional edge AI capabilities (e.g., scene understanding, adaptive bitrate) will enable more interactive video experiences, and danmaku‑through‑human serves as a concrete use case for large‑scale deployment.

[Figure: OPR architecture diagram]

[Figure: Danmaku‑through‑human processing flow]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
