AI-Based Follow-Subtitle (Bullet) System for Video Streaming

The article presents an AI‑driven follow‑subtitle system for video streaming that uses server‑side face detection and tracking to attach speech‑bubble bullets to characters, synchronizing trajectories with playback via a client SDK, while addressing cut‑scene handling, latency, and power constraints.

Youku Technology

This article introduces a new interactive subtitle (bullet) feature for video streaming platforms, where subtitles follow characters on screen using AI-powered face recognition. Unlike traditional top‑down scrolling bullets, the follow‑bullet appears as a speech‑bubble attached to a character’s face and moves with the character.

The system architecture consists of three layers: algorithm side, server side, and client side. The algorithm side extracts video frames at 25 fps, performs face detection, tracking, and smoothing to generate per‑frame face metadata. The server aggregates this metadata, applies denoising, anti‑shake processing, and merges frame‑level data into continuous face trajectories, then packages these trajectories together with the corresponding bullet data.
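The server-side merge step can be pictured with a minimal Python sketch. It assumes the tracker assigns each face a `track_id` and reports face-box centres as normalised coordinates; the `FaceBox` type, the `max_gap` threshold, and the moving-average filter are illustrative stand-ins, since the article does not specify the actual denoising or anti-shake method. Only the 25 fps frame rate comes from the article.

```python
from collections import defaultdict
from dataclasses import dataclass

FPS = 25  # frame extraction rate stated in the article


@dataclass
class FaceBox:
    frame: int     # index of the extracted frame
    track_id: int  # hypothetical id assigned by the tracker
    x: float       # face-box centre, normalised to 0..1 (assumed)
    y: float


def smooth(values, window=5):
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def build_trajectories(boxes, max_gap=2, window=5):
    """Merge per-frame boxes into continuous, smoothed trajectories.

    Returns a list of trajectories, each a list of (time_ms, x, y).
    A track is split whenever consecutive detections are more than
    max_gap frames apart.
    """
    by_track = defaultdict(list)
    for box in boxes:
        by_track[box.track_id].append(box)

    trajectories = []
    for track in by_track.values():
        track.sort(key=lambda b: b.frame)
        segment = [track[0]]
        segments = [segment]
        for prev, cur in zip(track, track[1:]):
            if cur.frame - prev.frame > max_gap:
                segment = [cur]          # gap too large: start a new trajectory
                segments.append(segment)
            else:
                segment.append(cur)
        for seg in segments:
            xs = smooth([b.x for b in seg], window)
            ys = smooth([b.y for b in seg], window)
            trajectories.append(
                [(b.frame * 1000 // FPS, x, y) for b, x, y in zip(seg, xs, ys)]
            )
    return trajectories
```

Converting frame indices to milliseconds at this stage means the client can look positions up by playback time alone, without knowing the extraction frame rate.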

On the client, an interactive SDK loads scripts that represent small interactions. Each script contains a face trajectory and its associated bullet bubble data. A timer polls the current playback time, retrieves the relevant face coordinates, and renders the bullet bubble next to the moving face, achieving a seamless follow‑bullet effect.
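The lookup performed on each timer tick might look like the following Python sketch. It assumes a trajectory is a time-sorted list of `(time_ms, x, y)` tuples and interpolates linearly between samples; the function name and data layout are hypothetical, and a real client SDK would run equivalent logic in its own platform language.

```python
from bisect import bisect_right


def face_position_at(trajectory, time_ms):
    """Return the interpolated (x, y) at time_ms, or None outside the trajectory.

    trajectory is a list of (time_ms, x, y) tuples sorted by time.
    """
    if not trajectory:
        return None
    if time_ms < trajectory[0][0] or time_ms > trajectory[-1][0]:
        return None  # the face is not on screen at this time
    times = [point[0] for point in trajectory]
    i = bisect_right(times, time_ms)
    if i == len(trajectory):
        return trajectory[-1][1:]  # exactly on the last sample
    t0, x0, y0 = trajectory[i - 1]
    t1, x1, y1 = trajectory[i]
    f = (time_ms - t0) / (t1 - t0)  # fraction of the way from sample i-1 to i
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
```

Interpolating between samples lets the bubble move smoothly even when the polling interval does not line up with the 40 ms spacing of the face data; returning `None` gives the renderer a clear signal to hide the bubble.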

The article also explains why face recognition runs on the server rather than the client: on-device recognition struggles to meet real‑time constraints, is not accurate enough on mobile devices, and consumes enough CPU and power to degrade the user experience.

Two key challenges are discussed. First, a bullet that appears just before a scene cut would otherwise linger awkwardly after the cut, so such bullets are given a fade‑out effect. Second, when a video is edited, its timeline no longer aligns with the existing face data, so short segments of face data are used to calculate the precise duration of each cut.
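The fade‑out idea can be illustrated with a small Python sketch that computes a bubble's opacity from the playback time. The function, its parameters, and the 200 ms default fade duration are assumptions for illustration; the article does not give the actual timing or easing.

```python
def bubble_opacity(now_ms, show_ms, hide_ms, cut_ms=None, fade_ms=200):
    """Opacity in 0..1 for a bubble scheduled from show_ms to hide_ms.

    If a scene cut at cut_ms falls at or before hide_ms, the bubble ends
    at the cut and fades out linearly over the preceding fade_ms.
    """
    end_ms = hide_ms if cut_ms is None else min(hide_ms, cut_ms)
    if now_ms < show_ms or now_ms >= end_ms:
        return 0.0  # not yet shown, or already past the cut / scheduled end
    if cut_ms is None or cut_ms > hide_ms:
        return 1.0  # no cut interrupts this bubble
    remaining = end_ms - now_ms
    return min(1.0, remaining / fade_ms)
```

For a bubble scheduled from 0 to 3000 ms with a cut at 1000 ms, this gives full opacity at 500 ms, half opacity at 900 ms, and zero from 1000 ms onward, so nothing lingers after the cut.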

Future work aims to extend the basic script framework beyond follow‑bullets to other interactive features, such as bullet‑through‑people, that reuse the same face and body data. Developers would then be able to create custom scripts that inherit the optimized face data pipeline.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.