Intelligent Video Content Production and Automated Editing Platform – Technical Overview
Alibaba Youku built the Milan ZhiYun platform, which uses AI video understanding to automatically extract highlights, generate covers, and reassemble fragments from long‑form videos. It draws on scene detection, beautification, bullet‑comment analysis, and subjective quality evaluation to improve editing efficiency and user engagement.
In the context of overall video consumption, content creation is the source of all video traffic. The question of how to produce content intelligently and automatically was addressed by Alibaba Youku senior technical expert Zhang Lei at this year’s Cloud Conference.
He explained that the streaming media industry faces two core problems: “what to play” and “how to play”. While the latter concerns delivery technology, the former concerns content supply. High‑quality, exclusive dramas are scarce, and most platforms rely on non‑exclusive content, leading to content homogeneity and weak user stickiness. The rise of short‑form video, with its fragmented content, better matches diverse user needs.
To tackle these challenges, Youku built Milan ZhiYun, an intelligent content production and service platform that leverages video understanding to automate editing and generation. By dissecting existing long‑form OGC/PGC videos, the platform can extract highlights, generate appealing covers, and re‑assemble fragments to serve different audience groups.
The platform’s workflow is straightforward: it collects video, audio, image, text, bullet‑comment, and user‑behavior data; performs holistic analysis; determines optimal clipping points and rhythms; then automatically processes the content in the backend before distributing it to consumption platforms.
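As a rough illustration of how one of these signals could yield candidate clipping points, the sketch below ranks fixed-length time windows by how many bullet comments fall inside them. It is not Youku's method: the function name `find_highlight_windows`, the window length, and the greedy non-overlap rule are all assumptions made for this example, and a real system would combine this with the other signals listed above.

```python
from collections import Counter


def find_highlight_windows(comment_times, window=10, top_k=3):
    """Return up to top_k non-overlapping (start, end, count) windows.

    comment_times: bullet-comment timestamps in seconds (hypothetical input).
    window: window length in seconds (an assumed tuning parameter).
    """
    if not comment_times:
        return []

    # Bucket comments by whole second.
    per_second = Counter(int(t) for t in comment_times)
    duration = max(per_second) + 1
    counts = [per_second.get(s, 0) for s in range(duration)]

    # Sliding-window sums over the per-second counts.
    sums = []
    running = sum(counts[:window])
    last_start = max(duration - window, 0)
    for start in range(last_start + 1):
        sums.append((running, start))
        if start + window < duration:
            running += counts[start + window] - counts[start]

    # Greedily keep the busiest windows that do not overlap earlier picks.
    sums.sort(key=lambda item: (-item[0], item[1]))
    chosen = []
    for total, start in sums:
        if total == 0 or len(chosen) == top_k:
            break
        end = start + window
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, total))
    return sorted(chosen)


if __name__ == "__main__":
    times = [1.2, 1.5, 2.0, 30.1, 30.5, 31.0, 31.2, 32.9, 60.0]
    print(find_highlight_windows(times, window=5, top_k=2))
```

Raw comment counts are only a proxy for interest; the returned windows would still need to be snapped to scene boundaries so that a clip does not start or end mid-sentence.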
Key technologies highlighted include:
Scene detection: Identifies coherent narrative segments to ensure extracted clips retain story completeness, using traditional visual detection and AI‑based clustering.
Video and image processing: Enhances clips with beautification, style transfer, and other effects, achieving up to a ten‑fold efficiency improvement.
Bullet‑comment analysis: Extracts user‑generated commentary to pinpoint moments of high interest, enabling personalized highlight generation.
Content evaluation: Moves beyond purely objective signal metrics (e.g., PSNR) toward subjective, user‑centric quality assessment, with the goal of a measurable standard that still reflects viewer preferences.
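For reference, PSNR, the objective metric mentioned under content evaluation, is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between a reference and a distorted image. The sketch below computes it over flat lists of 8-bit pixel values; the function name `psnr` and the list-based input are assumptions chosen to keep the example dependency-free.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    spread = [105, 95, 105, 95]         # small error on every pixel
    concentrated = [110, 100, 100, 100]  # one larger error on a single pixel
    print(psnr(ref, spread), psnr(ref, concentrated))
```

Both distorted inputs in the usage example have the same mean squared error and therefore the same PSNR, even though the errors are distributed differently. That is one reason a pixel-level metric alone may not track what viewers actually perceive.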
The system’s architecture integrates these capabilities with foundational technologies such as image, video, and audio processing frameworks. By combining multi‑dimensional analysis (scene, object, character, sentiment) with automated production, the platform dramatically reduces manual labor costs: machines operate 24/7 and are several times more efficient than human editors.
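To make the scene dimension of that analysis more concrete, here is a minimal sketch of one traditional visual technique: flagging a cut wherever the grayscale histograms of consecutive frames differ sharply. The talk does not say which detector the platform uses, so the functions `histogram` and `detect_cuts`, the bin count, and the threshold are all assumptions. The sketch finds shot boundaries only; grouping shots into narrative scenes (the clustering step) is not shown.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized grayscale histogram of a non-empty flat list of 0-255 pixels."""
    hist = [0] * bins
    for pixel in frame:
        hist[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in hist]


def detect_cuts(frames, threshold=0.5):
    """Return indices of frames whose histogram differs sharply from the previous one."""
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame)
        if previous is not None:
            # Half the L1 distance between normalized histograms lies in [0, 1].
            difference = sum(abs(a - b) for a, b in zip(previous, current)) / 2
            if difference > threshold:
                cuts.append(index)
        previous = current
    return cuts


if __name__ == "__main__":
    dark = [10] * 16
    bright = [240] * 16
    print(detect_cuts([dark, dark, bright, bright, dark]))
```

A fixed threshold like this reacts to hard cuts but can miss gradual transitions such as fades, so it would need tuning or a different detector for real footage.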
From a product perspective, the platform provides services such as intelligent clipping, automatic cover generation, and content recommendation. These have already been integrated into the latest version of the Youku app, together with smart subtitles and open APIs for external partners.
Finally, Zhang Lei invited attendees to join the technical discussion group to continue exploring intelligent content production and analysis.
Youku Technology