
AI-Driven Video Quality Evaluation and Machine Filtering for Short Videos at Youku

Youku's MoKu Lab combines rule-based preprocessing with multimodal AI models that score titles, covers, and video content, enabling large-scale machine filtering that boosts human review efficiency by 5–6 percentage points and reduces the share of low-quality short videos from roughly 15% to under 3%, while supporting flexible business rules.

Youku Technology

Short‑video platforms generate massive amounts of new content daily. After basic preprocessing, hundreds of thousands of short videos still enter the quality‑review pipeline, while manual review capacity is limited. The challenge is to balance human cost with efficient quality assessment to surface truly "good" videos to end users.

The article, authored by Alibaba Entertainment technical expert Dong Huoming and contributors, presents MoKu Lab's practice and research on using AI for large-scale video filtering and low-quality video detection.

1. Background

Short-video feeds have exploded in popularity, reaching 600 million active users. The quality of user-generated content (UGC) and professionally generated content (PGC) varies widely, making intelligent quality assessment essential for production efficiency, storage quality, and user experience.

2. Machine Filtering Overview

The video audit chain can be simplified into three main stages: red‑line compliance checks, machine filtering, and manual review. Machine filtering includes rule‑based preprocessing (duration, resolution, aspect ratio, etc.) and AI‑driven quality evaluation of titles, covers, and content.
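The rule-based preprocessing step can be sketched as a set of hard constraints that reject a video before any AI model runs. The thresholds below are illustrative assumptions, not Youku's actual limits:

```python
# Hypothetical sketch of the rule-based preprocessing stage: videos that
# fail hard metadata constraints (duration, resolution, aspect ratio) are
# rejected before the AI quality models are invoked at all.

from dataclasses import dataclass

@dataclass
class VideoMeta:
    duration_s: float
    width: int
    height: int

def rule_filter(v: VideoMeta) -> list[str]:
    """Return a list of rule violations; an empty list means the video passes."""
    issues = []
    if not 5 <= v.duration_s <= 600:          # too short or too long
        issues.append("duration")
    if v.width < 540 or v.height < 540:       # below minimum resolution
        issues.append("resolution")
    aspect = v.width / v.height
    if not 0.5 <= aspect <= 2.0:              # extreme aspect ratio
        issues.append("aspect_ratio")
    return issues
```

Because these checks are cheap metadata lookups, they can run on every upload and shrink the volume that reaches the more expensive model-based stages.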

3. AI‑Based Video Quality Evaluation

Quality assessment is divided into subjective (human) and objective (algorithmic) methods. Objective methods are categorized as full‑reference, reduced‑reference, and no‑reference. The Youku Intelligent Video Analysis Platform builds a multi‑dimensional quality evaluation framework covering title, cover, and content.

3.1 Intelligent Cover Selection

Cover quality is critical. Dozens of objective metrics (e.g., number of people, image clarity, logo presence, QR code, black borders) are used to train multimodal models that output a comprehensive cover quality score. Specialized detectors for logos, QR codes, black borders, and masked images are also deployed.
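To make the fusion of detector outputs concrete, here is a hand-tuned sketch of combining per-detector signals into one cover score. The article describes a trained multimodal model, not a fixed formula; the weights and penalties below are invented for illustration:

```python
# Illustrative fusion of specialized detector outputs (clarity, face count,
# logo, QR code, black borders) into a single cover quality score in [0, 1].
# All weights and penalty values are assumptions, not Youku's trained model.

def cover_score(clarity: float, n_faces: int, has_logo: bool,
                has_qr: bool, black_border_ratio: float) -> float:
    """Combine per-detector signals into a score clipped to [0, 1]."""
    score = clarity                       # base signal: image clarity in [0, 1]
    if 1 <= n_faces <= 3:                 # a few clear subjects helps
        score += 0.1
    if has_logo:
        score -= 0.3                      # watermarks / third-party logos penalized
    if has_qr:
        score -= 0.5                      # QR codes are near-disqualifying
    score -= black_border_ratio * 0.4     # letterboxing lowers the score
    return min(1.0, max(0.0, score))
```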

3.2 High‑Quality Title Filtering

Title quality is evaluated with a deep‑learning text‑classification model and a rule‑based feature detector (sensitive words, social info, typos, length, click‑bait patterns). The combined system produces a title quality score and flags specific issues.
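The two-track design above can be sketched as follows: a rule-based detector flags concrete issues, a classifier (stubbed here as an input score) supplies a quality probability, and either track can reject a title. The patterns and threshold are illustrative assumptions:

```python
# Minimal sketch of the combined title check: rule-based flags plus a
# classifier score. The regexes and threshold are invented for illustration.

import re

CLICKBAIT = re.compile(r"(shocking|you won't believe|must see)", re.I)

def title_flags(title: str) -> list[str]:
    """Rule-based detector: return the list of specific issues found."""
    flags = []
    if len(title) < 5 or len(title) > 60:
        flags.append("length")
    if CLICKBAIT.search(title):
        flags.append("clickbait")
    if re.search(r"\bQQ\b|\bWeChat\b|\d{7,}", title):   # contact / social info
        flags.append("social_info")
    return flags

def title_ok(title: str, model_score: float, threshold: float = 0.5) -> bool:
    """Pass only if no rule fires and the classifier score clears the bar."""
    return not title_flags(title) and model_score >= threshold
```

Keeping the rule track separate from the model means each rejection comes with a named reason, which is what lets the system "flag specific issues" rather than emit only an opaque score.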

3.3 Content Quality Analysis

Content quality requires multimodal analysis of audio, visual, and textual cues. Separate models address video clarity, logo detection, abnormal frames (e.g., black screens), audio-video sync, and other low-quality signals. Frame-level and clip-level features (RGB, optical flow, audio embeddings) are fused in deep models to produce a final content quality judgment.
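The fusion step can be sketched as late fusion over per-modality scores. Real systems fuse learned embeddings inside a deep model; the averaging, hard-fail check, and weights below are simplifying assumptions:

```python
# Sketch of late fusion over per-modality quality scores (visual frames,
# optical flow, audio). Weights and the hard-fail rule are illustrative.

def fuse_content_quality(frame_scores: list[float],
                         flow_score: float,
                         audio_score: float) -> float:
    """Return an overall content quality score in [0, 1]."""
    if not frame_scores:
        return 0.0
    frame_avg = sum(frame_scores) / len(frame_scores)
    # Hard failure: e.g. a black-screen frame drags the minimum near zero.
    if min(frame_scores) < 0.05:
        return 0.0
    # Weighted late fusion of the three modalities.
    return 0.6 * frame_avg + 0.2 * flow_score + 0.2 * audio_score
```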

4. System Architecture

The service framework consists of four layers:

Infrastructure layer: compute, data, annotation, and training platforms.

Algorithm capability layer: image, text, and video understanding models plus service APIs.

Business application layer: quality models applied to video submission, content pool building, and product operations.

Data & feedback system: online metric monitoring, bad‑case feedback, and iterative model improvement.

5. Applications and Results

Machine-filtering models have been deployed in several Youku short-video scenarios. Initial evaluations showed a 5–6 percentage-point increase in human review efficiency. After multiple iterations, the low-quality video rate dropped from roughly 15% to under 3%, with a low false-positive rate.

The framework also supports building curated pools of high-quality content ("精品化建仓") and provides operational tools for flexible rule configuration across different business scenarios.
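Flexible rule configuration can be pictured as the same model scores feeding different thresholds per scenario, e.g. a strict bar for a curated premium pool and a looser one for the general feed. The scenario names and numbers below are invented for illustration:

```python
# Hypothetical per-scenario thresholds: one set of model scores, different
# admission bars depending on the business scenario. Values are illustrative.

THRESHOLDS = {
    "general_feed": {"cover": 0.4, "title": 0.4, "content": 0.4},
    "premium_pool": {"cover": 0.8, "title": 0.7, "content": 0.8},
}

def admit(scenario: str, scores: dict) -> bool:
    """Admit a video into a pool only if every score clears that pool's bar."""
    bars = THRESHOLDS[scenario]
    return all(scores[k] >= bar for k, bar in bars.items())
```

Centralizing the thresholds in configuration, rather than in model code, is what lets operations staff retune a scenario without retraining or redeploying any model.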

6. Challenges and Outlook

Future work includes deeper video understanding, automatic classification, safety detection, and fine‑grained machine review. Continual updates to quality and safety standards, as well as variability among human reviewers, remain open challenges.

References to related research (e.g., NIMA, CNN for sentence classification) are listed at the end of the article.

Tags: AI, deep learning, video quality, short video, content moderation, machine filtering