Tagged articles
26 articles
Alibaba Cloud Big Data AI Platform
Apr 28, 2026 · Artificial Intelligence

Zero‑Learning Video to Semantic Vector Pipeline with MaxFrame’s Distributed AI Engine

Faced with exploding video volumes and bottlenecks in frame extraction, labeling, and vector storage, MaxFrame offers a three‑step, end‑to‑end distributed pipeline that turns raw videos into searchable semantic vectors while providing zero‑threshold scaling, transparent OSS mounting, row‑level fault tolerance, and elastic concurrency control.

MaxCompute · MaxFrame · OSS
6 min read
AsiaInfo Technology: New Tech Exploration
Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Video Analysis

This article examines the evolution from single‑frame video analysis to multimodal large models, detailing their architecture, optimization techniques, experimental validation on edge devices, and practical scenarios, while highlighting current limitations and future directions for AI‑driven video understanding.

AI · Computer Vision · Edge Computing
20 min read
iQIYI Technical Product Team
Nov 7, 2024 · Artificial Intelligence

Multimodal Speaker Diarization for Long-Form Video Scripts

iQIYI’s multimodal speaker diarization system splits long‑form video using subtitle timestamps and scene detection, extracts voiceprints with a custom model, hierarchically clusters them, and applies an Active Speaker Detection algorithm combined with face recognition to assign speakers, achieving around 90% precision and recall and boosting downstream tasks such as summarization, translation, and dubbing.

Multimodal AI · dialogue recognition · iQIYI
8 min read
NetEase Smart Enterprise Tech+
Mar 12, 2024 · Artificial Intelligence

How Advanced Video AI Transforms Content Moderation and Retrieval

This article explores how modern video AI techniques—ranging from transformer‑based classification to semi‑supervised retrieval and token‑halting acceleration—enable efficient, accurate detection of prohibited content and fast, scalable video search in the era of short‑form media.

AI moderation · Semi-supervised Learning · Transformer
18 min read
IT Xianyu
Mar 5, 2024 · Artificial Intelligence

Open-Source AI Platform A‑SOiD Enables Video‑Based Behavior Recognition and Prediction

Researchers from Carnegie Mellon University and the University of Bonn have released the open‑source A‑SOiD platform, which learns and predicts user‑defined behaviors solely from video, offering transparent, bias‑aware AI that can be applied to animal studies, human actions, and diverse pattern‑recognition domains.

AI · behavior recognition · machine learning
6 min read
Alibaba Cloud Developer
Dec 19, 2022 · Artificial Intelligence

How AI Transforms Football Video Analysis: Detection, Tracking, and Event Recognition

This article explores how artificial intelligence techniques such as deep learning, object detection, multi‑object tracking, and coordinate projection are applied to football video analysis to automatically detect the ball and players, map their positions onto the field, and recognize key events like shots and goals.

AI · Computer Vision · Sports Analytics
16 min read
Tencent Cloud Developer
Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent’s Advertising Multimedia AI ecosystem on the Taiji platform, describing a five‑platform matrix: Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI‑driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model. Featured innovations include massive multimodal pre‑training, logo retrieval, QA‑style attribute extraction, spatiotemporal video analysis, advanced auto‑judgment, and high‑performance hashing, which together achieve top cross‑modal retrieval results.

Computer Vision · Multimodal AI · advertising technology
18 min read
IEG Growth Platform Technology Team
Feb 14, 2022 · Artificial Intelligence

Multimodal Evolution and Application in Tencent Game Advertising System

This article describes the end‑to‑end multimodal modeling pipeline—covering text, image, and video understanding, model evolution from shallow to deep networks, key‑frame extraction, fine‑tuning, and multimodal fusion—used in Tencent's game ad exchange platform, along with practical deployment challenges and solutions.

Advertising · CNN · Multimodal Learning
22 min read
ByteDance SE Lab
Jul 23, 2021 · Mobile Development

How to Accurately Measure Mobile App Response Time Using Video Frame Detection and OCR

This article presents a method for precisely measuring mobile app response latency by extracting video frames, detecting start and end frames through image markers and OCR, and calculating the time difference, offering a high‑precision, customizable solution for performance evaluation across diverse app scenarios.

OCR · app latency · frame detection
12 min read
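As a rough illustration of the frame-based timing idea summarized in the entry above, the sketch below finds a start frame and an end frame in a recorded sequence and converts the index gap into milliseconds. It is not the article's implementation: `first_index`, `response_time_ms`, and the `is_start` / `is_end` predicates are hypothetical names, and the predicates stand in for whatever image-marker or OCR check decides that a frame shows the tap or the fully loaded screen.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def first_index(
    frames: Sequence[Frame],
    predicate: Callable[[Frame], bool],
    start: int = 0,
) -> Optional[int]:
    """Return the index of the first frame at or after `start` that matches."""
    for i in range(start, len(frames)):
        if predicate(frames[i]):
            return i
    return None


def response_time_ms(
    frames: Sequence[Frame],
    fps: float,
    is_start: Callable[[Frame], bool],
    is_end: Callable[[Frame], bool],
) -> Optional[float]:
    """Elapsed time between the start frame and the first later end frame.

    Returns None when either frame cannot be found. Resolution is limited
    to one frame interval, i.e. 1000 / fps milliseconds.
    """
    start = first_index(frames, is_start)
    if start is None:
        return None
    end = first_index(frames, is_end, start + 1)
    if end is None:
        return None
    return (end - start) * 1000.0 / fps


if __name__ == "__main__":
    # Strings stand in for decoded frames; real predicates would inspect pixels.
    recording = ["idle", "idle", "tap", "loading", "loading", "loaded", "loaded"]
    elapsed = response_time_ms(
        recording,
        fps=60,
        is_start=lambda f: f == "tap",
        is_end=lambda f: f == "loaded",
    )
    print(elapsed)  # 3 frames at 60 fps -> 50.0
```

Recording at a higher frame rate tightens the measurement, since the error bound is one frame interval.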
DataFunTalk
Nov 22, 2020 · Artificial Intelligence

Short Video Analysis in Local Life Scenarios: Techniques and Practices at Meituan

This article presents Meituan's AI-driven short video analysis workflow, covering industry trends, multi‑label video classification, intelligent cover selection, and video generation techniques, while discussing challenges, model building, label expansion, continuous data iteration, and future outlook for video AI in local services.

AI · Computer Vision · Meituan
16 min read
DataFunSummit
Nov 5, 2020 · Artificial Intelligence

Short Video Analysis for Local Life Scenarios: Techniques and Practices at Meituan

This article presents Meituan's AI‑driven short‑video analysis pipeline for local‑life scenarios, covering industry trends, multi‑label classification, intelligent cover selection, and video generation, and discusses model construction, label‑system expansion, continuous data iteration, and practical applications in restaurant and hotel domains.

AI · Meituan · Video Generation
16 min read
DataFunTalk
Oct 22, 2020 · Artificial Intelligence

Analyzing Video Excitement: Methods, Frameworks, and Applications

This article presents a comprehensive overview of video excitement analysis, covering quality, aesthetics, and narrative factors, describing a multimodal framework with supervised, weakly supervised, and multi‑task models, and illustrating practical applications such as preview generation, clipping, and automatic cover creation.

Multimodal AI · Weak Supervision · content recommendation
14 min read
DataFunTalk
Jul 31, 2020 · Artificial Intelligence

WeChat 'Kan Kan' Content Understanding: Architecture and Techniques for Recommendation

This article details the technical architecture behind WeChat's 'Kan Kan' content understanding platform, covering text and multimedia analysis, tag extraction, entity recognition, knowledge graph construction, and how these components enhance recommendation recall, ranking, and user engagement across the ecosystem.

Knowledge Graph · Multimodal AI · Recommendation Systems
46 min read
Youku Technology
Jul 29, 2020 · Artificial Intelligence

Core Technology of Video Content Understanding: Technical Practice of Partial Re-ID in Video Inspection

The talk explains how Alibaba’s Entertainment Content Operation Platform applies a Partial‑ReID algorithm to overcome the challenges of person re‑identification in heavily edited video content, enabling accurate cross‑shot character matching, richer appearance data, and metrics such as presence, interaction, and storyline for improved video quality assessment.

AI · Computer Vision · Partial Re-ID
2 min read
Amap Tech
Jul 9, 2020 · Artificial Intelligence

AMAP-TECH Algorithm Competition: Dynamic Road‑Condition Analysis from In‑Vehicle Video Images

Alibaba Amap’s AMAP‑TECH competition invites participants to develop AI computer‑vision models that classify real‑time road conditions—smooth, slow, or congested—from short sequences of dash‑cam images, using a labeled dataset of 1,500 training sequences and a weighted F1‑score evaluation, with cash prizes up to ¥60,000.

AI · Computer Vision · Dataset
8 min read
Youku Technology
Jun 19, 2020 · Artificial Intelligence

Video-based Temporal Event Detection Methods

In the fourth Alibaba Digital Media Technology Night Talk, algorithm engineer Liu Xiaolong presents an overview of video‑based temporal event detection, covering its problem background, representative prior works, and the latest research advances within the MEDIA AI Algorithm Challenge series.

Alibaba · Computer Vision · Deep Learning
1 min read
DataFunTalk
Apr 1, 2020 · Artificial Intelligence

Knowledge Graph‑Based Multimodal Semantic Understanding at Baidu

This article outlines Baidu's large‑scale knowledge graph applications in AI, detailing the need for multimodal semantic understanding, challenges in text and video comprehension, and the technical solutions including entity annotation, conceptualization, knowledge networks, and multimodal fusion for enhanced search, recommendation, and visual question answering.

Knowledge Graph · Visual Question Answering · conceptualization
15 min read
转转QA
Nov 13, 2019 · Frontend Development

Performance Optimization of M Page: Achieving Sub‑Second Load and Zero White Screen via Video Frame Analysis

This article describes how the M page’s user‑perceived performance was dramatically improved by applying techniques such as SSR, skeleton screens, image compression, and a video‑frame analysis testing method that delivers millisecond‑level response‑time measurements, enabling sub‑second load times and eliminating white‑screen delays.

SSR · Skeleton Screen · frontend
5 min read
iQIYI Technical Product Team
Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved a MAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

Multimodal Retrieval · feature fusion · iQIYI challenge
7 min read
DataFunTalk
May 21, 2019 · Artificial Intelligence

Multimodal Video Analysis and Its Applications: Intelligent Asset Management, Automatic Cover Generation, Knowledge Graph, and Search

This article presents a comprehensive overview of research on multimodal video analysis at Alibaba's Digital Media and Entertainment division, covering intelligent video asset management, automated cover creation with personalized distribution, video knowledge graph construction, multimodal search techniques, and future directions in AI-driven media processing.

AI · Knowledge Graph · cover generation
17 min read
Youku Technology
May 6, 2019 · Artificial Intelligence

Exploring Intelligent Production at Youku: AI‑Driven Video Analysis and Automation

The talk describes Youku’s intelligent production platform, which uses AI and cloud computing to automatically analyze video frames and extract fine‑grained metadata such as scenes, persons, actions and scores. From that metadata it generates highlights, vertical clips, annotations and feedback for editors and upstream producers. The talk also addresses challenges such as pose tracking and graph‑based action classification, and outlines future plans for deeper video understanding and open competitions.

AI · Computer Vision · image search
14 min read
iQIYI Technical Product Team
Apr 12, 2019 · Artificial Intelligence

iQIYI Multimodal Technology: Datasets, Applications, and Future Directions

iQIYI leverages multimodal AI—combining audio, visual, and textual cues—to advance video understanding, releasing the world’s largest celebrity dataset (iQIYI‑VID), powering applications such as actor‑focused playback, AI Radar, emoji generation, and rapid automated editing, while pursuing future research in emoji captioning, cross‑modal retrieval, visual question answering, and broader health‑care and education uses.

Datasets · Multimodal AI · iQIYI
13 min read
DataFunTalk
Dec 16, 2018 · Artificial Intelligence

Practical Applications of Video Content Understanding at Hulu

This article details Hulu's AI-driven techniques for fine-grained video segmentation, end‑cap detection, subtitle detection and language recognition, background‑music classification, automated processing pipelines, tag generation, and cover‑image regeneration, illustrating how these methods improve user experience and operational efficiency.

AI pipelines · CNN · content understanding
14 min read
21CTO
May 8, 2018 · Artificial Intelligence

How Optical Flow Powers 360° Product Views and Advanced Vision Applications

This article explores the evolution and principles of optical flow—from early Horn‑Schunck models and Lucas‑Kanade to modern deep‑learning approaches like FlowNet—detailing its role in JD’s 360° product imaging, video detection, segmentation, view synthesis, and future research challenges in computer vision.

Deep Learning · Image Processing · optical flow
15 min read
JD Tech
May 4, 2018 · Artificial Intelligence

Optical Flow: Principles, Methods, and Applications in Computer Vision

This article introduces the fundamentals and evolution of optical flow, covering classic algorithms such as Horn‑Schunck and Lucas‑Kanade, modern deep‑learning approaches like FlowNet, and their practical applications in video detection, semantic segmentation, and novel view synthesis.

CNN · Deep Learning · Image Processing
15 min read
MaGe Linux Operations
Jul 1, 2017 · Artificial Intelligence

Detect Looping Video Frames with Perceptual Hashing in Python

This article demonstrates how to use a perceptual average‑hash algorithm in Python to identify duplicate frames in a 24‑hour video, revealing hidden loops and exposing potential video manipulation through systematic frame comparison and analysis.

Python · aHash · duplicate frames
9 min read
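The average-hash technique named in the entry above can be sketched in a few lines. This is a minimal illustration, not the article's code: it assumes each frame has already been decoded, converted to grayscale, and shrunk to a tiny thumbnail (8x8 is the usual choice), represented here as nested lists of integers. The function names `average_hash`, `hamming_distance`, and `find_repeated_frames` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Thumbnail = Sequence[Sequence[int]]  # small grayscale image, e.g. 8x8


def average_hash(pixels: Thumbnail) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-identical frames."""
    return bin(a ^ b).count("1")


def find_repeated_frames(thumbnails: Sequence[Thumbnail]) -> List[Tuple[int, int]]:
    """Return (first_index, repeat_index) pairs for frames with equal hashes."""
    first_seen: Dict[int, int] = {}
    repeats: List[Tuple[int, int]] = []
    for index, thumb in enumerate(thumbnails):
        h = average_hash(thumb)
        if h in first_seen:
            repeats.append((first_seen[h], index))
        else:
            first_seen[h] = index
    return repeats


if __name__ == "__main__":
    # 2x2 thumbnails keep the example readable; real use would be 8x8.
    frame_a = [[10, 200], [10, 200]]
    frame_b = [[200, 10], [200, 10]]
    frame_c = [[12, 198], [9, 205]]  # slightly noisy copy of frame_a
    print(find_repeated_frames([frame_a, frame_b, frame_c]))  # [(0, 2)]
```

Exact hash equality keeps the lookup to a single dictionary probe per frame; tolerating a few differing bits via `hamming_distance` would catch noisier repeats at the cost of pairwise comparisons.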