Tagged articles
2 articles
Alimama Tech
Dec 17, 2025 · Artificial Intelligence

How VeM Achieves Precise Semantic, Temporal, and Rhythmic Alignment in Video-to-Music Generation

VeM is a latent diffusion model for video-to-music generation. It combines hierarchical video parsing, scene-guided cross-attention, and a transition-beat alignment adapter to generate high-fidelity background music aligned with the video's semantics, timing, and rhythm, and it outperforms existing baselines in both quantitative and qualitative evaluations.

Cross-Attention · Latent Diffusion · audio generation
14 min read
iQIYI Technical Product Team
Mar 20, 2020 · Artificial Intelligence

Video Copyright Detection Algorithm: Competition Solution Overview

The Hulu Brothers' competition solution tackles large-scale video copyright detection in four steps: it extracts I-frames, encodes them as ResNet-18 CNN features, matches queries to reference videos with approximate nearest-neighbor search followed by ORB re-ranking, and linearly interpolates frame correspondences for precise temporal alignment. The approach achieves high precision, recall, and F1 scores.

CNN features · Competition Solution · frame extraction
15 min read
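The temporal-alignment step mentioned in the copyright detection summary (linearly interpolating frame correspondences) can be illustrated with a minimal sketch. This is not the competition team's code: the function name `interpolate_ref_time`, the `(query_seconds, ref_seconds)` input format, and the sample timestamps are all assumptions made for illustration. The idea is that once a few I-frames of the query have been matched to frames of a reference video, any other query timestamp can be mapped to the reference timeline by interpolating between the two nearest matches.

```python
from bisect import bisect_right


def interpolate_ref_time(matches, query_t):
    """Map a query timestamp to a reference timestamp by linear interpolation.

    matches: list of (query_seconds, ref_seconds) pairs for matched key frames.
    This input layout is hypothetical; the original solution's format is not given.
    """
    if len(matches) < 2:
        raise ValueError("need at least two matched frames")
    pts = sorted(matches)
    qs = [q for q, _ in pts]
    # Pick the segment that contains query_t. Clamping to the first/last
    # segment means timestamps outside the matched range are extrapolated
    # along the nearest segment.
    i = min(max(bisect_right(qs, query_t) - 1, 0), len(pts) - 2)
    (q0, r0), (q1, r1) = pts[i], pts[i + 1]
    if q1 == q0:
        raise ValueError("duplicate query timestamps in matches")
    slope = (r1 - r0) / (q1 - q0)
    return r0 + slope * (query_t - q0)


# Made-up matches: query frames at 2 s, 6 s, 10 s matched to the reference
# video at 62 s, 66 s, 70 s (a constant 60 s offset).
matches = [(2.0, 62.0), (6.0, 66.0), (10.0, 70.0)]
copy_start = interpolate_ref_time(matches, 0.0)   # start of the query clip
copy_end = interpolate_ref_time(matches, 12.0)    # end of the query clip
print(copy_start, copy_end)
```

With these sample matches, the query clip spanning 0 s to 12 s maps to roughly 60 s to 72 s in the reference video, which is how sparse key-frame matches can yield boundaries finer than the I-frame spacing.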