Tagged articles
3 articles
SuanNi
Apr 21, 2026 · Artificial Intelligence

Why AI Video Generation Is Leaving the Silent Era: Architecture, Alignment, and Evaluation Insights

This article analyzes the rapid evolution of multimodal video generation models from separate visual and audio pipelines to unified diffusion Transformers. It covers VAE compression, MoE scaling, cross‑modal alignment techniques, evaluation metrics, real‑world applications, and the remaining technical challenges.

Evaluation Metrics · Multimodal AI · Video Generation
0 likes · 15 min read
Bilibili Tech
Aug 12, 2025 · Artificial Intelligence

How AI Recreates Original Voices in Multilingual Video Dubbing

This article explains the technical challenges and innovative AI solutions behind preserving speaker identity, emotion, and timing while translating video content into multiple languages, covering speech generation modeling, speaker segmentation, adversarial reinforcement learning, proper‑noun adaptation, and audio‑visual alignment techniques.

AI voice cloning · Deep Learning · Speech synthesis
0 likes · 22 min read
Amap Tech
May 8, 2025 · Artificial Intelligence

FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

FantasyTalking generates high-fidelity, coherent talking portraits from a single static image. It uses a two-stage audio-visual alignment strategy (global segment-level motion followed by frame-level lip refinement), face-centric cross-attention for identity preservation, and a motion-intensity module that lets users control expression and body movement. The authors report better realism and synchronization than prior methods.

Deep Learning · audio-visual alignment · identity preservation
0 likes · 10 min read