Amap Tech
May 8, 2025 · Artificial Intelligence
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
FantasyTalking generates high-fidelity, coherent talking portraits from a single static image. It aligns audio and video in two stages: segment-level alignment establishes global motion, and frame-level alignment refines lip movement. Face-centric cross-attention preserves the subject's identity, and a motion-intensity module lets users control the strength of expression and body movement. The method is reported to surpass prior methods in realism, synchronization, and overall performance.
audio-visual alignment · deep learning · identity preservation
10 min read