Jul 1, 2025 · Artificial Intelligence

JoyGen: Audio‑Driven 3D Depth‑Aware Talking‑Face Video Editing Explained

JoyGen introduces a two‑stage framework that generates high‑quality talking‑face videos by synchronizing lip movements with input audio using 3DMM‑based identity and expression coefficients, depth‑aware supervision, and a newly built high‑resolution Chinese speaking‑face dataset, achieving state‑of‑the‑art performance on multiple benchmarks.

3DMMAIGCDeep Learning

0 likes · 13 min read

JoyGen: Audio‑Driven 3D Depth‑Aware Talking‑Face Video Editing Explained