JD Retail Technology
Jul 1, 2025 · Artificial Intelligence

JoyGen: Audio‑Driven 3D Depth‑Aware Talking‑Face Video Editing Explained

JoyGen is a two‑stage framework for talking‑face video editing that synchronizes lip movements with input audio. It relies on 3DMM‑based identity and expression coefficients together with depth‑aware supervision, and it introduces a newly built high‑resolution Chinese talking‑face dataset. The authors report state‑of‑the‑art performance on multiple benchmarks.
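The summary mentions 3DMM identity and expression coefficients. For readers new to 3DMMs, here is a minimal sketch of the standard linear model, in which a face mesh is the mean shape plus weighted identity and expression offsets. This is not JoyGen's code. The function names, the toy bases, and the shortcut of reading "depth" directly from vertex z‑coordinates (no camera projection or rasterization) are assumptions made for illustration.

```python
from typing import List, Sequence

Vertex = List[float]                  # [x, y, z]
Basis = Sequence[Sequence[float]]     # one 3D offset per vertex


def _add_scaled(shape: List[Vertex], coeffs: Sequence[float],
                bases: Sequence[Basis]) -> None:
    """Add sum_i coeffs[i] * bases[i] to every vertex of `shape`, in place."""
    for c, basis in zip(coeffs, bases):
        for vertex, offset in zip(shape, basis):
            for k in range(3):
                vertex[k] += c * offset[k]


def reconstruct_shape(mean_shape: Sequence[Sequence[float]],
                      id_bases: Sequence[Basis],
                      exp_bases: Sequence[Basis],
                      alpha: Sequence[float],
                      beta: Sequence[float]) -> List[Vertex]:
    """Linear 3DMM: S = S_mean + sum_i alpha_i * B_id_i + sum_j beta_j * B_exp_j."""
    shape = [list(v) for v in mean_shape]   # copy so the mean shape is untouched
    _add_scaled(shape, alpha, id_bases)     # identity: who the person is
    _add_scaled(shape, beta, exp_bases)     # expression: e.g. how the mouth moves
    return shape


def vertex_depths(shape: Sequence[Vertex]) -> List[float]:
    """Toy depth: the z-coordinate of each vertex (hypothetical simplification)."""
    return [v[2] for v in shape]


if __name__ == "__main__":
    # Two-vertex toy mesh with one identity basis and one expression basis.
    mean = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
    id_bases = [[[0.0, 0.0, 0.5], [0.0, 0.0, 0.5]]]
    exp_bases = [[[0.0, 0.0, 0.1], [0.0, 0.0, -0.1]]]
    alpha = [2.0]                       # identity kept fixed from a reference frame
    for beta in ([0.0], [1.0]):         # swap only the expression coefficients
        print(beta, vertex_depths(reconstruct_shape(mean, id_bases, exp_bases, alpha, beta)))
```

Holding `alpha` fixed while changing `beta` alters the expression, such as the mouth region, without changing who the face belongs to. This separation is what makes identity and expression coefficients convenient for lip editing.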

3DMM · AIGC · audio-driven video