How JoyGen Achieves High‑Quality Audio‑Driven 3D Talking‑Face Video Editing

JoyGen introduces a two‑stage framework that combines 3D morphable model reconstruction with audio‑driven lip motion generation and depth‑aware visual synthesis, delivering precise audio‑lip synchronization and superior visual quality on both the HDTF benchmark and a newly built high‑resolution Chinese talking‑face dataset.

3DMM · AIGC · Deep Learning

12 min read


JD Retail Technology

Jul 1, 2025 · Artificial Intelligence

JoyGen: Audio‑Driven 3D Depth‑Aware Talking‑Face Video Editing Explained

JoyGen is a two‑stage framework that generates high‑quality talking‑face videos by synchronizing lip movements with input audio. It relies on 3DMM‑based identity and expression coefficients and depth‑aware supervision, draws on a newly built high‑resolution Chinese talking‑face dataset, and reports state‑of‑the‑art performance on multiple benchmarks.

3DMM · AIGC · Deep Learning

13 min read
