Open-Source EchoMimic Lets Photos Speak – Stunning Results from Alibaba

EchoMimic is an open‑source AI tool that animates portrait photos into speaking videos using audio or facial landmarks, built on Stable Diffusion with a specialized Denoising U‑Net architecture, and comes with step‑by‑step setup instructions and example demos.


Imagine a museum portrait that narrates its own history or a family photo that revives forgotten memories by speaking; EchoMimic aims to make that vision a reality.

EchoMimic is an open‑source project released by Ant Group that generates animated portrait videos from audio signals, from facial landmark trajectories, or from a combination of the two. The system is built on the Stable Diffusion (SD) framework and pairs a Latent Diffusion Model (LDM) with a Variational Autoencoder (VAE): images are encoded into a latent representation, Gaussian noise is injected into that latent, and a denoising process then recovers the synthesized frames.
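
As a rough illustration of that noise‑and‑denoise loop, the sketch below shows a generic DDPM‑style forward noising step and reverse denoising pass over VAE latents. It is a minimal conceptual sketch, not EchoMimic's actual code: unet, alpha, and alpha_bar are hypothetical placeholders for the pretrained denoiser and its noise schedule.

import torch

def forward_diffuse(z0: torch.Tensor, t: int, alpha_bar: torch.Tensor) -> torch.Tensor:
    # Forward process: perturb a clean VAE latent z0 with Gaussian noise at timestep t.
    noise = torch.randn_like(z0)
    return alpha_bar[t].sqrt() * z0 + (1.0 - alpha_bar[t]).sqrt() * noise

@torch.no_grad()
def denoise(unet, z_t: torch.Tensor, alpha: torch.Tensor, alpha_bar: torch.Tensor, steps: int) -> torch.Tensor:
    # Reverse process: iteratively strip the predicted noise to recover a clean latent,
    # which the VAE decoder would then turn back into image frames.
    z = z_t
    for t in reversed(range(steps)):
        eps_hat = unet(z, t)  # noise predicted by the denoising U-Net at step t
        z = (z - (1 - alpha[t]) / (1 - alpha_bar[t]).sqrt() * eps_hat) / alpha[t].sqrt()
    return z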

The core of EchoMimic is a Denoising U‑Net architecture comprising three dedicated encoders—Reference U‑Net, Landmark Encoder, and Audio Encoder—plus a Temporal Attention Layer that enforces temporal consistency across video frames.
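
To make that wiring more concrete, the toy PyTorch block below shows one generic way reference, landmark, and audio features can be injected into per‑frame latents via cross‑attention and then smoothed across frames with temporal self‑attention. Every module and dimension name here is a hypothetical placeholder illustrating the pattern, not the actual EchoMimic implementation (which is in the repository).

import torch
import torch.nn as nn

class ConditionedDenoiserBlock(nn.Module):
    # Toy block: cross-attention over concatenated condition tokens, then
    # self-attention along the frame axis for temporal consistency.
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.ref_proj = nn.Linear(dim, dim)   # stand-in for Reference U-Net features
        self.lmk_proj = nn.Linear(dim, dim)   # stand-in for the Landmark Encoder
        self.aud_proj = nn.Linear(dim, dim)   # stand-in for the Audio Encoder
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latents, ref_feat, lmk_feat, aud_feat):
        # latents: (batch, frames, tokens, dim); condition features: (batch, cond_tokens, dim)
        b, f, n, d = latents.shape
        cond = torch.cat([self.ref_proj(ref_feat),
                          self.lmk_proj(lmk_feat),
                          self.aud_proj(aud_feat)], dim=1)
        x = latents.reshape(b * f, n, d)
        cond_per_frame = cond.repeat_interleave(f, dim=0)
        x, _ = self.cross_attn(x, cond_per_frame, cond_per_frame)   # inject conditions per frame
        x = x.reshape(b, f, n, d).permute(0, 2, 1, 3).reshape(b * n, f, d)
        x, _ = self.temporal_attn(x, x, x)                          # attend across frames
        return x.reshape(b, n, f, d).permute(0, 2, 1, 3)            # back to (batch, frames, tokens, dim)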

Official demonstration videos include:

Audio‑driven singing

Audio‑driven English speech

Audio‑driven Chinese speech

Quick‑start instructions:

Clone the repository

git clone https://github.com/BadToBest/EchoMimic
cd EchoMimic

Create a Conda virtual environment

conda create -n echomimic python=3.8
conda activate echomimic

Install dependencies

pip install -r requirements.txt

Download pretrained weights

git lfs install
git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights

Set the path to a static FFmpeg binary

export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static

Run inference

python -u infer_audio2vid.py
python -u infer_audio2vid_pose.py

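If you prefer to drive inference from Python rather than the shell, a minimal wrapper might look like the sketch below. It assumes it is run from the repository root with the echomimic environment active; the FFmpeg path is a placeholder you should point at your own static build, and the script names come from the quick start above.

import os
import subprocess

# Point the scripts at a static FFmpeg build (placeholder path; adjust to your install).
os.environ["FFMPEG_PATH"] = "/path/to/ffmpeg-4.4-amd64-static"

# Audio-driven generation; use infer_audio2vid_pose.py instead for the landmark/pose-driven variant.
subprocess.run(["python", "-u", "infer_audio2vid.py"], check=True)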
For more details and the full source code, visit the GitHub repository:

https://github.com/BadToBest/EchoMimic
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Stable Diffusion, open-source AI, digital human, EchoMimic, audio-driven animation, portrait video