Open-Source EchoMimic Lets Photos Speak – Stunning Results from Alibaba

EchoMimic is an open‑source AI tool that animates portrait photos into speaking videos using audio or facial landmarks, built on Stable Diffusion with a specialized Denoising U‑Net architecture, and comes with step‑by‑step setup instructions and example demos.


Imagine a museum portrait that narrates its own history or a family photo that revives forgotten memories by speaking; EchoMimic aims to make that vision a reality.

EchoMimic is an open‑source project released by Ant Group that generates animated portrait videos from audio signals, from facial landmark trajectories, or from a combination of the two. The system is built on the Stable Diffusion (SD) framework and pairs a Latent Diffusion Model (LDM) with a Variational Autoencoder (VAE): images are encoded into a latent representation, Gaussian noise is injected into that latent, and a denoising process then recovers the synthesized frames.
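
As a rough illustration of that noise‑and‑denoise loop, the sketch below shows a generic DDPM‑style forward noising step and reverse denoising pass over VAE latents. It is a minimal conceptual sketch, not EchoMimic's actual code: unet, alpha, and alpha_bar are hypothetical placeholders for the pretrained denoiser and its noise schedule.

import torch

def forward_diffuse(z0: torch.Tensor, t: int, alpha_bar: torch.Tensor) -> torch.Tensor:
    # Forward process: perturb a clean VAE latent z0 with Gaussian noise at timestep t.
    noise = torch.randn_like(z0)
    return alpha_bar[t].sqrt() * z0 + (1.0 - alpha_bar[t]).sqrt() * noise

@torch.no_grad()
def denoise(unet, z_t: torch.Tensor, alpha: torch.Tensor, alpha_bar: torch.Tensor, steps: int) -> torch.Tensor:
    # Reverse process: iteratively strip the predicted noise to recover a clean latent,
    # which the VAE decoder would then turn back into image frames.
    z = z_t
    for t in reversed(range(steps)):
        eps_hat = unet(z, t)  # noise predicted by the denoising U-Net at step t
        z = (z - (1 - alpha[t]) / (1 - alpha_bar[t]).sqrt() * eps_hat) / alpha[t].sqrt()
    return z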

The core of EchoMimic is a Denoising U‑Net architecture comprising three dedicated encoders—Reference U‑Net, Landmark Encoder, and Audio Encoder—plus a Temporal Attention Layer that enforces temporal consistency across video frames.
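
To make that wiring more concrete, the toy PyTorch block below shows one generic way reference, landmark, and audio features can be injected into per‑frame latents via cross‑attention and then smoothed across frames with temporal self‑attention. Every module and dimension name here is a hypothetical placeholder illustrating the pattern, not the actual EchoMimic implementation (which is in the repository).

import torch
import torch.nn as nn

class ConditionedDenoiserBlock(nn.Module):
    # Toy block: cross-attention over concatenated condition tokens, then
    # self-attention along the frame axis for temporal consistency.
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.ref_proj = nn.Linear(dim, dim)   # stand-in for Reference U-Net features
        self.lmk_proj = nn.Linear(dim, dim)   # stand-in for the Landmark Encoder
        self.aud_proj = nn.Linear(dim, dim)   # stand-in for the Audio Encoder
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latents, ref_feat, lmk_feat, aud_feat):
        # latents: (batch, frames, tokens, dim); condition features: (batch, cond_tokens, dim)
        b, f, n, d = latents.shape
        cond = torch.cat([self.ref_proj(ref_feat),
                          self.lmk_proj(lmk_feat),
                          self.aud_proj(aud_feat)], dim=1)
        x = latents.reshape(b * f, n, d)
        cond_per_frame = cond.repeat_interleave(f, dim=0)
        x, _ = self.cross_attn(x, cond_per_frame, cond_per_frame)   # inject conditions per frame
        x = x.reshape(b, f, n, d).permute(0, 2, 1, 3).reshape(b * n, f, d)
        x, _ = self.temporal_attn(x, x, x)                          # attend across frames
        return x.reshape(b, n, f, d).permute(0, 2, 1, 3)            # back to (batch, frames, tokens, dim)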

Official demonstration videos include:

Audio‑driven singing

Audio‑driven English speech

Audio‑driven Chinese speech

Quick‑start instructions:

Clone the repository

git clone https://github.com/BadToBest/EchoMimic
cd EchoMimic

Create a Conda virtual environment

conda create -n echomimic python=3.8
conda activate echomimic

Install dependencies

pip install -r requirements.txt

Download pretrained weights

git lfs install
git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights

Set the path to a static FFmpeg binary

export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static

Run inference

python -u infer_audio2vid.py
python -u infer_audio2vid_pose.py

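If you prefer to drive inference from Python rather than the shell, a minimal wrapper might look like the sketch below. It assumes it is run from the repository root with the echomimic environment active; the FFmpeg path is a placeholder you should point at your own static build, and the script names come from the quick start above.

import os
import subprocess

# Point the scripts at a static FFmpeg build (placeholder path; adjust to your install).
os.environ["FFMPEG_PATH"] = "/path/to/ffmpeg-4.4-amd64-static"

# Audio-driven generation; use infer_audio2vid_pose.py instead for the landmark/pose-driven variant.
subprocess.run(["python", "-u", "infer_audio2vid.py"], check=True)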
For more details and the full source code, visit the GitHub repository:

https://github.com/BadToBest/EchoMimic
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Stable Diffusion, open-source AI, digital human, EchoMimic, audio-driven animation, portrait video