Open-Source EchoMimic Lets Photos Speak – Stunning Results from Alibaba
EchoMimic is an open‑source AI tool that animates portrait photos into speaking videos using audio or facial landmarks, built on Stable Diffusion with a specialized Denoising U‑Net architecture, and comes with step‑by‑step setup instructions and example demos.
Imagine a museum portrait that narrates its own history or a family photo that revives forgotten memories by speaking; EchoMimic aims to make that vision a reality.
EchoMimic is an open‑source project released by Ant Group that generates animated portrait videos from audio signals, facial landmark trajectories, or a combination of both. The system is built on the Stable Diffusion (SD) framework and leverages a Latent Diffusion Model (LDM) together with a Variational Autoencoder (VAE). The VAE compresses each image into a latent representation; during training, Gaussian noise is added to that latent and the network learns to remove it, and at inference time a noisy latent is denoised step by step and decoded back into an image.
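The noise-and-denoise idea can be sketched in a few lines of plain Python. This is only an illustration of the standard diffusion forward step, not EchoMimic's code: the linear beta schedule, the step count, the four-element "latent", and the function names `make_alpha_bars`, `add_noise`, and `recover_latent` are all assumptions made for the example. In a real model a trained U‑Net would estimate the noise; here the true noise is reused to show that the step is invertible.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(latent, t, alpha_bars, rng):
    """Sample z_t = sqrt(a_bar_t) * z_0 + sqrt(1 - a_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal * z + sigma * e for z, e in zip(latent, noise)]
    return noisy, noise

def recover_latent(noisy, predicted_noise, t, alpha_bars):
    """Invert add_noise given a noise estimate (a trained U-Net would supply it)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(z - sigma * e) / signal for z, e in zip(noisy, predicted_noise)]

# Toy usage: a 4-element stand-in for a VAE latent, noised at step 500.
rng = random.Random(0)
alpha_bars = make_alpha_bars()
z0 = [0.5, -1.0, 0.25, 2.0]
zt, eps = add_noise(z0, 500, alpha_bars, rng)
z0_hat = recover_latent(zt, eps, 500, alpha_bars)
```

With a perfect noise estimate, `z0_hat` matches `z0` up to floating-point error; the quality of a real model depends on how well the network predicts `eps`.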
The core of EchoMimic is a Denoising U‑Net backbone supported by three dedicated conditioning modules (a Reference U‑Net, a Landmark Encoder, and an Audio Encoder), plus a Temporal Attention Layer that enforces temporal consistency across video frames.
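To give a feel for what attention over the time axis does, here is a minimal sketch in plain Python. It is not EchoMimic's layer: the function name `temporal_self_attention` is hypothetical, and queries, keys, and values are simply the raw features, whereas a real layer would use learned projections and multiple heads. For one spatial position, each frame's feature vector is replaced by a softmax-weighted average of that position's features across all frames, which is what pulls neighbouring frames toward each other.

```python
import math

def temporal_self_attention(frames):
    """Mix per-frame feature vectors with softmax attention over the frame axis.

    frames: list of T feature vectors (lists of floats) for one spatial position.
    Assumption: Q = K = V = the raw features (no learned projections).
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for query in frames:
        # Scaled dot-product similarity between this frame and every frame.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in frames]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of all frames' features along the time axis.
        mixed.append([
            sum(w * frame[i] for w, frame in zip(weights, frames))
            for i in range(dim)
        ])
    return mixed

# Toy usage: three frames with 2-dimensional features.
smoothed = temporal_self_attention([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
```

Because every output is a convex combination of the input frames, identical frames pass through unchanged and outlier frames are drawn toward the rest of the sequence.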
Official demonstration videos include:
Audio‑driven singing
Audio‑driven English speech
Audio‑driven Chinese speech
Quick‑start instructions:
Clone the repository:
    git clone https://github.com/BadToBest/EchoMimic
    cd EchoMimic
Create a Conda virtual environment:
    conda create -n echomimic python=3.8
    conda activate echomimic
Install dependencies:
    pip install -r requirements.txt
Download pretrained weights:
    git lfs install
    git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
Set the path to a static FFmpeg binary:
    export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
Run inference:
    python -u infer_audio2vid.py
    python -u infer_audio2vid_pose.py
For more details and the full source code, visit the GitHub repository:
https://github.com/BadToBest/EchoMimic
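Before launching inference it can help to confirm that the setup steps actually took effect. The following is a small pre-flight sketch, not part of the EchoMimic repository: the helper names `find_setup_problems` and `run_inference` are hypothetical, and the checks assume only what the quick-start states (the two inference scripts sit in the repository root, weights are cloned into `pretrained_weights`, and `FFMPEG_PATH` points at an existing location).

```python
import os
import subprocess
import sys
from pathlib import Path

# Script names taken from the quick-start steps.
INFERENCE_SCRIPTS = ("infer_audio2vid.py", "infer_audio2vid_pose.py")

def find_setup_problems(repo_dir, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    repo = Path(repo_dir)
    problems = []
    for script in INFERENCE_SCRIPTS:
        if not (repo / script).is_file():
            problems.append(f"missing script: {script}")
    if not (repo / "pretrained_weights").is_dir():
        problems.append("missing directory: pretrained_weights")
    ffmpeg_path = env.get("FFMPEG_PATH")
    if not ffmpeg_path:
        problems.append("FFMPEG_PATH is not set")
    elif not Path(ffmpeg_path).exists():
        problems.append(f"FFMPEG_PATH does not exist: {ffmpeg_path}")
    return problems

def run_inference(repo_dir, script="infer_audio2vid.py"):
    """Run one inference script from the repository root, unbuffered (-u)."""
    return subprocess.run([sys.executable, "-u", script], cwd=repo_dir, check=True)
```

Calling `find_setup_problems(".")` from the repository root and printing the result gives a quick checklist; if the list is empty, `run_inference(".")` starts the audio-driven script with the same interpreter that runs the check, so it should be executed inside the activated `echomimic` environment.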
Full-Stack Cultivation Path
Focused on sharing practical tech content about TypeScript, Vue 3, front-end architecture, and source code analysis.
