Animate Any Image with First Order Motion Model: A Step‑by‑Step Guide

This tutorial explains the First Order Motion Model for animating static images. It covers the algorithm's keypoint-based motion estimation, the datasets the model was trained on, environment setup with Python, OpenCV and ffmpeg, and complete code snippets for generating animated videos with audio.

Python Crawling & Data Mining

Introduction

A static picture can be animated with the First Order Motion Model, a deep-learning technique originally presented at NeurIPS 2019. Given a source image and a driving video, the model transfers the motion in the video to the object in the image, for example making a character from "Game of Thrones" speak like a politician or making a horse run.

Algorithm Principles

The First Order Motion Model uses a set of self‑learned keypoints and local affine transformations to build a complex motion model. It consists of two main modules:

Motion estimation module – separates appearance and motion information via self‑supervised learning and creates feature representations.

Image generation module – models occlusions during motion and combines extracted appearance with the motion features to synthesize the final image.
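
To make the "local affine transformation" idea concrete, here is a minimal NumPy sketch of a first-order approximation of motion around a single keypoint. The function name and the numeric values are illustrative assumptions, not code or data from the model; the real network predicts keypoints and Jacobians for several keypoints and combines them into a dense motion field.

```python
import numpy as np

def local_affine_motion(z, kp_source, kp_driving, jacobian):
    """Map point z from the driving frame into the source frame.

    First-order (affine) approximation around one keypoint:
        kp_source + jacobian @ (z - kp_driving)
    """
    z = np.asarray(z, dtype=float)
    return kp_source + jacobian @ (z - kp_driving)

# Illustrative values only (not taken from a trained model)
kp_source = np.array([0.0, 0.0])
kp_driving = np.array([0.2, 0.1])
jacobian = np.array([[0.0, -1.0],
                     [1.0, 0.0]])  # a 90-degree rotation

# The keypoint itself maps exactly onto its counterpart in the source
at_keypoint = local_affine_motion(kp_driving, kp_source, kp_driving, jacobian)
# A nearby point is displaced according to the local Jacobian
nearby = local_affine_motion([0.3, 0.1], kp_source, kp_driving, jacobian)
print(at_keypoint, nearby)
```

The Jacobian is what makes the model "first order": with an identity matrix, the neighbourhood of the keypoint would only be translated, whereas a general 2x2 matrix also lets it rotate, scale, or shear.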

The model was trained and tested on four datasets: VoxCeleb, UvA-Nemo, the BAIR robot-pushing dataset, and Tai-Chi-HD, a dataset collected by the model's authors. VoxCeleb contains more than 100,000 utterances from 1,251 celebrities, extracted from YouTube videos; it is balanced by gender and covers diverse accents, professions, and ages.

Environment Setup

Install the required third-party libraries using the provided requirements.txt file:

python -m pip install -r requirements.txt

Then install ffmpeg (download it from the official ffmpeg site) and add it to your system PATH.
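
Before running the scripts, it can be worth confirming that ffmpeg is actually reachable from Python. The following is a small sketch using only the standard library; the helper name ffmpeg_available is an assumption made for this example and is not part of the project.

```python
import shutil
import subprocess

def ffmpeg_available(executable="ffmpeg"):
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None

if ffmpeg_available():
    # Show the first line of `ffmpeg -version` as a quick sanity check
    result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    print(lines[0] if lines else "ffmpeg found, but it printed no version information")
else:
    print("ffmpeg not found on PATH")
```

If the check fails right after installation, opening a new terminal usually picks up the updated PATH.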

Implementation

The project Real Time Image Animation uses the First Order Motion Model to animate a static image based on a driving video. Below are the essential code snippets.

Utility Functions

import subprocess
import os
from PIL import Image

def video2mp3(file_name):
    """Extract the audio track of a video file into an MP3 file."""
    # The output name is cut at the first dot, so avoid paths that contain other dots
    outfile_name = file_name.split('.')[0] + '.mp3'
    # Passing a list (no shell) keeps paths with spaces or shell characters intact
    cmd = ['ffmpeg', '-i', file_name, '-vn', '-f', 'mp3', outfile_name]
    subprocess.call(cmd)

def video_add_mp3(file_name, mp3_file):
    """Mux an MP3 audio track into a video file, writing <name>-f.mp4."""
    outfile_name = file_name.split('.')[0] + '-f.mp4'
    cmd = ['ffmpeg', '-i', file_name, '-i', mp3_file, '-strict', '-2', '-f', 'mp4', outfile_name]
    subprocess.call(cmd)
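
The helpers above derive output names by cutting the path at its first dot, which misbehaves for paths such as ./clips/1.mp4 or take.1.mp4. A possible alternative is to build the names with pathlib, as sketched below. The function names audio_path_for and muxed_path_for are hypothetical; if you adopt this approach, the main script must use the same naming rule when it looks for the MP3 file.

```python
from pathlib import Path

def audio_path_for(video_file):
    """Same folder and stem as the video, with an .mp3 extension."""
    return Path(video_file).with_suffix(".mp3")

def muxed_path_for(video_file):
    """Same folder, with '-f' appended to the stem and an .mp4 extension."""
    p = Path(video_file)
    return p.with_name(p.stem + "-f.mp4")

# Only the final extension is replaced, so extra dots in the path are preserved
print(audio_path_for("clips/take.1.mp4"))
print(muxed_path_for("output/test.mp4"))
```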

Main Script

import argparse
import os

import cv2
import imageio
import numpy as np
import torch
from skimage import img_as_ubyte
from skimage.transform import resize

# normalize_kp and load_checkpoints come from the first-order-model code base
from animate import normalize_kp
from demo import load_checkpoints

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input_image", required=True, help="Path to image to animate")
ap.add_argument("-c", "--checkpoint", required=True, help="Path to checkpoint")
ap.add_argument("-v", "--input_video", required=False, help="Path to video input")
args = vars(ap.parse_args())

source_path = args['input_image']
checkpoint_path = args['checkpoint']
video_path = args.get('input_video')

source_image = imageio.imread(source_path)
source_image = resize(source_image, (256, 256))[..., :3]

generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml', checkpoint_path=checkpoint_path)

os.makedirs('output', exist_ok=True)

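# Read from the driving video if one was given, otherwise from the default webcam
cap = cv2.VideoCapture(video_path if video_path else 0)
fps = cap.get(cv2.CAP_PROP_FPS) or 25  # some webcams report 0; 25 is an assumed fallback
# The generator outputs 256x256 frames, and VideoWriter generally discards
# frames whose size differs from the one declared here
size = (256, 256)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # codec tag suited to the .mp4 container
out1 = cv2.VideoWriter('output/test.mp4', fourcc, fps, size, True)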

# Add a batch dimension, move channels first (NCHW) and copy to the GPU
source = torch.tensor(source_image[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2).cuda()
with torch.no_grad():  # inference only, no gradients needed
    kp_source = kp_detector(source)

count = 0
with torch.no_grad():  # inference only, so skip gradient bookkeeping
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.flip(frame, 1)  # mirror horizontally, as a webcam preview would
        # OpenCV decodes frames as BGR, while the source image was loaded as RGB
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_resized = resize(frame, (256, 256))[..., :3]
        driving_frame = torch.tensor(frame_resized[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2).cuda()
        kp_driving = kp_detector(driving_frame)
        if count == 0:
            # Keypoints of the first frame serve as the reference pose
            kp_driving_initial = kp_driving
        kp_norm = normalize_kp(kp_source=kp_source, kp_driving=kp_driving, kp_driving_initial=kp_driving_initial,
                                use_relative_movement=True, use_relative_jacobian=True, adapt_movement_scale=True)
        out = generator(source, kp_source=kp_source, kp_driving=kp_norm)
        # NCHW tensor -> HWC array, then RGB -> BGR for OpenCV's writer
        pred = np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0]
        pred_bgr = cv2.cvtColor(pred, cv2.COLOR_RGB2BGR)
        out1.write(img_as_ubyte(pred_bgr))
        count += 1

cap.release()
out1.release()
cv2.destroyAllWindows()

if video_path:
    video2mp3(video_path)
    video_add_mp3('output/test.mp4', video_path.split('.')[0] + '.mp3')
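
The call to normalize_kp with use_relative_movement=True is the step that makes the source image follow the driving video's motion rather than its absolute pose. The sketch below shows only the keypoint-position part of that idea in plain NumPy. It is a simplified illustration: the function name relative_keypoints is hypothetical, the values are made up, and the Jacobian handling and the automatic movement scale used in the project are left out.

```python
import numpy as np

def relative_keypoints(kp_source, kp_driving, kp_driving_initial, movement_scale=1.0):
    """Apply the driving video's keypoint displacement to the source keypoints."""
    # How far each driving keypoint has moved since the first frame
    displacement = (kp_driving - kp_driving_initial) * movement_scale
    return kp_source + displacement

# Two keypoints per frame, as (x, y) rows; illustrative values only
kp_source = np.array([[0.0, 0.0], [0.5, 0.5]])
kp_driving_initial = np.array([[0.1, 0.1], [0.4, 0.4]])
kp_driving = np.array([[0.2, 0.1], [0.4, 0.3]])

# On the first frame nothing has moved, so the source keypoints are unchanged
first = relative_keypoints(kp_source, kp_driving_initial, kp_driving_initial)
# Later frames shift the source keypoints by the driving displacement
later = relative_keypoints(kp_source, kp_driving, kp_driving_initial)
print(first)
print(later)
```

Because only displacements are transferred, the source face keeps its own proportions even when the driving face sits elsewhere in the frame. This is also why the first driving frame should show a pose similar to the one in the source image.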

Running the Demo

Download the pretrained weights, video and image assets (a packaged zip is provided) and execute:

python image_animation.py -i path_to_input_file -c path_to_checkpoint -v path_to_video_file

For a quick test you can run:

python image_animation.py -i Inputs/trump2.png -c checkpoints/vox-cpk.pth.tar -v 1.mp4

The resulting animated video will be saved in the output directory.

Conclusion

The First Order Motion Model enables fast, GPU‑accelerated animation of static images, and with the provided scripts you can combine the generated video with audio using ffmpeg.
