LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

LivePortrait is an open-source, controllable portrait video generation framework that transfers facial expressions and poses from a driving video to static or dynamic portraits in real time. Trained on a 69M-frame mixed video-image dataset and equipped with lightweight stitching and retargeting modules, it achieves high quality at low latency.


Overview

LivePortrait, released by Kuaishou's large-model team, is a controllable portrait video generation framework that transfers the expressions and poses of a driving video onto static or dynamic portraits accurately and in real time, producing highly expressive results.

Paper

The corresponding paper is titled "LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control" and the code, paper, and demo are all publicly available.

Method Introduction

Unlike mainstream diffusion-based methods, LivePortrait extends the implicit-keypoint framework to balance computational efficiency with controllability. It trains on 69M high-quality frames with a mixed video-image strategy, upgrades the network architecture, and adds stitching and retargeting modules implemented as lightweight MLPs, so their extra computational cost is negligible.
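At the heart of an implicit-keypoint framework is a simple rigid-plus-deformation transform applied to learned 3D keypoints. The minimal PyTorch sketch below illustrates that transform; the keypoint count, shapes, and variable names are illustrative assumptions, not values from the paper:

```python
import torch

# Illustrative shapes: batch B, K implicit 3D keypoints (K is an assumption).
B, K = 1, 21
x_c = torch.randn(B, K, 3)            # canonical keypoints of the source face
R = torch.eye(3).expand(B, 3, 3)      # head rotation from the pose estimator
delta = 0.05 * torch.randn(B, K, 3)   # expression deformation offsets
t = torch.zeros(B, 1, 3)              # head translation
s = torch.ones(B, 1, 1)               # scale

# Driving keypoints: x_d = s * (x_c @ R + delta) + t.
# Animation keeps the source's canonical keypoints but swaps in the driving
# video's R, delta, t, s, then warps source features accordingly.
x_d = s * (x_c @ R + delta) + t
print(x_d.shape)  # torch.Size([1, 21, 3])
```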

Training Stages

1.1 First Stage – Base Model Training

LivePortrait improves upon existing implicit-keypoint frameworks (e.g., Face vid2vid) by collecting high-quality data from VoxCeleb, MEAD, RAVDESS, AAHQ, and a private 4K-resolution dataset, filtered with a proprietary KVQ quality-assessment tool, resulting in 69M video frames covering 18.9K identities plus 60K stylized portraits. A video-image mixed training strategy treats each still image as a one-frame video, enhancing generalization to stylized faces.
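The mixed strategy is conceptually simple: a still image enters the pipeline as a clip of length one. A hedged sketch of what such a dataset wrapper could look like (class and argument names are hypothetical):

```python
import torch
from torch.utils.data import Dataset

class MixedVideoImageDataset(Dataset):
    """Hypothetical wrapper that yields clips of shape (T, C, H, W),
    treating each still image as a one-frame video so that videos and
    stylized images share a single training pipeline."""
    def __init__(self, videos, images):
        # videos: list of (T, C, H, W) tensors; images: list of (C, H, W)
        self.items = list(videos) + [im.unsqueeze(0) for im in images]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]  # always (T, C, H, W), with T == 1 for images
```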

The network unifies the implicit-keypoint estimator, head-pose estimator, and deformation estimator into a single model built on ConvNeXt-V2-Tiny, and upgrades the generator with a SPADE decoder for semantic guidance and PixelShuffle as the final up-sampling layer.
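For intuition, here is a toy sketch of that unified design: one shared backbone feeding separate heads for keypoints, pose, and deformation. The real model uses ConvNeXt-V2-Tiny; the stand-in backbone, head sizes, and pose parameterization below are assumptions:

```python
import torch
import torch.nn as nn

class UnifiedMotionExtractor(nn.Module):
    """Toy sketch: one backbone replaces three separate estimators."""
    def __init__(self, num_kp=21, feat=256):
        super().__init__()
        self.num_kp = num_kp
        self.backbone = nn.Sequential(          # stand-in for ConvNeXt-V2-Tiny
            nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.GELU(),
            nn.Conv2d(64, feat, 3, stride=4, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.kp_head = nn.Linear(feat, num_kp * 3)     # canonical keypoints
        self.pose_head = nn.Linear(feat, 3 + 3 + 1)    # rotation, translation, scale
        self.delta_head = nn.Linear(feat, num_kp * 3)  # expression deformation

    def forward(self, img):
        h = self.backbone(img)
        kp = self.kp_head(h).view(-1, self.num_kp, 3)
        delta = self.delta_head(h).view(-1, self.num_kp, 3)
        return kp, self.pose_head(h), delta

# Example: kp, pose, delta = UnifiedMotionExtractor()(torch.randn(1, 3, 256, 256))
```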

Losses include implicit-keypoint consistency, keypoint prior, head-pose, deformation prior, perceptual, GAN, and identity losses, summed as the base loss L_base.
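Written out, the base objective is simply the sum of those terms; the notation below is generic shorthand for the listed losses, and the paper's exact symbols and weights may differ:

$$
\mathcal{L}_{\text{base}} = \mathcal{L}_{\text{kp}} + \mathcal{L}_{\text{prior}} + \mathcal{L}_{\text{pose}} + \mathcal{L}_{\Delta} + \mathcal{L}_{\text{percep}} + \mathcal{L}_{\text{GAN}} + \mathcal{L}_{\text{id}}
$$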

1.2 Second Stage – Stitching and Retargeting Module Training

The stitching module learns to align the reference and driving implicit keypoints across identities, producing both a stitched image and a reconstruction image during training. The eye and mouth retargeting modules take the reference keypoints, condition vectors (eye/mouth openness), and random driving coefficients, and predict deformation deltas that ensure accurate eye-closure and mouth-shape transfer. Separate losses enforce pixel consistency, regularization, and condition matching.
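To make "lightweight MLP" concrete, here is a hedged sketch of an eye/mouth retargeting head; the layer widths, conditioning signal, and names are illustrative assumptions rather than the paper's specification:

```python
import torch
import torch.nn as nn

class RetargetMLP(nn.Module):
    """Illustrative retargeting head: maps flattened source keypoints plus
    an openness condition (e.g., eye-open ratio) to per-keypoint deltas."""
    def __init__(self, num_kp=21, cond_dim=1, hidden=128):
        super().__init__()
        self.num_kp = num_kp
        self.net = nn.Sequential(
            nn.Linear(num_kp * 3 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_kp * 3),
        )

    def forward(self, kp_source, openness):
        # kp_source: (B, K, 3); openness: (B, cond_dim)
        x = torch.cat([kp_source.flatten(1), openness], dim=-1)
        delta = self.net(x).view(-1, self.num_kp, 3)
        return kp_source + delta  # redirected keypoints

# The stitching module is analogous: an MLP over source and driving
# keypoints that predicts offsets keeping the animated crop seamless.
```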

Experiments and Comparison

LivePortrait is evaluated on same-identity and cross-identity driving tasks. Compared with non-diffusion methods and diffusion-based approaches (e.g., AniPortrait), it achieves comparable or better visual quality, captures eye and mouth motion more finely, and runs far faster at inference (≈12.8 ms per frame on an RTX 4090, potentially under 10 ms with TensorRT acceleration).
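For scale, 12.8 ms per frame corresponds to roughly 78 frames per second (1000 / 12.8 ≈ 78), well above the 25-30 FPS needed for real-time video.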

It also topped the HuggingFace Spaces and Papers with Code trending lists for a week.

Extensions

LivePortrait supports multi‑person driving (thanks to the stitching module), animal portrait driving after fine‑tuning, and head‑region video editing without affecting the background.

Deployment

The technology has been integrated into several Kuaishou products, including Kuaishou Magic Table, private messaging, Kuaishou Short Video AI‑expression features, live streaming, and the youth‑focused Pujii app. Future work will explore multimodal driving and higher‑quality generation.

References

[1] Ting-Chun Wang et al., CVPR 2021.
[2] Arsha Nagrani et al., Interspeech 2017.
[3] Kaisiyuan Wang et al., ECCV 2020.
[4] Steven R. Livingstone & Frank A. Russo, PLoS ONE 2018.
[5] Mingcong Liu et al., NeurIPS 2021.
[6] Haotian Yang et al., SIGGRAPH Asia 2023.
[7] Kai Zhao et al., CVPR 2023.
[8] Sanghyun Woo et al., CVPR 2023.
[9] Taesung Park et al., CVPR 2019.
[10] Wenzhe Shi et al., CVPR 2016.
[11] Huawei Wei et al., arXiv 2024.

Tags: real-time, computer vision, AI, deep learning, portrait generation, video animation
Written by Kuaishou Tech, the official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.