Hierarchical Masked 3D Diffusion Model for Video Outpainting
The Hierarchical Masked 3D Diffusion Model (M3DDM) combines a masking-based training strategy with cross-attention over global video frames to achieve temporally consistent video outpainting, and uses a hybrid coarse-to-fine inference pipeline to limit error accumulation. It achieves state-of-the-art results and is deployed in Alibaba's creative center.
This paper introduces a new diffusion-based video outpainting method, the Hierarchical Masked 3D Diffusion Model (M3DDM). Video outpainting extends a video beyond its original frame boundaries. Unlike image outpainting, the generated regions must also stay consistent from frame to frame, which makes the task considerably harder. M3DDM is trained with a masking strategy that allows known frames to act as guide frames. At inference time, these guide frames connect the results of successive video-clip inferences, keeping the clips temporally consistent and reducing jitter between adjacent frames. The model also feeds global frames sampled from the whole video into its cross-attention layers as prompts, giving each clip access to context beyond its own window. Finally, a hybrid coarse-to-fine inference pipeline addresses error accumulation when outpainting long videos. The method achieves state-of-the-art results on video outpainting tasks.
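To make the masking idea concrete, here is a minimal sketch of how per-clip training masks could be built. It assumes that 1 marks pixels the model must generate and 0 marks known pixels, that the original content sits centered in the enlarged frame, and that a clip is drawn from four hypothetical modes in which the first and/or last frame is left fully unmasked to serve as a guide frame. The function names (`border_mask`, `clip_training_mask`) and the mode set are illustrative assumptions, not the mask distribution used in the M3DDM repository.

```python
import random
from typing import List, Tuple

# 1 = pixel to generate (outpainted region), 0 = known pixel.
Mask = List[List[int]]


def border_mask(h: int, w: int, keep_h: int, keep_w: int) -> Mask:
    """Mask everything outside a centered keep_h x keep_w window of known content."""
    top = (h - keep_h) // 2
    left = (w - keep_w) // 2
    return [
        [0 if top <= y < top + keep_h and left <= x < left + keep_w else 1
         for x in range(w)]
        for y in range(h)
    ]


def clip_training_mask(num_frames: int, h: int, w: int, keep_h: int, keep_w: int,
                       rng: random.Random) -> Tuple[List[Mask], List[int]]:
    """Hypothetical per-clip mask: outpaint the border of every frame, but randomly
    leave the first and/or last frame fully known so it acts as a guide frame."""
    mode = rng.choice(["none", "first", "last", "both"])
    guides = set()
    if mode in ("first", "both"):
        guides.add(0)
    if mode in ("last", "both"):
        guides.add(num_frames - 1)
    masks = []
    for t in range(num_frames):
        if t in guides:
            masks.append([[0] * w for _ in range(h)])  # guide frame: nothing to generate
        else:
            masks.append(border_mask(h, w, keep_h, keep_w))
    return masks, sorted(guides)


if __name__ == "__main__":
    # A 16-frame clip on an 8x8 grid whose known content is 4 rows tall and 8 wide,
    # so only the top and bottom rows are outpainted.
    masks, guides = clip_training_mask(16, 8, 8, 4, 8, random.Random(0))
    print("guide frames:", guides)
```

The intent of mixing modes in this sketch is that one model sees both unanchored clips and clips anchored by fully known neighbors, which is what lets already-generated frames be passed back in as guides at inference time.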
The algorithm has been deployed in Alibaba's creative center, the paper was published at ACM MM 2023, and the code has been open-sourced.
Paper Title: Hierarchical Masked 3D Diffusion Model for Video Outpainting
Paper Download: https://arxiv.org/abs/2309.02119
Project Page: https://fanfanda.github.io/M3DDM/
Code Repository: https://github.com/alimama-creative/M3DDM-Video-Outpainting
The method addresses two main challenges in video outpainting: keeping the generated content temporally consistent across video clips, and mitigating error accumulation in long videos. The solution builds a 3D video diffusion model on the pretrained parameters of Stable Diffusion, uses guide frames to connect video clips, feeds global frames into the cross-attention layers as prompts, and adopts a hybrid coarse-to-fine inference pipeline.
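The guide-frame and coarse-to-fine ideas can be illustrated with a small frame-scheduling sketch. It assumes the model processes fixed-length clips of `clip_len` frames (16 here, an assumption), that a coarse pass first outpaints sparse keyframes spanning the whole video, and that later passes fill the gaps between already-generated frames, which are passed back in as guide frames. `global_frames` shows one plausible way to choose the frames fed to cross-attention as a global prompt (uniform sampling, also an assumption). All names are hypothetical and the diffusion model calls are omitted; this illustrates the scheduling pattern, not the pipeline in the M3DDM code.

```python
from collections import deque
from typing import List, Tuple

# (frame indices processed in one model call, indices of those frames used as guides)
Window = Tuple[List[int], List[int]]


def evenly_spaced(a: int, b: int, k: int) -> List[int]:
    """k distinct integer indices from a to b inclusive (requires 2 <= k <= b - a + 1)."""
    d, m = b - a, k - 1
    # Integer rounding keeps both endpoints exact and the indices strictly increasing.
    return [a + (i * d + m // 2) // m for i in range(k)]


def global_frames(num_frames: int, g: int) -> List[int]:
    """Hypothetical global prompt: g frames sampled uniformly from the whole video."""
    if num_frames <= g:
        return list(range(num_frames))
    return evenly_spaced(0, num_frames - 1, g)


def coarse_to_fine_plan(num_frames: int, clip_len: int = 16) -> List[Window]:
    """Order of model calls so that every frame is generated exactly once and
    every guide frame has already been generated when it is used."""
    if num_frames <= clip_len:
        return [(list(range(num_frames)), [])]
    # Coarse pass: sparse keyframes covering the whole video, no guide frames.
    keys = evenly_spaced(0, num_frames - 1, clip_len)
    plan: List[Window] = [(keys, [])]
    queue = deque(zip(keys, keys[1:]))  # breadth-first: coarser gaps are handled first
    while queue:
        a, b = queue.popleft()
        if b - a <= 1:
            continue  # adjacent frames, nothing left to generate between them
        if b - a + 1 <= clip_len:
            # Fine pass: fill every frame between two already-generated guide frames.
            plan.append((list(range(a, b + 1)), [a, b]))
        else:
            # Gap still longer than one clip: generate an intermediate keyframe level.
            mids = evenly_spaced(a, b, clip_len)
            plan.append((mids, [a, b]))
            queue.extend(zip(mids, mids[1:]))
    return plan


if __name__ == "__main__":
    schedule = coarse_to_fine_plan(1000, clip_len=16)
    print(len(schedule), "model calls; global prompt frames:", global_frames(1000, 8))
```

In this scheme every pass after the first is anchored on frames produced from a whole-video view, so errors have a much shorter path to compound than in a purely sequential, clip-after-clip pass.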
Experiments on the DAVIS and YouTube-VOS datasets show substantial improvements over existing methods such as that of Dehan et al. and the Simple Diffusion Model. The algorithm is currently deployed in Alibaba's creative center, where advertisers use it to adapt video dimensions to different ad placements.
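Fitting a video to a new ad placement usually means changing its aspect ratio, and outpainting does this by generating new content rather than cropping. The sketch below computes how many pixels would have to be generated on each side so that every original pixel is kept and the frame reaches a target ratio. The function name `outpaint_padding` and the even split between opposite sides are assumptions; a real pipeline would also have to round to the model's working resolution, which is not shown.

```python
import math
from fractions import Fraction
from typing import Tuple


def outpaint_padding(width: int, height: int,
                     target_w: int, target_h: int) -> Tuple[int, int, int, int]:
    """Return (left, right, top, bottom) pixels to outpaint so that a width x height
    frame reaches the target_w:target_h aspect ratio without cropping."""
    target = Fraction(target_w, target_h)  # exact ratio, no float rounding
    if Fraction(width, height) < target:
        # Frame is too narrow: keep the height and extend horizontally.
        new_w = math.ceil(height * target)
        extra = new_w - width
        return extra // 2, extra - extra // 2, 0, 0
    # Frame is too wide (or already matches): keep the width and extend vertically.
    new_h = math.ceil(width / target)
    extra = new_h - height
    return 0, 0, extra // 2, extra - extra // 2


print(outpaint_padding(1920, 1080, 9, 16))  # (0, 0, 1167, 1167): landscape to vertical
print(outpaint_padding(1080, 1920, 1, 1))   # (420, 420, 0, 0): vertical to square
```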
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.
