AI Frontier Lectures
Jul 8, 2025 · Artificial Intelligence

How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer

The LaVin-DiT paper presents a large vision diffusion transformer that combines a spatio-temporal variational auto-encoder (ST-VAE), a joint diffusion transformer with full-sequence joint attention, and 3D rotary position encoding (3D RoPE) to support unified, efficient multi-task generation for images and videos. It also describes how the model is trained with flow matching and reports its experimental results.
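The summary only names 3D rotary position encoding. As a rough illustration of the general idea, and not LaVin-DiT's actual implementation, the sketch below assumes a common construction: a token's channels are split into three groups, and each group is rotated by standard 1D RoPE using the token's temporal, height, or width index. The channel split, the base of 10000, and the names `rope_1d`, `rope_3d`, and `dot` are hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rope_1d(vec: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Standard 1D RoPE: rotate each channel pair (vec[i], vec[i+1])
    by the angle pos * base^(-i/d), for i = 0, 2, 4, ..."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of channels")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_3d(
    vec: Sequence[float],
    t: int,
    h: int,
    w: int,
    split: Optional[Tuple[int, int, int]] = None,
) -> List[float]:
    """Hypothetical 3D RoPE: apply 1D RoPE to three channel groups using the
    token's (time, height, width) indices respectively."""
    d = len(vec)
    if split is None:
        # Assumption: give height and width equal even-sized groups and
        # assign the remaining channels to the time axis.
        part = (d // 3) // 2 * 2
        split = (d - 2 * part, part, part)
    if sum(split) != d or any(p % 2 for p in split):
        raise ValueError("split must be three even sizes summing to len(vec)")
    dt, dh, _ = split
    return (
        rope_1d(vec[:dt], t)
        + rope_1d(vec[dt:dt + dh], h)
        + rope_1d(vec[dt + dh:], w)
    )


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, standing in for a query-key attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.1 * i - 0.3 for i in range(12)]
    k = [0.05 * i + 0.2 for i in range(12)]
    # Query at (t, h, w) = (1, 2, 3), key at (4, 6, 8): offsets (3, 4, 5).
    s1 = dot(rope_3d(q, 1, 2, 3), rope_3d(k, 4, 6, 8))
    # Same offsets, shifted to start at the origin.
    s2 = dot(rope_3d(q, 0, 0, 0), rope_3d(k, 3, 4, 5))
    print(abs(s1 - s2) < 1e-9)
```

Each channel pair is rotated by an angle proportional to its position, so the score between a rotated query and a rotated key depends only on their relative offsets along time, height, and width. The demo checks this numerically: shifting both tokens by the same amount leaves the score unchanged.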

3D RoPE · Joint Diffusion Transformer · ST-VAE