AI Frontier Lectures
Jul 8, 2025 · Artificial Intelligence
How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer
The LaVin-DiT paper presents a large vision diffusion transformer for unified, efficient multi-task generation on images and videos. The model combines a spatio-temporal variational autoencoder (ST-VAE), a joint diffusion transformer with full-sequence joint attention, and 3D rotary position encoding (3D RoPE). This article also covers how the model is trained with flow matching and summarizes the experimental results.
3D RoPE · Joint Diffusion Transformer · ST-VAE
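The summary mentions training via flow matching. As a rough illustration of that idea, here is a minimal NumPy sketch of the commonly used linear-interpolation form of the objective. It assumes `x0` is a noise sample, `x1` is a clean latent, and the network regresses the constant velocity `x1 - x0`. The function names and the `predict_velocity` callable are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Build the interpolated sample x_t and its target velocity.

    Assumption (not taken from the paper): a straight-line path
    x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    """
    # Reshape t from (batch,) to (batch, 1, ..., 1) so it broadcasts over x0.
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity.

    predict_velocity is a hypothetical stand-in for the denoising
    network: any callable taking (x_t, t) and returning an array
    shaped like x_t.
    """
    x_t, v_target = flow_matching_pair(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    return float(np.mean((v_pred - v_target) ** 2))
```

In a real training loop, `t` would be sampled per example from some distribution on [0, 1] and `predict_velocity` would be the transformer conditioned on the task context; both are left abstract here.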
