Data Party THU
Data Party THU
Jul 31, 2025 · Artificial Intelligence

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

The LaVin-DiT paper introduces a large‑scale vision diffusion transformer that combines a spatiotemporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.

3D RoPEGenerative AIcomputer vision
0 likes · 11 min read
How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer
AI Frontier Lectures
AI Frontier Lectures
Jul 8, 2025 · Artificial Intelligence

How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer

The LaVin-DiT paper presents a large vision diffusion transformer that integrates a spatio‑temporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient multi‑task generation for images and videos, and details its training via flow‑matching and experimental results.

3D RoPEJoint Diffusion TransformerST-VAE
0 likes · 12 min read
How LaVin-DiT Unifies Vision Tasks with a Large Diffusion Transformer