Data Party THU
Jul 31, 2025 · Artificial Intelligence
How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer
The LaVin-DiT paper introduces a large-scale vision diffusion transformer built from three components: a spatiotemporal variational autoencoder (ST-VAE), a joint diffusion transformer with full-sequence joint attention, and 3D rotary position encoding (3D RoPE). Together they enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.
3D RoPE · Generative AI · Computer Vision
