Jan 8, 2026 · Artificial Intelligence

LTX-2 Open‑Source: The First Model That Generates Video and Audio Together

LTX-2, an open‑source multimodal diffusion model from Lightricks, jointly generates synchronized video and audio using an asymmetric dual‑stream architecture. It achieves 49.18 processing steps per minute, far faster than many video-only models, and supports about 20 seconds of high‑resolution output.

LTX-2 · audio-visual diffusion · cross-modal attention

