LTX-2 Open‑Source: The First Model That Generates Video and Audio Together

LTX-2, an open-source multimodal diffusion model from Lightricks, jointly generates synchronized video and audio using an asymmetric dual-stream architecture. It runs at 49.18 diffusion steps per minute, roughly 18 times the 2.69 steps per minute of WAN 2.2 14B, and produces about 20 seconds of high-resolution output per inference.

AI Engineering

Problem

Most video generation models produce visual content only, while most audio generation models produce sound only, leaving a gap for synchronized audio‑video synthesis.

LTX‑2 Overview

LTX-2 is an open-source multimodal diffusion model that learns a joint distribution over audio and video, so a single generation run produces speech, ambient sound, on-screen action, and temporal dynamics together.
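The difference between joint and separate generation can be written down compactly. The notation below is an illustrative assumption, not taken from the LTX-2 materials: v is the video, a is the audio, c is the text prompt, and p_θ is the learned model.

```latex
% Assumed notation: v = video, a = audio, c = text prompt.
\begin{align*}
\text{separate models:} \quad & p(v \mid c)\, p(a \mid c) \\
\text{joint model:}     \quad & p_\theta(v, a \mid c) \\
\text{in general:}      \quad & p(v, a \mid c) \neq p(v \mid c)\, p(a \mid c)
\end{align*}
```

The two sides agree only when video and audio are conditionally independent given the prompt. Synchronized content such as lip movement and speech is exactly the case where they are not, which is why sampling the two modalities separately cannot guarantee alignment.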

Architecture

Asymmetric dual‑stream diffusion transformer.

Video stream: 14 billion parameters (high capacity).

Audio stream: 5 billion parameters (lightweight).

Bidirectional audiovisual cross-attention links the two streams, so each modality conditions on the other instead of modeling it redundantly.

Deep multilingual text encoder processes input prompts.

Introduces a “thinking token” that improves semantic stability and phonetic accuracy of generated speech.
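The bidirectional cross-attention idea above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not the LTX-2 implementation: it uses a single head, no learned projections, a shared token width for both streams, and hypothetical function names (`cross_attend`, `bidirectional_block`). It only shows the data flow, in which video tokens query audio tokens and audio tokens query video tokens within the same block.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Single-head scaled dot-product attention.

    Assumption: no learned Q/K/V projections, so the other stream's
    tokens serve as both keys and values.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        outputs.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        )
    return outputs


def bidirectional_block(video_tokens, audio_tokens):
    """Hypothetical block: each stream attends to the other, then adds a residual.

    Both directions read the tokens as they were before the update.
    """
    video_ctx = cross_attend(video_tokens, audio_tokens)  # video queries audio
    audio_ctx = cross_attend(audio_tokens, video_tokens)  # audio queries video
    new_video = [[x + c for x, c in zip(t, ctx)] for t, ctx in zip(video_tokens, video_ctx)]
    new_audio = [[x + c for x, c in zip(t, ctx)] for t, ctx in zip(audio_tokens, audio_ctx)]
    return new_video, new_audio


rng = random.Random(0)
video = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # 6 toy video tokens
audio = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # 3 toy audio tokens
new_video, new_audio = bidirectional_block(video, audio)
print(len(new_video), len(new_audio))
```

Each stream keeps its own token count, so the audio side can stay small while still reading from, and being read by, the larger video side. A real model would add learned projections, multiple heads, and positional information on top of this skeleton.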

Performance

Processes 49.18 diffusion steps per minute, versus 2.69 steps per minute for WAN 2.2 14B, roughly an 18x difference in per-step throughput.

Generates approximately 20 seconds of synchronized high‑resolution, high‑frame‑rate audio‑video per inference.
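The throughput figures above are easier to compare as seconds per step and as a ratio. The short calculation below uses only the two numbers quoted in this article; the function and variable names are illustrative.

```python
# Steps-per-minute figures as quoted in this article.
LTX2_STEPS_PER_MIN = 49.18
WAN22_14B_STEPS_PER_MIN = 2.69


def seconds_per_step(steps_per_minute):
    """Convert a steps-per-minute rate into seconds per diffusion step."""
    return 60.0 / steps_per_minute


ltx2_sec = seconds_per_step(LTX2_STEPS_PER_MIN)
wan_sec = seconds_per_step(WAN22_14B_STEPS_PER_MIN)
ratio = LTX2_STEPS_PER_MIN / WAN22_14B_STEPS_PER_MIN

print(f"LTX-2:       {ltx2_sec:.2f} s/step")   # 1.22 s/step
print(f"WAN 2.2 14B: {wan_sec:.2f} s/step")    # 22.30 s/step
print(f"Ratio:       {ratio:.2f}x")            # 18.28x
```

This is a per-step comparison only. Total generation time also depends on how many steps each model runs, which the figures here do not specify.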

Qualitative Capability

Joint training lets the model align sound and image intrinsically, e.g., hand motion synchronized with clapping sounds or lip movements matching spoken words.

Resources

Code and model weights: https://github.com/Lightricks/LTX-2

Tags: open-source AI, multimodal generation, video synthesis, cross-modal attention, LTX-2, audio-visual diffusion