LTX-2 Open-Sourced: The First Model That Generates Video and Audio Together
LTX-2, an open-source multimodal diffusion model from Lightricks, jointly generates synchronized video and audio using an asymmetric dual-stream architecture. It runs 49.18 diffusion steps per minute, far faster than many pure video models, while supporting about 20 seconds of high-resolution output per inference.
Problem
Most video generation models produce visual content only, while most audio generation models produce sound only, leaving a gap for synchronized audio‑video synthesis.
LTX‑2 Overview
LTX‑2 is an open‑source multimodal diffusion model that learns a joint distribution over audio and video, enabling a single forward pass to generate speech, ambient sounds, actions, and temporal dynamics together.
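To make "joint distribution over audio and video" concrete, below is a minimal sketch of what joint sampling can look like: one denoising loop in which a single model call updates the video and audio latents together. The names, latent shapes, and the simplified Euler update are illustrative assumptions, not LTX-2's actual API.

```python
import torch

# Minimal sketch of joint audio-video diffusion sampling.
# All names and shapes are illustrative, not LTX-2's real interface.

@torch.no_grad()
def sample_joint(model, text_emb, num_steps=50, device="cuda"):
    # Start both modalities from pure noise in their latent spaces.
    video_lat = torch.randn(1, 16, 32, 64, 64, device=device)  # (B, C, T, H, W)
    audio_lat = torch.randn(1, 8, 1024, device=device)         # (B, C, T_audio)

    for step in reversed(range(num_steps)):
        t = torch.full((1,), step, device=device)
        # One forward pass predicts denoising directions for BOTH modalities,
        # so their content stays aligned at every step.
        video_eps, audio_eps = model(video_lat, audio_lat, t, text_emb)
        video_lat = video_lat - video_eps / num_steps  # simplified Euler update
        audio_lat = audio_lat - audio_eps / num_steps
    return video_lat, audio_lat
```

Because both modalities are denoised by the same network at every step, alignment such as lip sync or sound effects timed to motion emerges from training rather than from post-hoc stitching.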
Architecture
Asymmetric dual‑stream diffusion transformer.
Video stream: 14 billion parameters (high capacity).
Audio stream: 5 billion parameters (lightweight).
Bidirectional audiovisual cross-attention links the two streams, avoiding redundant computation (see the sketch after this list).
Deep multilingual text encoder processes input prompts.
Introduces a “thinking token” that improves semantic stability and phonetic accuracy of generated speech.
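Here is a hedged sketch of one such dual-stream block: a wide video branch, a narrower audio branch, and cross-attention in both directions. The dimensions, head count, and layer layout are assumptions for illustration; LTX-2's published architecture will differ in detail.

```python
import torch
import torch.nn as nn

# Sketch of one asymmetric dual-stream block. Widths and head counts
# are illustrative assumptions, not LTX-2's published configuration.

class DualStreamBlock(nn.Module):
    def __init__(self, video_dim=4096, audio_dim=1536, heads=16):
        super().__init__()
        self.video_self = nn.MultiheadAttention(video_dim, heads, batch_first=True)
        self.audio_self = nn.MultiheadAttention(audio_dim, heads, batch_first=True)
        # Cross-attention in both directions; kdim/vdim let each stream
        # attend across mismatched widths without a shared projection size.
        self.v_from_a = nn.MultiheadAttention(video_dim, heads, kdim=audio_dim,
                                              vdim=audio_dim, batch_first=True)
        self.a_from_v = nn.MultiheadAttention(audio_dim, heads, kdim=video_dim,
                                              vdim=video_dim, batch_first=True)

    def forward(self, v, a):
        v = v + self.video_self(v, v, v)[0]   # video tokens attend to video
        a = a + self.audio_self(a, a, a)[0]   # audio tokens attend to audio
        v = v + self.v_from_a(v, a, a)[0]     # video queries audio context
        a = a + self.a_from_v(a, v, v)[0]     # audio queries video context
        return v, a
```

The asymmetry mirrors the parameter split above: most capacity sits in the video branch (14B) while the audio branch stays comparatively small (5B), with the cross-attention as the point where the two streams exchange information.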
Performance
Processes 49.18 diffusion steps per minute, versus WAN 2.2 14B’s 2.69 steps per minute.
Generates approximately 20 seconds of synchronized high‑resolution, high‑frame‑rate audio‑video per inference.
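Taking the quoted throughputs at face value, the gap works out to roughly 18x; a quick arithmetic check:

```python
# Quick check on the quoted throughput numbers.
ltx2_steps_per_min = 49.18
wan22_steps_per_min = 2.69

speedup = ltx2_steps_per_min / wan22_steps_per_min
print(f"LTX-2 runs ~{speedup:.1f}x more diffusion steps per minute")  # ~18.3x
```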
Qualitative Capability
Joint training lets the model align sound and image intrinsically, e.g., hand motion synchronized with clapping sounds or lip movements matching spoken words.
Resources
Code and model weights: https://github.com/Lightricks/LTX-2