Machine Heart
May 8, 2026 · Artificial Intelligence

Omni2Sound Beats Multi-Modal Audio ‘Generalist’ Dilemma via Data Alignment

Omni2Sound tackles the long-standing “generalist” dilemma of unified audio generation with three ingredients: a high-quality video-text-audio (V-T-A) dataset called SoundAtlas, a three-stage progressive training pipeline, and a simple Diffusion Transformer backbone. It achieves state-of-the-art performance on text-to-audio (T2A), video-to-audio (V2A), and video-text-to-audio (VT2A) tasks and remains robust in off-screen scenarios.

Audio Generation · Data Alignment · Diffusion Models