Tagged articles

audio generation

13 articles · Page 1 of 1

May 11, 2026 · Artificial Intelligence

Why Enterprises Are Switching from Suno to the Homegrown AI Music Platform Mureka

Enterprises are moving from Suno to Mureka because its newer models deliver higher vocal realism, faster generation, better stability, and direct integration support, as shown by case studies from Sondo, KuaiGe, and a leading overseas MV platform that saw multi‑fold growth.

AI music · Mureka · Suno

0 likes · 10 min read

Machine Heart

May 8, 2026 · Artificial Intelligence

Omni2Sound Beats Multi-Modal Audio ‘Generalist’ Dilemma via Data Alignment

Omni2Sound tackles the long‑standing “generalist” dilemma of unified audio generation by constructing a high‑quality V‑T‑A dataset (SoundAtlas), employing a three‑stage progressive training pipeline, and using a simple Diffusion Transformer backbone, ultimately achieving state‑of‑the‑art performance on T2A, V2A and VT2A tasks and strong robustness on off‑screen scenarios.

Data Alignment · Diffusion Models · Multimodal Learning

0 likes · 16 min read

Machine Heart

Apr 21, 2026 · Artificial Intelligence

ControlAudio Enables Scripted Timing and Speech Control in Text-to-Audio Generation

ControlAudio, a progressive diffusion model presented at ACL 2026, jointly models text, timing, and phoneme information to achieve precise event timing and intelligible speech in text-to-audio generation, backed by a large mixed real‑synthetic dataset and competitive experimental results.

ControlAudio · Multimodal Learning · Progressive Diffusion

0 likes · 10 min read

Meituan Technology Team

Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · Open-source · Text‑to‑Speech

0 likes · 12 min read

Xiaomi Tech

Feb 3, 2026 · Artificial Intelligence

Xiaomi’s AI Research Secures Spots on ICLR 2026 – Papers and Key Findings

The International Conference on Learning Representations (ICLR) 2026 accepted multiple Xiaomi papers covering multimodal reasoning, reinforcement learning, GUI agents, autonomous driving, audio generation and benchmark design, each presenting novel frameworks, data‑centric training tricks and strong experimental results that advance the state of the art.

Benchmark · ICLR 2026 · Multimodal Learning

0 likes · 17 min read

Alimama Tech

Dec 17, 2025 · Artificial Intelligence

How VeM Achieves Precise Semantic, Temporal, and Rhythmic Alignment in Video-to-Music Generation

The VeM model introduces a latent diffusion framework that leverages hierarchical video parsing, scene‑guided cross‑attention, and a transition‑beat alignment adapter to generate high‑fidelity background music perfectly synchronized with video semantics, timing, and rhythm, outperforming existing baselines on extensive quantitative and qualitative evaluations.

Cross-Attention · Temporal Alignment · audio generation

0 likes · 14 min read

ShiZhen AI

May 13, 2025 · Artificial Intelligence

Top Free Text‑to‑Speech Tools for Content Creators

This article reviews five free text‑to‑speech solutions—AI易视频, Google TTS, Natural Reader, Balabolka, and Speech2Go—detailing their features, language support, installation needs, and unique capabilities to help creators choose the right tool for narration, translation, or multi‑character audio production.

AI · TTS · Text‑to‑Speech

0 likes · 7 min read

Smart Era Software Development

Feb 20, 2025 · Industry Insights

Which Creative AI Tools Are Shaping Multimodal Generative Content in 2024?

The article reviews the most notable open‑source and commercial creative AI tools released in 2024 across image, video, and audio generation, explains key technical shifts such as diffusion Transformers and zero‑shot personalization, and forecasts major trends and new releases expected in 2025.

AI art · Multimodal AI · audio generation

0 likes · 14 min read

Fighter's World

Sep 30, 2024 · Artificial Intelligence

Exploring Google NotebookLM: Use Cases, Interaction Experience, and Key Insights

The author reviews Google NotebookLM, describing how it aids deep paper reading, encourages further conversation with guided prompts, and keeps conversations coherent through self‑play insights. The review also highlights the audio‑overview feature and reflects on AI concepts such as the "bitter lesson" and the limits of self‑play in open scenarios.

AI research · Google · LLM

0 likes · 22 min read

Baidu MEUX

Jul 24, 2024 · Artificial Intelligence

What’s New in AI? Video QA, Audio Generation, and Major Industry Moves

This roundup highlights the latest AI breakthroughs, including Zhipu AI's video‑understanding model for temporal Q&A, Tencent's video‑to‑audio generation system, Vimeo's AI‑content labeling policy, Apple’s Core ML inclusion of ByteDance’s depth model, AMD’s acquisition of Silo AI, Claude’s new editing features, Quark’s all‑in‑one search AI, TikTok’s VR live streaming on Vision Pro, the launch of the "Xinliu" AI search assistant, and Canva’s restrictions on political AI‑generated posters.

AI models · Artificial Intelligence · audio generation

0 likes · 8 min read

Rare Earth Juejin Tech Community

Aug 30, 2023 · Artificial Intelligence

AudioCraft: An Open‑Source PyTorch Library for Audio Generation with MusicGen, AudioGen, and EnCodec

AudioCraft is a PyTorch library that bundles state‑of‑the‑art AI models—MusicGen, AudioGen, and the EnCodec codec—to generate high‑quality audio from text or reference sounds, and the article explains its architecture, evaluation results, and how to install and run it.

AI models · AudioGen · EnCodec

0 likes · 9 min read

Volcano Engine Developer Services

Feb 14, 2023 · Artificial Intelligence

How Make-An-Audio Turns Text Into Realistic Sound Effects

Make-An-Audio, a collaborative text‑to‑audio model from Zhejiang University, Peking University and Volcano Speech, uses a Distill‑then‑Reprogram strategy to generate high‑quality, controllable sound effects from any modality, showcasing impressive demos and promising future AIGC applications.

AIGC · Deep Learning · Speech synthesis

0 likes · 7 min read

The Dominant Programmer

Nov 27, 2020 · Backend Development

Using Jacob in Java for Windows Speech Synthesis and Audio File Generation

This guide walks through downloading Jacob's DLL and JAR, configuring the Java environment, setting up an Eclipse project, and writing Java code that leverages the SAPI COM interfaces to synthesize Chinese text into a WAV file on Windows, complete with step‑by‑step screenshots and a full source example.

COM · Jacob · Java

0 likes · 5 min read
