Tagged articles
14 articles
Machine Learning Algorithms & Natural Language Processing
Apr 23, 2026 · Artificial Intelligence

ControlAudio: Script‑Driven, Time‑Precise Text‑to‑Audio Generation Presented at ACL 2026

ControlAudio, a progressive diffusion framework introduced by Tsinghua researchers, unifies text, timing, and phoneme modeling to enable precise control over when sounds occur and what is spoken, achieving superior alignment and intelligibility while preserving high‑fidelity audio generation.

ACL 2026 · Audio Synthesis · ControlAudio
11 min read
AI Engineering
Jan 8, 2026 · Artificial Intelligence

LTX-2 Open‑Source: The First Model That Generates Video and Audio Together

LTX-2, an open-source multimodal diffusion model from Lightricks, jointly generates synchronized video and audio using an asymmetric dual-stream architecture. It reaches 49.18 processing steps per minute, far faster than many pure video models, and supports about 20 seconds of high-resolution output.

LTX-2 · audio-visual diffusion · cross-modal attention
3 min read
Kuaishou Tech
Sep 17, 2025 · Artificial Intelligence

How MIDAS Achieves Real‑Time Multimodal Digital‑Human Video Generation

The MIDAS framework introduced by the Kling Team combines autoregressive video generation with a lightweight diffusion denoising head to deliver real‑time, high‑quality digital‑human synthesis under multimodal control, achieving sub‑500 ms latency, 64× compression, and robust performance across multilingual dialogue, singing, and interactive world modeling tasks.

AI · Digital Human · Real-time Video
6 min read
Tencent Tech
Aug 21, 2025 · Artificial Intelligence

Yan: Tencent’s Real‑Time High‑Fidelity Interactive Video Generation

Tencent's newly released Yan system advances interactive video generation by delivering high-fidelity, real-time, editable content for games, virtual worlds, and AIGC. It uses a three-module architecture: Yan-Sim for AAA-level simulation, Yan-Gen for multimodal generation, and Yan-Edit for fine-grained editing. The release also introduces a large-scale, high-quality dataset and efficient inference optimizations.

Interactive Video · Real-time Simulation · Video Editing
12 min read
AntTech
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: An End-to-End Audio‑Driven Semi‑Body Human Animation Framework

EchoMimicV2, an open-source project from Ant Group's Alipay AI team, is an end-to-end audio-driven framework that generates high-quality semi-body portrait videos by jointly coordinating audio, pose, and image inputs. It targets three challenges: condition complexity, model stability, and computational cost.

Digital Human · audio-driven animation · diffusion models
16 min read
Baidu Tech Salon
Nov 14, 2024 · Artificial Intelligence

How Baidu’s Wenxin Model Hit 430 Million Users and What Its New Tech Means for AI

At Baidu World 2024, CTO Wang Haifeng revealed that Wenxin Yiyan (ERNIE Bot) has reached 430 million users. He detailed the model's advances in retrieval-augmented and multimodal generation, showcased agent-driven coding tools, and highlighted expanding AI applications across education, sports, and industry.

AI · Intelligent agents · industry applications
7 min read
Tencent Cloud Developer
May 15, 2024 · Artificial Intelligence

Tencent Open-Sources HunYuan DiT: First Chinese-Native Text-to-Image Model with 1.5B Parameters

Tencent has open-sourced its upgraded 1.5-billion-parameter HunYuan DiT, the first Chinese-native, bilingual (Chinese-English) text-to-image model built on a Diffusion Transformer (DiT). The upgrade delivers roughly a 20% improvement in visual quality, supports multi-round generation, shows potential for video generation, and is free for commercial use. Full weights, inference code, and algorithms are available on Hugging Face and GitHub for developers and enterprises.

Chinese-native AI · DiT architecture · Diffusion Transformer
6 min read
DataFunTalk
Jan 31, 2024 · Artificial Intelligence

Industry Trends and Challenges of Large Language Models in Enterprise Applications (2023 Review)

The article reviews the rapid development of large language models in enterprise settings, covering internal collaboration tools, AI assistants for development and marketing, multimodal generation, inference speed bottlenecks, resource constraints, and future directions such as open‑source models and academic‑industry cooperation.

AI assistants · AI in marketing · Enterprise AI
8 min read
Ximalaya Technology Team
Oct 10, 2023 · Artificial Intelligence

MiniGPT-5: A Novel Multimodal Generation Model for Coherent Text-Image Synthesis

MiniGPT-5 is a multimodal generation model that uses generative "vokens" to interleave text and image synthesis, bridging LLMs with Stable Diffusion. Its two-stage training strategy requires no domain-specific annotations, and it achieves state-of-the-art coherence and quality on benchmarks such as CC3M, VIST, and MMDialog.

AI research · Stable Diffusion · Vision Transformer
9 min read
Alimama Tech
Aug 2, 2023 · Artificial Intelligence

Can AI Fully Automate Advertising Poster Creation and Video Outpainting?

This article reviews four ACM MM 2023 papers that introduce AI‑driven systems for automatic advertising poster generation, multimodal text‑image creation, few‑shot style‑guided visual captioning, and hierarchical 3D diffusion models for video outpainting, detailing their methods, datasets, and experimental results.

AI-generated design · Poster Automation · diffusion models
9 min read