Tagged articles

multimodal diffusion

8 articles · Page 1 of 1

Jun 15, 2026 · Artificial Intelligence

Baton: Semantic Blueprint Enables Precise Audio‑Video Synchronization in Generation

Current open‑source audio‑video generators struggle with complex, multi‑stage prompts, which leads to misaligned actions and sounds. Baton, introduced by Fudan University and Tencent, decouples semantic reasoning from content generation via a shared cross‑modal semantic blueprint and RS‑RoPE, achieving markedly better synchronization and prompt adherence.

Baton · RS-RoPE · VA-Planner

0 likes · 21 min read

SuanNi

May 28, 2026 · Artificial Intelligence

How a 3.8B Model Beats 6B+ Models Using Just 20% of the Compute – Inside Microsoft Lens

Microsoft’s Lens team shows that a 3.8B‑parameter image‑generation model can match or surpass 6B+ models while consuming only about 19% of the GPU compute, thanks to aggressive model compression, dense captioning, mixed‑resolution training, optimized VAE and language encoders, and targeted RL fine‑tuning.

Benchmarking · Reinforcement Learning · dense captioning

0 likes · 14 min read

Machine Heart

Apr 29, 2026 · Artificial Intelligence

Beyond VLA and World Models: Galaxy General Unveils LDA‑1B to Scale Embodied Data

LDA‑1B unifies world modeling and VLA in a latent dynamics action model. It ingests over 30,000 hours of heterogeneous embodied data via a five‑layer AstraData pipeline, employs a unified end‑effector space and quality‑based data allocation, and achieves state‑of‑the‑art success rates on RoboCasa‑GR1 while being fully open‑sourced.

Data Ingestion · embodied AI · latent dynamics

0 likes · 13 min read

SuanNi

Feb 28, 2026 · Artificial Intelligence

How SkyReels V4 Achieves Synchronized Audio‑Video Generation at Film Quality

The article provides an in‑depth technical analysis of SkyReels V4, a multimodal diffusion model that generates ultra‑high‑definition, long‑duration videos with perfectly synchronized sound, detailing its dual‑stream architecture, channel‑concatenation strategy, efficient refinement pipeline, training methodology, and benchmark performance.

AI Video Generation · audio‑video synchronization · benchmark

0 likes · 13 min read

AI Frontier Lectures

Jan 30, 2026 · Artificial Intelligence

Inside MOVA: Open-Source End-to-End Audio-Video Generation

OpenMOSS and MOSI unveiled MOVA, China’s first high‑performance open‑source audio‑video generation model, detailing its dual‑tower architecture, bridge module, aligned RoPE, multi‑stage data pipeline, training strategies, dual CFG guidance, and benchmark results that surpass leading closed‑source systems.

MOVA · Model Architecture · audio-video generation

0 likes · 20 min read

AI Algorithm Path

Aug 16, 2025 · Artificial Intelligence

Qwen-Image: The Best Open‑Source AI Image Generation Model Unveiled

Qwen-Image, an open‑source multimodal diffusion model, introduces a three‑component architecture, dual‑stream encoding, and a novel MSRoPE positional scheme to achieve superior text‑aligned image generation, with extensive benchmark results, detailed data engineering, progressive training strategies, and publicly released weights for easy access.

AI image generation · MSRoPE · Qwen-Image

0 likes · 9 min read

AI Algorithm Path

Jun 3, 2025 · Artificial Intelligence

Inside Tencent’s HunyuanVideo-Avatar: How Open‑Source AI Generates Digital Human Videos

Tencent’s HunyuanVideo-Avatar converts a static portrait and an audio clip into a lip‑synced, expressive video using a multimodal diffusion Transformer, offering open‑source weights, detailed module designs, hardware requirements, code examples, and a candid assessment of its strengths and current limitations.

AI Video Generation · CUDA · HunyuanVideo-Avatar

0 likes · 8 min read

Baidu Intelligent Cloud Tech Hub

Jul 31, 2023 · Artificial Intelligence

Boosting Large Model Inference: High‑Performance Optimization Techniques

This article explains the background, challenges, and high‑performance optimization methods for deploying large language and multimodal models, covering inference workflow analysis, distributed concurrency, latency reduction, quantization strategies, and service throughput improvements to achieve industry‑leading speed and memory efficiency.

Distributed Inference · Quantization · multimodal diffusion

0 likes · 12 min read
