Tagged articles
20 articles
Machine Heart
Apr 15, 2026 · Artificial Intelligence

From Clip Generation to Long‑Video Roaming: OmniRoam Enables Stable, Trajectory‑Controlled Video Synthesis

OmniRoam introduces a panoramic, coarse‑to‑fine framework that generates long, trajectory‑controlled videos with higher spatial consistency and temporal coherence, offering a stable and controllable alternative to short‑clip generation and supporting real‑time preview, high‑resolution refinement, and 3D reconstruction applications.

3D reconstruction · OmniRoam · generative AI
0 likes · 8 min read
AI Engineering
Jan 8, 2026 · Artificial Intelligence

LTX-2 Open‑Source: The First Model That Generates Video and Audio Together

LTX-2, an open‑source multimodal diffusion model from Lightricks, jointly generates synchronized video and audio using an asymmetric dual‑stream architecture, achieving 49.18 processing steps per minute—far faster than many pure video models—while supporting about 20 seconds of high‑resolution output.

LTX-2 · audio-visual diffusion · cross-modal attention
0 likes · 3 min read
HyperAI Super Neural
Dec 23, 2025 · Artificial Intelligence

NeurIPS 2025‑Selected Multi‑Stream Control Framework Achieves Precise Audio‑Visual Sync via Audio Demixing

The paper introduces a NeurIPS 2025‑selected multi‑stream video generation framework that demixes audio into speech, effects, and music, using dedicated control streams and a multi‑stage training strategy to achieve markedly better lip‑sync, event timing, and overall visual quality than prior methods.

MTV framework · NeurIPS 2025 · audio demixing
0 likes · 9 min read
vivo Internet Technology
Dec 17, 2025 · Frontend Development

Turning 3D Avatars into Video: Puppeteer, H5 Frames & FFmpeg Workflow

This article explains how to overcome performance and integration challenges of 3D avatar rendering across multiple scenarios by exporting avatars as video or GIF resources using a Puppeteer‑driven H5 frame capture pipeline combined with FFmpeg video synthesis, detailing the evaluation of alternatives and the final implementation steps.

H5 · Puppeteer · Web Automation
0 likes · 13 min read
AI Frontier Lectures
Sep 8, 2025 · Artificial Intelligence

How DynamicFace Achieves High‑Quality, Consistent Face Swaps in Images and Video

DynamicFace introduces a novel face‑swapping framework that combines diffusion models with composable 3D facial priors, explicitly decoupling identity, pose, expression, lighting and background, achieving superior identity preservation and motion consistency across images and videos, as demonstrated by extensive qualitative and quantitative comparisons with SOTA methods.

3D facial priors · diffusion model · face swapping
0 likes · 10 min read
Bilibili Tech
Sep 4, 2025 · Artificial Intelligence

How AniME Automates Long‑Form Animation with a Director‑Driven Multi‑Agent AI Framework

AniME introduces a director‑driven multi‑agent system that combines a customized Model Context Protocol (MCP) with the open‑source AniSora V3 model to automatically generate consistent, high‑quality long‑form animation from story scripts, handling everything from storyboard creation to video editing and quality evaluation.

Multi-Agent · Storyboard · animation
0 likes · 15 min read
AIWalker
Aug 19, 2025 · Artificial Intelligence

DynamicFace: Controllable High‑Quality Face Swapping for Images and Video

DynamicFace introduces a diffusion‑based framework that explicitly decouples identity, pose, expression, illumination and background using composable 3D facial priors, achieving superior identity preservation, motion consistency and visual fidelity in both image and video face‑swapping tasks.

3D facial priors · Controllable Generation · diffusion models
0 likes · 13 min read
Xiaohongshu Tech REDtech
Aug 18, 2025 · Artificial Intelligence

DynamicFace: Composable 3D Facial Priors for High‑Quality, Consistent Face Swaps

DynamicFace introduces a controllable face‑swapping framework that leverages composable 3D facial priors, dual‑stream identity injection, and a FusionTVO module to achieve superior image and video quality, identity preservation, and temporal consistency, outperforming existing state‑of‑the‑art methods on benchmark datasets.

3D facial priors · AI · Controllable Generation
0 likes · 13 min read
AIWalker
Jun 30, 2025 · Artificial Intelligence

Chinese Team Builds First AI That Understands Film, Using 440K Shot Library for Director‑Level Camera Moves

FilMaster is a pioneering AI system that learns cinematic principles from a 440,000‑shot movie database, combines multimodal LLMs, RAG, and audience‑centric rhythm control to generate editable, high‑quality films, and outperforms prior methods by over 50% on the new FilmEval benchmark.

AI film generation · FilmEval benchmark · Retrieval Augmented Generation
0 likes · 18 min read
Bilibili Tech
May 20, 2025 · Artificial Intelligence

How AnimeReward and GAPO Transform Anime Video Generation with Human Feedback

Researchers at Bilibili present Index‑Anisora, an open‑source anime video generation framework that builds a 30k‑sample reward dataset, introduces the multi‑dimensional AnimeReward model and a Gap‑Aware Preference Optimization (GAPO) method, and demonstrate through extensive automatic and human evaluations that their approach significantly outperforms baseline video generators.

AI · Alignment · GAPO
0 likes · 20 min read
Alipay Experience Technology
Apr 25, 2025 · Artificial Intelligence

Creating Lifelike Talking Avatars from Voice and Photo with EchoMimic

This article introduces EchoMimic V1 and V2, open‑source generative digital‑human systems that turn a single voice clip and a portrait photo into synchronized talking avatars, covering their technical background, architecture, training strategies, performance comparisons, and potential application scenarios.

digital avatar · generative AI · multimodal models
0 likes · 13 min read
MaGe Linux Operations
Mar 28, 2025 · Artificial Intelligence

How to Create AI-Generated Videos with Tongyi Wanxiang and DeepSeek: A Step‑by‑Step Guide

This article explains the fundamentals of AI video technology, details the features of Alibaba Cloud's Tongyi Wanxiang platform, demonstrates how to use DeepSeek for script generation, and provides a complete workflow—including code examples—for producing high‑quality AI‑generated videos.

AI video generation · DeepSeek · Java SDK
0 likes · 24 min read
DaTaobao Tech
Mar 3, 2025 · Artificial Intelligence

How Taobao’s “Faxiang” AI Model Revolutionizes E‑Commerce Video Generation

Taobao’s AIGC video generation platform, built on a large‑scale “Faxiang” model that evolved from UNet to DiT, leverages over 2 billion curated e‑commerce videos, expert alignment, LoRA fine‑tuning, and multi‑control capabilities to deliver diverse, high‑quality product videos that dramatically boost conversion metrics across the marketplace.

AI video generation · AIGC · e‑commerce
0 likes · 11 min read
Xiaohe Frontend Team
Apr 21, 2024 · Artificial Intelligence

What’s New in Generative AI? VASA‑1, Llama‑3, Stable Diffusion 3 & More

The article reviews the latest breakthroughs in generative AI, including Microsoft’s VASA‑1 video synthesis model, Meta’s open‑source Llama‑3 large language model, Stability AI’s Stable Diffusion 3 API, Adobe’s integration of third‑party AI video tools into Premiere Pro, and a free image‑style‑recreation platform from Freepik, highlighting their technical details and potential applications.

AI tools · diffusion models · generative AI
0 likes · 13 min read
Bilibili Tech
Feb 27, 2024 · Frontend Development

Browser‑Based Video Synthesis Using FFmpeg and WebAssembly

The article details how to compile FFmpeg to WebAssembly and integrate it into a browser‑based video synthesis platform, describing the runtime architecture, JSON‑driven API, key‑frame animation mapping, memory‑limit strategies, text rendering options, and future enhancements such as OPFS, SIMD, and WebGL acceleration.

Web Development · WebAssembly · animation
0 likes · 28 min read
DevOps
Feb 18, 2024 · Artificial Intelligence

OpenAI's Sora: In‑Depth Analysis of the First Text‑to‑Video Model and Its Technical Foundations

OpenAI's Sora, the company's first text‑to‑video model, demonstrates unprecedented video quality and length by leveraging massive high‑quality training data, novel video‑patch representations, diffusion‑based transformer architecture, and precise subtitle generation, reshaping both AI research and media production.

OpenAI · Sora · diffusion model
0 likes · 9 min read
21CTO
Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Turns Text into Realistic 60‑Second Videos

OpenAI’s newly unveiled Sora system can generate 60‑second, high‑quality videos from plain text prompts, leveraging a data‑driven physics engine trained on synthetic data from Unreal Engine 5, with contributions from researchers like Tim Brooks and Bill Peebles, marking a major AI video‑generation breakthrough.

Deep Learning · OpenAI · generative AI
0 likes · 6 min read