Tagged articles

video synthesis

21 articles

May 18, 2026 · Artificial Intelligence

Testing a Cloud AI Agent: From Data Analysis to PPT to Video with a Single Input

The author walks through a hands‑on test of the Skywork cloud AI Agent, showing how, from a single input and with no local deployment, it can ingest exported Excel data, produce a data‑analysis report, build a PPT deck, and generate images and a narrated video.

AI Agent · Multimodal Generation · PPT generation

8 min read

Machine Heart

Apr 15, 2026 · Artificial Intelligence

From Clip Generation to Long‑Video Roaming: OmniRoam Enables Stable, Trajectory‑Controlled Video Synthesis

OmniRoam introduces a panoramic, coarse‑to‑fine framework that generates long, trajectory‑controlled videos with higher spatial consistency and temporal coherence, offering a stable and controllable alternative to short‑clip generation and supporting real‑time preview, high‑resolution refinement, and 3D reconstruction applications.

3D reconstruction · Generative AI · OmniRoam

8 min read

Xiaomi Tech

Jan 21, 2026 · Artificial Intelligence

Xiaomi’s AI Breakthroughs Earn Spot at ICASSP 2026

Xiaomi announced that a suite of its AI research papers has been accepted to ICASSP 2026. The accepted work covers a large‑scale audio‑text dataset, a federated learning framework for domain and class generalization, a dual‑encoder music evaluation model, a cross‑domain audio‑text pre‑training system, a one‑step video‑to‑audio synthesis method, a training‑free frame‑selection technique for long‑video understanding, and a unified multimodal retrieval architecture. The article details each paper's methodology, benchmark results, and potential impact across audio, vision, and multimodal AI applications.

AI · ICASSP 2026 · Multimodal

14 min read

AI Engineering

Jan 8, 2026 · Artificial Intelligence

LTX-2 Open‑Source: The First Model That Generates Video and Audio Together

LTX-2, an open‑source multimodal diffusion model from Lightricks, jointly generates synchronized video and audio using an asymmetric dual‑stream architecture. The article reports 49.18 processing steps per minute, faster than many video‑only models, and support for roughly 20 seconds of high‑resolution output.

LTX-2 · Multimodal Generation · Open-source AI

3 min read

HyperAI Super Neural

Dec 23, 2025 · Artificial Intelligence

NeurIPS 2025‑Selected Multi‑Stream Control Framework Achieves Precise Audio‑Visual Sync via Audio Demixing

This NeurIPS 2025 paper introduces MTV, a multi‑stream video generation framework that demixes the audio track into speech, sound effects, and music, drives each through a dedicated control stream, and uses a multi‑stage training strategy to achieve markedly better lip‑sync, event timing, and overall visual quality than prior methods.

MTV framework · NeurIPS 2025 · audio demixing

9 min read

vivo Internet Technology

Dec 17, 2025 · Frontend Development

Turning 3D Avatars into Video: Puppeteer, H5 Frames & FFmpeg Workflow

This article explains how to avoid the performance and integration costs of rendering 3D avatars live across multiple scenarios by exporting them as video or GIF assets instead: Puppeteer drives an H5 page and captures it frame by frame, and FFmpeg assembles the captured frames into the final file. It also covers the alternatives that were evaluated and the final implementation steps.

FFmpeg · H5 · Puppeteer

13 min read
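The frame-assembly step described in the summary above can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes Puppeteer has already written numbered PNG screenshots named `frame_00001.png`, `frame_00002.png`, and so on (a hypothetical naming scheme) and only builds the FFmpeg command lines for an H.264 MP4 and a palette-optimized GIF.

```python
from pathlib import Path


def ffmpeg_frames_to_video(frame_dir: str, fps: int = 30, out: str = "avatar.mp4") -> list[str]:
    """Build an ffmpeg command that encodes a numbered PNG sequence into an H.264 MP4."""
    # Hypothetical naming scheme for the captured frames: frame_00001.png, frame_00002.png, ...
    pattern = str(Path(frame_dir) / "frame_%05d.png")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-framerate", str(fps),   # frame rate of the input image sequence
        "-i", pattern,
        "-c:v", "libx264",        # H.264 for broad browser and player support
        "-pix_fmt", "yuv420p",    # pixel format with the widest decoder compatibility
        out,
    ]


def ffmpeg_frames_to_gif(frame_dir: str, fps: int = 15, out: str = "avatar.gif") -> list[str]:
    """Build an ffmpeg command that turns the same PNG sequence into a palette-optimized GIF."""
    pattern = str(Path(frame_dir) / "frame_%05d.png")
    # Split the stream: one branch generates a 256-color palette, the other is mapped onto it.
    graph = "split[a][b];[a]palettegen[p];[b][p]paletteuse"
    return ["ffmpeg", "-y", "-framerate", str(fps), "-i", pattern, "-filter_complex", graph, out]
```

Either list can be passed to `subprocess.run(cmd, check=True)`. Building the argument list separately from running it keeps the pipeline testable without FFmpeg installed.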

AI Frontier Lectures

Sep 8, 2025 · Artificial Intelligence

How DynamicFace Achieves High‑Quality, Consistent Face Swaps in Images and Video

DynamicFace introduces a novel face‑swapping framework that combines diffusion models with composable 3D facial priors, explicitly decoupling identity, pose, expression, lighting, and background. It achieves superior identity preservation and motion consistency across images and videos, as shown by extensive qualitative and quantitative comparisons with state‑of‑the‑art methods.

3D facial priors · diffusion model · face swapping

10 min read

Bilibili Tech

Sep 4, 2025 · Artificial Intelligence

How AniME Automates Long‑Form Animation with a Director‑Driven Multi‑Agent AI Framework

AniME introduces a director‑driven multi‑agent system that combines a customized Model Context Protocol (MCP), used for model selection, with the open‑source AniSora V3 model to automatically generate consistent, high‑quality long‑form animation from story scripts, handling everything from storyboard creation to video editing and quality evaluation.

Generative AI · Storyboard · animation

15 min read

AIWalker

Aug 19, 2025 · Artificial Intelligence

DynamicFace: Controllable High‑Quality Face Swapping for Images and Video

DynamicFace introduces a diffusion‑based framework that explicitly decouples identity, pose, expression, illumination and background using composable 3D facial priors, achieving superior identity preservation, motion consistency and visual fidelity in both image and video face‑swapping tasks.

3D facial priors · Diffusion Models · controllable generation

13 min read

Xiaohongshu Tech REDtech

Aug 18, 2025 · Artificial Intelligence

DynamicFace: Composable 3D Facial Priors for High‑Quality, Consistent Face Swaps

DynamicFace introduces a controllable face‑swapping framework that leverages composable 3D facial priors, dual‑stream identity injection, and a FusionTVO module to achieve superior image and video quality, identity preservation, and temporal consistency, outperforming existing state‑of‑the‑art methods on benchmark datasets.

3D facial priors · AI · controllable generation

13 min read

AIWalker

Jun 30, 2025 · Artificial Intelligence

Chinese Team Builds First AI That Understands Film, Using 440K Shot Library for Director‑Level Camera Moves

FilMaster is a pioneering AI system that learns cinematic principles from a 440,000‑shot movie database, combines multimodal LLMs, RAG, and audience‑centric rhythm control to generate editable, high‑quality films, and outperforms prior methods by over 50% on the new FilmEval benchmark.

AI film generation · FilmEval benchmark · Retrieval-Augmented Generation

18 min read
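The retrieval‑augmented step in the FilMaster summary, looking up reference shots in a large shot library, can be illustrated with generic top‑k cosine‑similarity retrieval. This is only a sketch of the general RAG lookup idea, not FilMaster's pipeline: the embedding model, index, and shot identifiers are not described in the summary, so the shot names and 3‑dimensional vectors below are hypothetical toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_shots(query_vec, library, k=2):
    """Return the ids of the k shots whose embeddings are most similar to the query."""
    scored = sorted(library.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [shot_id for shot_id, _ in scored[:k]]


# Hypothetical toy library: shot id -> embedding of that shot's description.
library = {
    "dolly_in": [1.0, 0.0, 0.0],
    "crane_up": [0.0, 1.0, 0.0],
    "tracking": [0.9, 0.1, 0.0],
}
```

Here `retrieve_shots([1.0, 0.0, 0.0], library, k=2)` returns `["dolly_in", "tracking"]`. A linear scan like this is fine for a toy example; a library of hundreds of thousands of shots would normally sit behind an approximate nearest‑neighbor index.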

Bilibili Tech

May 20, 2025 · Artificial Intelligence

How AnimeReward and GAPO Transform Anime Video Generation with Human Feedback

Researchers at Bilibili present Index‑AniSora, an open‑source anime video generation framework. They build a 30k‑sample reward dataset, introduce the multi‑dimensional AnimeReward model and a Gap‑Aware Preference Optimization (GAPO) method, and show through extensive automatic and human evaluations that their approach significantly outperforms baseline video generators.

AI · GAPO · Human Feedback

20 min read

Alipay Experience Technology

Apr 25, 2025 · Artificial Intelligence

Creating Lifelike Talking Avatars from Voice and Photo with EchoMimic

This article introduces EchoMimic V1 and V2, open‑source generative digital‑human systems that turn a single voice clip and a portrait photo into synchronized talking avatars, covering their technical background, architecture, training strategies, performance comparisons, and potential application scenarios.

Generative AI · digital avatar · multimodal models

13 min read

Swan Home Tech Team

Apr 21, 2025 · Artificial Intelligence

How Front-End Teams Leverage AI: FastGPT Platform, Intelligent Search, and Video Synthesis

This article examines how a front‑end team uses three AI initiatives (the FastGPT visual platform, AI‑powered semantic search, and AI video synthesis) to rebuild business workflows, cut costs, and boost efficiency, covering the architecture, key technical points, and practical use cases.

AI · Low‑code platform · Multimodal

7 min read

MaGe Linux Operations

Mar 28, 2025 · Artificial Intelligence

How to Create AI-Generated Videos with Tongyi Wanxiang and DeepSeek: A Step‑by‑Step Guide

This article explains the fundamentals of AI video technology, details the features of Alibaba Cloud's Tongyi Wanxiang platform, demonstrates how to use DeepSeek for script generation, and provides a complete workflow—including code examples—for producing high‑quality AI‑generated videos.

AI video generation · DeepSeek · Java SDK

24 min read

DaTaobao Tech

Mar 3, 2025 · Artificial Intelligence

How Taobao’s “Faxiang” AI Model Revolutionizes E‑Commerce Video Generation

Taobao’s AIGC video generation platform is built on the large‑scale “Faxiang” model, whose architecture evolved from UNet to DiT. It draws on over 2 billion curated e‑commerce videos, expert alignment, LoRA fine‑tuning, and multi‑control capabilities to deliver diverse, high‑quality product videos that substantially boost conversion metrics across the marketplace.

AI video generation · AIGC · Multimodal

11 min read

Xiaohe Frontend Team

Apr 21, 2024 · Artificial Intelligence

What’s New in Generative AI? VASA‑1, Llama‑3, Stable Diffusion 3 & More

The article reviews the latest breakthroughs in generative AI, including Microsoft’s VASA‑1 video synthesis model, Meta’s open‑source Llama‑3 large language model, Stability AI’s Stable Diffusion 3 API, Adobe’s integration of third‑party AI video tools into Premiere Pro, and a free image‑style‑recreation platform from Freepik, highlighting their technical details and potential applications.

AI tools · Diffusion Models · Generative AI

13 min read

Bilibili Tech

Feb 27, 2024 · Frontend Development

Browser‑Based Video Synthesis Using FFmpeg and WebAssembly

The article details how to compile FFmpeg to WebAssembly and integrate it into a browser‑based video synthesis platform, describing the runtime architecture, JSON‑driven API, key‑frame animation mapping, memory‑limit strategies, text rendering options, and future enhancements such as OPFS, SIMD, and WebGL acceleration.

FFmpeg · Web Development · WebAssembly

28 min read
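The "key‑frame animation mapping" mentioned in the FFmpeg/WebAssembly summary boils down to turning a few keyframes in a JSON description into one property value per output frame. The sketch below uses plain linear interpolation and a hypothetical JSON schema (`duration` in seconds plus a list of `{"t", "value"}` keyframes); the article's actual schema, easing curves, and translation into FFmpeg filter parameters are not shown here.

```python
import json
from bisect import bisect_right


def value_at(keyframes, t):
    """Linearly interpolate a property between the two keyframes surrounding time t."""
    times = [k["t"] for k in keyframes]
    if t <= times[0]:
        return keyframes[0]["value"]    # hold the first value before the first keyframe
    if t >= times[-1]:
        return keyframes[-1]["value"]   # hold the last value after the last keyframe
    i = bisect_right(times, t)          # keyframes[i - 1].t <= t < keyframes[i].t
    k0, k1 = keyframes[i - 1], keyframes[i]
    alpha = (t - k0["t"]) / (k1["t"] - k0["t"])
    return k0["value"] + alpha * (k1["value"] - k0["value"])


def per_frame_values(spec_json, fps):
    """Expand a hypothetical JSON animation spec into one value per rendered frame."""
    spec = json.loads(spec_json)
    keyframes = sorted(spec["keyframes"], key=lambda k: k["t"])
    n_frames = int(round(spec["duration"] * fps))
    return [value_at(keyframes, f / fps) for f in range(n_frames)]
```

For example, a one‑second fade described as `{"duration": 1.0, "keyframes": [{"t": 0, "value": 0}, {"t": 1, "value": 1}]}` expands at 4 fps to `[0, 0.25, 0.5, 0.75]`. Precomputing per‑frame values like this is one simple option; an alternative is to encode the same ramp directly as a time‑dependent expression inside the filter graph.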

DevOps

Feb 18, 2024 · Artificial Intelligence

OpenAI's Sora: In‑Depth Analysis of the First Text‑to‑Video Model and Its Technical Foundations

OpenAI's Sora text‑to‑video model demonstrates unprecedented video quality and length by combining massive high‑quality training data, spacetime video‑patch representations, a diffusion transformer architecture, and highly descriptive captions generated for its training videos (re‑captioning), with significant implications for both AI research and media production.

OpenAI · Sora · diffusion model

9 min read
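The "video‑patch representation" in the Sora summary means cutting a video (or its compressed latent) into non‑overlapping spacetime blocks, each of which becomes one transformer token. The arithmetic below is a generic sketch of that idea; Sora's actual latent shape and patch sizes are not public, so the numbers in the example are hypothetical.

```python
def spacetime_patch_grid(frames, height, width, channels, pt, ph, pw):
    """Return (num_tokens, token_dim) for non-overlapping pt x ph x pw spacetime patches."""
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # One token per patch along time, height, and width.
    num_tokens = (frames // pt) * (height // ph) * (width // pw)
    # Each token flattens every value inside its patch, across all channels.
    token_dim = pt * ph * pw * channels
    return num_tokens, token_dim
```

With a hypothetical latent of 16 frames at 32×32 with 4 channels and 2×2×2 patches, `spacetime_patch_grid(16, 32, 32, 4, 2, 2, 2)` gives `(2048, 32)`: 8 × 16 × 16 tokens of dimension 2 × 2 × 2 × 4. Because the token count scales with duration and resolution, the same model can in principle process clips of different lengths and aspect ratios.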

21CTO

Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Turns Text into Realistic 60‑Second Videos

OpenAI’s newly unveiled Sora system can generate high‑quality videos up to 60 seconds long from plain text prompts. The article describes it as a data‑driven physics engine, possibly trained in part on synthetic data from Unreal Engine 5 (a widely repeated speculation that OpenAI has not confirmed), credits researchers such as Tim Brooks and Bill Peebles, and calls it a major breakthrough in AI video generation.

Deep Learning · Generative AI · OpenAI

6 min read

Python Programming Learning Circle

May 29, 2022 · Artificial Intelligence

Generating Lip‑Sync Videos with PaddleGAN's Wav2Lip Model

This tutorial explains how to use the open‑source PaddleGAN Wav2Lip model to lip‑sync any face or avatar to arbitrary speech, covering the underlying AI principles, the required installations, and step‑by‑step command‑line usage for producing high‑quality dubbed videos.

AI · Deep Learning · PaddleGAN

5 min read