Tagged articles

124 articles


May 20, 2026 · Artificial Intelligence

How VChain Gives Video Generation a Visual Thought Chain for Explicit Spatiotemporal Planning

The VChain framework injects multimodal large‑model reasoning into video generation, using a three‑stage visual‑thought pipeline, sparse inference‑time adaptation, and guided sampling to produce physically consistent, logically coherent videos, as demonstrated by qualitative and quantitative experiments.

Multimodal Large Models · Sparse Fine‑tuning · Video Generation

8 min read

Machine Heart

May 17, 2026 · Artificial Intelligence

What Exactly Is a World Model? History, Technology, and the $10B Bet

The article traces the two parallel, decades‑long research lines that gave rise to video world models (dreaming agents in reinforcement learning, and learning physics from human video), explains how they converged in 2024‑2025, evaluates current capabilities and limitations, and analyzes the $10 billion investment landscape and the strategic moves of NVIDIA, OpenAI, and others.

AI research · Reinforcement Learning · Robotics

32 min read

Machine Heart

May 8, 2026 · Artificial Intelligence

How an 8B Video‑Language Model Beats GPT‑5 and Gemini‑3.1‑Pro at Cinematic Understanding

The CHAI framework from CMU and Harvard defines a structured video‑language annotation scheme, a scalable human‑AI oversight process, and a post‑training pipeline that together enable an 8B open‑source model to outperform the closed‑source GPT‑5 and Gemini‑3.1‑Pro at understanding professional cinematic techniques.

Multimodal AI · Qwen3-VL · Video Generation

11 min read

Machine Learning Algorithms & Natural Language Processing

May 7, 2026 · Artificial Intelligence

Low‑Prompt ‘Pang Goose AI’ Lets Anyone Generate Videos and Dashboards Without Learning Complex Prompts

The article argues that while modern LLMs such as ChatGPT and Gemini are powerful, the barrier to using them well keeps rising, and introduces ‘Pang Goose AI’, a low‑prompt AI agent whose pre‑built SOP (standard operating procedure) system turns a single sentence into a one‑minute e‑commerce video or an interactive data dashboard. According to the article, it outperforms general‑purpose models and spares users from having to master prompt engineering.

AI agents · AI product review · SOP

12 min read

Machine Learning Algorithms & Natural Language Processing

May 4, 2026 · Artificial Intelligence

How CVPR 2026 Is Redefining Visual Model Defaults in Generative AI

A review of CVPR 2026 papers shows a shift in visual generative AI from incremental performance gains within established frameworks to a systematic rewrite of default modeling assumptions, covering new guidance mechanisms, video generation architectures, direct image prediction, fine‑grained motion control, and dense semantic correspondence.

Video Generation · diffusion · generative AI

13 min read

Machine Heart

Apr 29, 2026 · Artificial Intelligence

VEGA-3D: Unleashing Implicit 3D Priors in Video Generation for Scene Understanding

VEGA-3D extracts the hidden 3D priors embedded in large video generation models, fuses them with semantic features via token‑level adaptive gating, and demonstrates dramatically higher multi‑view consistency and state‑of‑the‑art results on 3D scene‑understanding benchmarks such as ScanRefer, ScanQA, VSI‑Bench and LIBERO—all without any additional 3D annotations.

Embodied AI · VEGA-3D · Video Generation

10 min read

Machine Heart

Apr 27, 2026 · Artificial Intelligence

Why Traditional Video Captions Fail and How MTSS Solves the Problem

The article introduces Multi-Stream Scene Script (MTSS), a structured, JSON‑based video description paradigm that replaces monolithic captions. It explains the design principles, compares MTSS with traditional captions, and presents experimental evidence of significant gains in both video understanding and video generation tasks.

MTSS · Multimodal AI · Video Generation

8 min read

AI Explorer

Apr 24, 2026 · Artificial Intelligence

Open Generative AI: 200+ Open‑Source Models for Image, Video, and Lip‑Sync Creation

Open Generative AI is an open‑source, MIT‑licensed desktop suite that bundles over 200 cutting‑edge image, video, and lip‑sync models into four dedicated studios, offering unrestricted generation without content filters, subscription fees, or closed ecosystems, and provides online, desktop, and self‑hosted deployment options.

AI media generation · Image Generation · MIT license

6 min read

Machine Heart

Apr 23, 2026 · Artificial Intelligence

Breaking the Compute Bottleneck: HKU’s First Review of Efficient Video World Models

This comprehensive review surveys how efficient modeling paradigms, architecture designs, and inference algorithms can overcome the compute‑speed trade‑off in video world models, and examines their impact on autonomous driving, embodied AI, and interactive game simulations.

Embodied AI · Video Generation · World Models

10 min read

Geek Labs

Apr 23, 2026 · Artificial Intelligence

7 Must‑Watch Open‑Source Prompt Libraries for AI Image and Video Generation (2025‑2026)

Amid the rapid rise of prompt engineering in 2025‑2026, this article reviews seven standout open‑source GitHub repositories covering Nano Banana Pro, GPT‑Image‑2, multi‑model prompts, and video generation, detailing each one's star count, content structure, multilingual support, and ideal use cases for creators.

AI prompt engineering · GitHub · Image Generation

14 min read

SuanNi

Apr 21, 2026 · Artificial Intelligence

Why AI Video Generation Is Leaving the Silent Era: Architecture, Alignment, and Evaluation Insights

This article analyzes the rapid evolution of multimodal video generation models from separated visual‑audio pipelines to unified diffusion Transformers, detailing VAE compression, MoE scaling, cross‑modal alignment techniques, comprehensive evaluation metrics, real‑world applications, and the remaining technical challenges.

Diffusion Models · Evaluation Metrics · Multimodal AI

15 min read

Machine Heart

Apr 12, 2026 · Artificial Intelligence

CVPR 2026 WorldArena Challenge Launches with Amap’s Open‑Source High‑Performance World Model Baseline

The CVPR 2026 WorldArena Challenge, organized by leading academic institutions and Amap, introduces a new evaluation framework that tests video world models for physical realism and functional utility. Alongside it, Amap releases its high‑performance ABot‑PhysWorld model with benchmark scores that set a new state of the art.

ABot-PhysWorld · CVPR 2026 · Physical Consistency

9 min read

Machine Heart

Apr 10, 2026 · Artificial Intelligence

OneStory Enables Minute-Long, Ten-Shot Video Generation with Consistent Narrative

The OneStory paper presented at CVPR 2026 introduces an adaptive‑memory framework for coherent multi‑shot video generation, reformulating the task as next‑shot generation and using Frame Selection and Adaptive Conditioner modules to maintain long‑range context while supporting both text‑to‑multi‑shot and image‑to‑multi‑shot synthesis.

Adaptive Memory · Multi-shot Video · OneStory

8 min read

SuanNi

Apr 8, 2026 · Industry Insights

How HappyHorse‑1.0 Surpassed Seedance 2.0 in AI Video Generation Rankings

An anonymous model, HappyHorse‑1.0, quickly topped the Artificial Analysis leaderboard for both text‑to‑video and image‑to‑video tracks, outscoring Seedance 2.0 by large margins and prompting intense community discussion about its origin, performance, and future stability.

AI · Artificial Intelligence · Competitive analysis

5 min read

Machine Heart

Apr 4, 2026 · Artificial Intelligence

Is AI Video Generation Shifting From Model Showcases to Integrated Workflows?

The article analyzes how AI video generation, after the launch of OpenAI's Sora, is moving from a focus on model performance to embedding video capabilities into existing platforms and business workflows, highlighting timeline shifts, key players, and emerging competitive criteria.

AI video · Market Trends · OpenAI Sora

7 min read

SuanNi

Mar 25, 2026 · Industry Insights

Why OpenAI Shut Down Sora: The Costly Rise and Fall of AI Video Generation

OpenAI abruptly discontinued its Sora video‑generation app after a brief period of explosive popularity, revealing massive GPU costs, unsustainable pricing, fierce competition from rivals like Gemini and Claude, and a strategic pivot toward enterprise‑focused AI services.

AI · OpenAI · Video Generation

10 min read

Amap Tech

Mar 20, 2026 · Artificial Intelligence

How ABot-PhysWorld Achieves Physical Consistency in Embodied Video Generation

ABot-PhysWorld introduces a physically consistent video generation framework for embodied AI, leveraging the PAI‑Bench benchmark, large‑scale multi‑modal data, DPO preference alignment, and dense action maps to surpass SOTA models in both visual quality and physical plausibility across diverse robotic tasks.

Deep Learning · Embodied AI · Physical Consistency

15 min read

Java Tech Enthusiast

Mar 7, 2026 · Artificial Intelligence

Explore Cutting‑Edge Open‑Source AI Skills for Video, Docs, and Social Media Automation

This article introduces several open‑source AI Skills—including Remotion, YouTube‑clipper, skill‑from‑masters, NotebookLM, Markdown‑to‑X publisher, and Anthropic's Agent Skills—detailing their purpose, core features, installation commands, and repository links for developers seeking automation solutions.

Claude · Document Processing · Video Generation

7 min read

Old Meng AI Explorer

Mar 5, 2026 · Industry Insights

Three Must‑Try Open‑Source Tools: Local Tunneling, AI Short‑Video Creation, and Multi‑Model Switching

This article introduces three high‑impact open‑source projects—tunnelto for instant public access to local services, Toonflow‑app for fully automated AI short‑video production from text, and cc‑switch for one‑click switching and unified configuration of multiple large‑model AI tools—highlighting their key features, cross‑platform support, and GitHub repositories.

AI · Video Generation · development-tools

8 min read

Model Perspective

Feb 15, 2026 · Artificial Intelligence

Mastering Seedance 2.0: A Complete Guide to Video Generation with Multi‑Modal Prompts

This guide explains how to use ByteDance's Seedance 2.0 video generation model, covering its capabilities, input formats, prompt syntax, platform options, practical examples, common pitfalls, and advanced workflows for creating high‑quality, controllable short videos.

AI model · Prompt engineering · Seedance 2.0

16 min read

HyperAI Super Neural

Feb 14, 2026 · Artificial Intelligence

Beyond Visual Realism: WorldArena Benchmark Reveals the Capability Gap in Embodied World Models

WorldArena introduces a unified benchmark that evaluates generated videos not only for visual fidelity but also for embodied task functionality across six dimensions, exposing a stark gap between visual realism and practical usefulness and providing a composite EWMScore to compare models.

Embodied AI · Evaluation Metrics · Physical Consistency

9 min read

Volcano Engine Developer Services

Feb 13, 2026 · Artificial Intelligence

Deploy and Run Seedance Video Generation Skill in Feishu via OpenClaw

This guide walks you through installing OpenClaw, configuring a Feishu chatbot, adding the Seedance video‑generation skill, and using it to create text‑to‑video or image‑to‑video content, including detailed steps, required permissions, code snippets, and supported AI models.

AI Skill · Feishu · OpenClaw

9 min read

AI Engineering

Feb 13, 2026 · Artificial Intelligence

ByteDance’s Open‑Source 12B‑Parameter Video Model “Alive” Runs on a Single RTX 3090/4090

ByteDance has open‑sourced the 12‑billion‑parameter video generation model Alive, which supports text‑to‑video/audio, image‑to‑video/audio, pure text‑to‑video and text‑to‑audio modes, runs on a 24 GB GPU, outperforms competitors in cross‑modal synchronization, and includes novel TA‑CrossAttn and UniTemp‑RoPE techniques.

Alive Model · ByteDance · Cross‑Modal Synchronization

5 min read

Bilibili Tech

Jan 28, 2026 · Artificial Intelligence

Boosting Video Generation Inference: Full Graph Compilation with torch.compile

This article examines the challenges of optimizing inference for video generation models, moving from operator-level tweaks to full-graph compilation with torch.compile. It details systematic strategies for eliminating graph breaks and for handling dynamic shapes, KV-Cache indexing, and Python-side caches, achieving a 47.6% speedup on a 14B model without accuracy loss.

AI · Inference Acceleration · Video Generation

14 min read

Old Meng AI Explorer

Jan 27, 2026 · Artificial Intelligence

Three Must‑Try Open‑Source AI Tools for Data Mining, PPT Creation, and Video Generation

In the era of abundant AI utilities, this article highlights three recently popular open‑source projects—Spider_XHS for comprehensive Xiaohongshu data collection and automated posting, PPTAgent for one‑click, multi‑scene PowerPoint generation, and Code2Video for code‑driven, high‑quality video creation—detailing their core features, deployment steps, and GitHub links.

AI tools · PPT automation · Video Generation

7 min read

Design Hub

Jan 13, 2026 · Artificial Intelligence

Three AI-Powered Design Tools That Boost Creativity

The article reviews three open‑source AI tools—Claude Cowork for file‑based assistance, the LTX‑2 video generation model runnable on 8 GB GPUs via Pinokio, and SongGeneration Studio for end‑to‑end music creation—detailing their features, performance benchmarks, and usage steps for creators.

AI design · Claude · LTX-2

8 min read

Kuaishou Tech

Jan 8, 2026 · Artificial Intelligence

Top 12 Kuaishou Papers Accepted at AAAI 2026: Breakthroughs in Recommendation, Video Generation, and LLM Research

Kuaishou secured 12 papers at AAAI 2026, covering advances in search and recommendation systems, multi‑camera video generation, multimodal understanding, generative model fundamentals, video large language models, experimental design, and LLM latent‑space reasoning, with three papers highlighted as oral presentations.

AI · LLM · Video Generation

22 min read

DataFunSummit

Dec 20, 2025 · Artificial Intelligence

How AutoHome Built the Cangjie Large Model: From Training Architecture to Real-World AI Applications

This article details AutoHome's end‑to‑end development of the Cangjie large model, covering the training infrastructure with distributed data, pipeline and tensor parallelism, core business use cases such as video script generation and multi‑tool Agent capabilities, inference optimizations through quantization and fast serving frameworks, and future directions for personalized automotive AI services.

Agent AI · Distributed Training · Video Generation

19 min read

HyperAI Super Neural

Dec 12, 2025 · Artificial Intelligence

AI Open‑Source Forum Recap: Video Generation, Vision, Vector DBs, AI‑Native Language

The AI Open‑Source Forum brought together researchers from Peking University, Tsinghua, Zilliz and MoonBit to share open‑source advances in audio‑synchronized video generation, vector database architecture, lightweight vision backbones, and an AI‑native programming language, highlighting datasets, system designs, and future collaborative directions.

AI · AI‑Native Programming · Video Generation

12 min read

Data Party THU

Dec 9, 2025 · Artificial Intelligence

Can Robots Learn Human Moves Directly from AI‑Generated Videos? The GenMimic Breakthrough

The GenMimic paper introduces a novel framework that enables humanoid robots to zero‑shot imitate human actions generated by AI video models, presenting a new dataset, a two‑stage 4D reconstruction pipeline, and a reinforcement‑learning strategy with weighted‑tracking and symmetry losses, validated in simulation and on a real 23‑DoF robot.

Humanoid Robots · Reinforcement Learning · Robotics

11 min read

Data Party THU

Dec 2, 2025 · Artificial Intelligence

FFGo: Turning the First Frame into a Conceptual Memory for Video Customization

FFGo reveals that the first frame of text‑to‑video models acts as a conceptual memory buffer storing visual entities, and by using a few‑shot LoRA trained on only 20‑50 curated examples with a special transition prompt, it reliably activates multi‑object fusion, enabling high‑quality, controllable video customization without model architecture changes.

AI research · Video Generation · conceptual memory

9 min read

AI Frontier Lectures

Nov 28, 2025 · Artificial Intelligence

Can AI Generate the Next Step in a Video? Inside the VANS Model

Researchers from Kuaishou and City University of Hong Kong introduce VANS, a Video-as-Answer system that predicts the next event in a video and visualizes it by jointly optimizing a vision-language model and a video diffusion model, enabling personalized step‑by‑step guidance and future‑scenario generation.

Multimodal AI · Video Generation · future prediction

10 min read

HyperAI Super Neural

Nov 25, 2025 · Artificial Intelligence

LongCat‑Video: Meituan’s Model for Text‑to‑Video, Image‑to‑Video & Continuation

LongCat‑Video, an open‑source video generation model from Meituan, adopts a unified multi‑task architecture to handle text‑to‑video, image‑to‑video and video‑continuation, delivers minute‑long high‑quality clips with coarse‑to‑fine inference, achieves benchmark scores comparable to leading models like Wan2.2, and provides a one‑click deployment tutorial on HyperAI.

LongCat-Video · Meituan · RLHF

6 min read

Kuaishou Tech

Nov 24, 2025 · Artificial Intelligence

How Human Feedback Supercharges Video Generation – The VideoAlign Pipeline Explained

This article details a new research pipeline that leverages large‑scale human preference data, a multi‑dimensional video reward model, and specialized alignment algorithms to dramatically improve video generation quality, motion fidelity, and text‑video consistency, with open‑source code and benchmarks for reproducibility.

AI Alignment · Human Feedback · RLHF

10 min read

AI Large Model Application Practice

Nov 24, 2025 · Artificial Intelligence

How to Turn Text into an AI‑Powered PPT Video: A Step‑by‑Step Guide

This article breaks down the end‑to‑end engineering pipeline that converts a knowledge source such as a URL or PDF into a narrated PPT‑style video, detailing six core stages—from knowledge extraction and script generation to image creation, voice synthesis, and final video stitching—while highlighting practical model choices, prompt design, and stability tricks.

Artificial Intelligence · LLM · Multimodal

16 min read

Wuming AI

Oct 16, 2025 · Industry Insights

Top AI Model Releases This Week: NanoChat, Ring‑1T, Qwen3‑VL, Veo 3.1, Claude Haiku 4.5

This week’s AI landscape saw Karpathy open‑source NanoChat, a roughly 8,000‑line ChatGPT replica; Ant Group unveil the trillion‑parameter Ring‑1T model; Alibaba release the 4B and 8B Qwen3‑VL vision‑language models, which outperform Gemini 2.5 Flash Lite and GPT‑5 Nano; Google launch Veo 3.1 for high‑fidelity video generation; and Anthropic announce Claude Haiku 4.5, a faster and cheaper LLM that excels on SWE‑bench.

AI models · Large Language Models · Multimodal

7 min read

Amap Tech

Oct 3, 2025 · Artificial Intelligence

How FantasyHSI Enables Autonomous 3D Human Interaction in Any Scene

FantasyHSI introduces a graph‑based multi‑agent framework that combines visual‑language models and video‑generation diffusion to let digital humans perceive, plan, and interact autonomously in any 3D scene, producing physically plausible, long‑duration actions for animation creation and embodied‑AI simulation.

3D synthesis · Graph Modeling · Reinforcement Learning

12 min read

Amap Tech

Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Computer Vision · Diffusion Models

11 min read

AI2ML AI to Machine Learning

Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Diffusion Models · Latent Diffusion · Multimodal AI

7 min read

Mashang Consumer UXC

Sep 29, 2025 · Artificial Intelligence

Open-Source AI 3D, Video & Audio Models: Tencent, Vidu, Audio2Face and More

This article reviews the latest open‑source AI models released by major tech firms—including Tencent's 3D‑Omni and 3D‑Part, Shengshu Tech's Vidu Q2 for facial video, Nvidia's Audio2Face for real‑time facial animation, plus updates from Figma, Google, Alibaba and Kuaishou—highlighting their capabilities and potential applications in gaming, AR/VR, design and content creation.

3D modeling · AI · Deep Learning

8 min read

DataFunTalk

Sep 27, 2025 · Artificial Intelligence

How AI Is Redefining Filmmaking: From Festival Shorts to Feature Films

The article explores how AI models like Seedream and Seedance are reshaping cinema, from AI‑driven short films showcased at the Busan Film Festival to full‑length feature productions, highlighting technical breakthroughs, industry perspectives, and the emerging "AI +" versus "+ AI" production paradigms.

AI · AIGC · Cinema

11 min read

Kuaishou Tech

Sep 16, 2025 · Artificial Intelligence

How Kling-Avatar Generates Long, Emotionally Rich Digital Human Videos with Multimodal LLMs

Kuaishou's Kling-Avatar leverages a multimodal large‑language‑model‑driven two‑stage generation framework to produce minute‑long digital‑human videos that synchronize lip movements, facial expressions, and body gestures with audio, achieving high visual quality, identity consistency, and controllable storytelling across diverse scenarios.

AI Avatar · Digital Human · Multimodal LLM

9 min read

DataFunTalk

Sep 11, 2025 · Artificial Intelligence

How AI Dressing and Multimodal Models Transform Home Service Experiences

During a pre-conference interview, AI expert Wang Mingzhong details how multimodal AI dressing, video résumé creation, short‑video templates, and interactive digital‑human live streams are technically realized for 58 Home Services, highlighting model training, workflow optimization, and future fusion of template‑based and agent‑driven video generation.

AI · Digital Human · Domestic Service

11 min read

Data STUDIO

Sep 3, 2025 · Artificial Intelligence

Hands‑On Review of AiPy: Open‑Source AI‑Python Tool for Data Analysis, Reporting & Video Creation

The author evaluates the free open‑source AiPy tool through three real‑world cases—U.S. GDP trend analysis, Cambricon stock assessment, and short‑video production—showing its local, code‑free workflow for data processing, report generation, and multimedia creation while noting minor visual glitches and their fixes.

AI · AiPy · Tool Review

7 min read

Mashang Consumer UXC

Aug 29, 2025 · Artificial Intelligence

Create Consistent IP Characters and Short Videos with AI Tools 星流 (Xingliu) & 即梦 (Jimeng)

This step‑by‑step guide shows how to design a cohesive IP character, generate its three‑view illustrations, craft dynamic scene images, and turn them into eye‑catching short videos using the AI platforms 星流 (Xingliu) and 即梦 (Jimeng), dramatically shortening the design cycle.

AI-generated design · IP character creation · Video Generation

6 min read

Kuaishou Tech

Aug 25, 2025 · Artificial Intelligence

How Context-as-Memory Enables Scene‑Consistent Long Video Generation

This article introduces the Context-as-Memory approach, which treats previously generated video frames as memory to achieve scene‑consistent interactive long video generation, and details a camera‑trajectory‑based memory retrieval mechanism that dramatically improves efficiency and performance over existing state‑of‑the‑art methods.

AI · Video Generation · context memory

7 min read

Amap Tech

Aug 18, 2025 · Artificial Intelligence

How Omni-Effects Enables Spatially Controllable Multi‑VFX Generation with LoRA‑MoE

Omni-Effects introduces a unified framework that combines a LoRA‑based mixture of experts (LoRA‑MoE) with spatially aware prompts to generate multiple, precisely placed visual effects in a single video. Supported by the new Omni‑VFX dataset and evaluation suite, it demonstrates better spatial control and diversity than prior single‑effect methods.

AI · LoRA · Video Generation

8 min read

AI Frontier Lectures

Jul 30, 2025 · Artificial Intelligence

DualReal: Seamless Identity and Motion Customization for Video Generation

DualReal introduces an adaptive joint training framework that customizes subject identity and motion dynamics simultaneously in video generation. It avoids the conflicts that arise when the two are customized in isolation by using a dual-domain perception adapter and a stage-fusion controller, achieving improvements of up to 31.8% on the CLIP‑I and DINO‑I metrics.

Diffusion Models · Video Generation · dual-domain adaptation

13 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Diffusion Models · Image Restoration · Video Generation

14 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS

This article introduces VMBench, a perception‑aligned video motion generation benchmark with a five‑dimensional metric suite and meta‑guided prompt generation, and LD‑RPS, a zero‑shot unified image restoration framework using latent diffusion and recurrent posterior sampling, detailing their motivations, innovations, experiments, and future directions.

AI research · Diffusion Models · Image Restoration

14 min read

Kuaishou Large Model

Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s KuaLing team, uses test‑time scaling to substantially improve diffusion‑based image and video generation without any additional training: it runs an evolutionary search along the denoising trajectory and achieves state‑of‑the‑art results on SD2.1, Flux‑1‑dev, and other models.

Diffusion Models · Image Generation · Test-Time Scaling

8 min read

Kuaishou Tech

Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Diffusion Models · Image Generation

8 min read

AntTech

Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Computer Vision · Video Generation

0 likes · 20 min read

Kuaishou Tech

Jun 10, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · Autoregressive Models · Diffusion Models

0 likes · 70 min read

AIWalker

May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Diffusion Models · Video Generation · autoregressive modeling

0 likes · 16 min read
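Frame‑level causal attention, the general mechanism that lets a diffusion transformer generate video autoregressively, can be expressed as a block‑causal mask: every token of frame i may attend to all tokens of frames 0..i but to none of the later frames. The NumPy sketch below builds such a mask and applies it in a softmax; it is a generic illustration, not GPDiT's specific lightweight attention design, and `tokens_per_frame` is an assumed parameter.

```python
import numpy as np

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean mask of shape (N, N), N = num_frames * tokens_per_frame.
    mask[q, k] is True when query token q may attend to key token k."""
    frame_id = np.arange(num_frames * tokens_per_frame) // tokens_per_frame
    # Full attention within a frame, causal attention across frames.
    return frame_id[:, None] >= frame_id[None, :]

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Disallowed positions get -inf so they receive zero attention weight.
    # Every row keeps at least its own frame, so the row max stays finite.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

For example, `block_causal_mask(3, 2)` yields a 6 × 6 mask in which the two tokens of frame 0 see only each other, while the tokens of the last frame see all six tokens.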

Baidu MEUX

Apr 28, 2025 · Artificial Intelligence

Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans

This article surveys the latest AI breakthroughs, covering ChatGPT‑4o's native image generation, Runway's Gen‑4 video model, Midjourney V7, AnimeGamer's infinite anime simulation, JiMeng 3.0 poster creator, ComfyUI‑Copilot workflow assistant, DomoAI's voice‑image digital humans, Ready AI web builder, DeepSeek‑V3, and Alibaba's ultra‑realistic 3D digital human model.

AI · Image Generation · Video Generation

0 likes · 8 min read

Alibaba Cloud Developer

Apr 18, 2025 · Artificial Intelligence

How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model can create high‑quality 720p videos that seamlessly connect user‑provided start and end images, offering controllable, personalized video generation with prompt‑driven camera motions and easy access via its website, GitHub, Hugging Face, and ModelScope.

AI model · Computer Vision · Deep Learning

0 likes · 5 min read

AIWalker

Mar 31, 2025 · Artificial Intelligence

VBench-2.0: A Next‑Generation Benchmark for Intrinsic Faithfulness in AI Video Generation

VBench-2.0 extends the original VBench suite with five broad dimensions (Human Fidelity, Controllability, Creativity, Physics, and Commonsense), each broken down into finer‑grained capabilities, to evaluate not only the visual quality of generated videos but also their intrinsic faithfulness to physical laws, common sense, and narrative coherence. It provides open‑source tools, prompts, and human‑aligned metrics for the research community.

AI Evaluation · Intrinsic Faithfulness · Multimodal

0 likes · 12 min read

AI Frontier Lectures

Mar 30, 2025 · Artificial Intelligence

How NOVA Generates High‑Quality Video Autoregressively Without Vector Quantization

This article provides an in‑depth analysis of the NOVA model, a non‑quantized autoregressive video generation framework that combines frame‑by‑frame temporal prediction with set‑by‑set spatial prediction, uses diffusion loss for token estimation, and achieves state‑of‑the‑art results on multiple video and image benchmarks.

AI research · Autoregressive Model · NOVA

0 likes · 15 min read

AI Frontier Lectures

Mar 14, 2025 · Artificial Intelligence

Open-Sora 2.0: How an 11B Open-Source Model Beats Closed-Source Video AI at 720p

Open‑Sora 2.0, an open‑source 11‑billion‑parameter video generation model, delivers 720p, 24 fps videos with visual quality and text‑video alignment comparable to leading models such as HunyuanVideo and Step‑Video, while cutting training cost to $200K on only 224 GPUs; the release includes full code, weights, and a Gradio demo.

3D autoencoder · AI · MMDiT

0 likes · 7 min read

NewBeeNLP

Mar 14, 2025 · Artificial Intelligence

How Open‑Sora 2.0 Achieves SOTA Video Generation with Only $200K Training Cost

Open‑Sora 2.0 is an open‑source 11B‑parameter video generation model that matches commercial SOTA performance while being trained on 224 GPUs for just $200,000, thanks to a 3D auto‑encoder, MMDiT architecture, aggressive data filtering, low‑resolution pre‑training, and highly optimized parallel training techniques.

AI model · MMDiT · Open-Sora

0 likes · 9 min read

DataFunTalk

Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA‑based inference IP for video generation models that exploits spatiotemporal activation sparsity, mixed‑precision DSP58 extensions, and adaptive scheduling to achieve up to 1.30× higher performance and 4.49× higher energy efficiency than an NVIDIA RTX 3090 GPU while preserving model accuracy.

AI · FPGA · Hardware acceleration

0 likes · 11 min read

AI Product Manager Community

Feb 26, 2025 · Artificial Intelligence

How Alibaba Cloud’s Open‑Source Wan 2.1 Sets New Benchmarks in Video Generation

Alibaba Cloud’s newly open‑sourced visual generation model Wan 2.1 achieves a VBench score of 86.22%, outperforms leading models, runs on consumer‑grade GPUs with only 8.2 GB VRAM, and supports multi‑task video creation, marking a significant step for open‑source video AI.

Alibaba Cloud · Computer Vision · Video Generation

0 likes · 6 min read

DaTaobao Tech

Feb 24, 2025 · Artificial Intelligence

AIGC Video Generation Techniques for E‑commerce: Lip‑Sync, Head/Body Driving, and Business Applications

The article surveys recent AIGC video generation advances for Taobao e‑commerce, detailing lip‑sync models like Wav2Lip and MuseTalk, head‑driven systems such as Hallo and EchoMimic, body‑driven pipelines including AnimateAnyone and Tango, and a four‑stage production workflow that boosts click‑through rates and enables virtual try‑on.

AIGC · Deep Learning · Multimodal AI

0 likes · 21 min read

Infra Learning Club

Feb 21, 2025 · Artificial Intelligence

5 Must‑Try Open‑Source AI Projects You Can Start Using Today

This article introduces five open‑source AI tools—a PPT generator, an LLM app development platform, a cloud‑agnostic AI runner, a curated collection of LLM applications, and a one‑click HD video creator—detailing their key features, usage links, and sample configurations.

AI · Dify · LLM

0 likes · 8 min read

AIWalker

Feb 13, 2025 · Artificial Intelligence

How FlashVideo Turns Low‑Res Clips into 4K Video with Minimal Compute

FlashVideo introduces a two‑stage framework that first generates low‑resolution videos with strong prompt fidelity and then uses flow‑matching ODE trajectories to upscale to 4K quality in just four function evaluations, achieving top VBench‑Long scores while cutting generation time by up to five‑fold.

AI · FlashVideo · Video Generation

0 likes · 26 min read
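Flow matching turns sampling into integrating an ODE dx/dt = v(x, t) from t = 0 to t = 1, so "four function evaluations" means four calls to the velocity network. The sketch below uses a fixed‑step Euler integrator with a toy velocity field: on a straight‑line (rectified) path from a start point to a target, the true velocity is constant, so even four Euler steps land exactly on the target. `toy_velocity`, `start`, and `target` are hypothetical stand‑ins for a trained network and real video latents; FlashVideo's actual stage‑two model, starting point, and step schedule are not reproduced here.

```python
from typing import Callable, List, Tuple

Vector = List[float]

def euler_integrate(velocity: Callable[[Vector, float], Vector], x0: Vector,
                    num_steps: int = 4) -> Tuple[Vector, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps.
    Returns the final state and the number of velocity evaluations (NFE)."""
    x, dt, nfe = list(x0), 1.0 / num_steps, 0
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)  # one network call per step in a real model
        nfe += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, nfe

# Toy rectified-flow setup (hypothetical): the straight path
# x_t = (1 - t) * start + t * target has constant velocity target - start.
start = [0.0, 0.0, 0.0]
target = [0.5, -1.0, 2.0]

def toy_velocity(x: Vector, t: float) -> Vector:
    return [b - a for a, b in zip(start, target)]

final, nfe = euler_integrate(toy_velocity, start, num_steps=4)
```

The straighter the learned trajectories, the fewer steps are needed for a given error, which is the general reason few‑step flow‑matching samplers can be cheap.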

AIWalker

Feb 12, 2025 · Artificial Intelligence

Goku: How HKU and ByteDance’s New Model Sets New Benchmarks in Commercial Image and Video Generation

The paper presents Goku, a rectified‑flow transformer that jointly generates high‑quality images and videos at commercial scale, detailing its novel architecture, massive high‑quality data pipeline, efficient large‑scale training tricks, and state‑of‑the‑art results on GenEval, DPG‑Bench, VBench and UCF‑101.

Image Generation · Large-Scale Training · Multimodal AI

0 likes · 29 min read

AIWalker

Feb 10, 2025 · Artificial Intelligence

FlashVideo Sets New SOTA for Faster, High‑Fidelity High‑Resolution Video Generation

FlashVideo introduces a two‑stage diffusion framework that first ensures prompt fidelity at low resolution with a 5‑billion‑parameter DiT, then efficiently adds fine details at high resolution using flow matching, achieving state‑of‑the‑art quality with dramatically lower compute cost.

AI · Diffusion Models · FlashVideo

0 likes · 21 min read

Alibaba Cloud Native

Feb 7, 2025 · Cloud Native

Deploy a One‑Click AI Script & Animation Platform with Function Compute

This guide walks you through using Alibaba Cloud Function Compute and the Bailian model service to set up a cloud‑native, one‑click AI creation pipeline that turns scripts into subtitles, images, and videos for New Year storytelling.

AI content creation · Alibaba Cloud · Bailian Model Service

0 likes · 6 min read

Python Programming Learning Circle

Jan 24, 2025 · Fundamentals

Creating a Cherry Blossom Timelapse with Python: Image Processing and Video Generation

This article demonstrates how to use Python, OpenCV, and Pillow to programmatically generate frames that depict the gradual opening of cherry blossoms, assemble them into a video, and share the result as a timelapse celebrating Wuhan University's spring scenery.

Python · Tutorial · Video Generation

0 likes · 5 min read

ZhongAn Tech Team

Jan 19, 2025 · Artificial Intelligence

Weekly AI Digest Issue 11: Recommendation Algorithms, Video Generation Advances, and AGI Research

This issue of the weekly AI digest explores Xiaohongshu’s NoteLLM recommendation system, compares Chinese text generation in video AI across major platforms, highlights Alibaba’s Tongyi Wanxiang breakthroughs, discusses Keras founder François Chollet’s new AGI‑focused lab, and reviews Google’s Veo 2 and Imagen‑3 advancements.

AGI · AI · Recommendation Systems

0 likes · 11 min read

Alibaba Cloud Native

Jan 16, 2025 · Cloud Native

Build an AI‑Powered Audiobook Production Pipeline with Cloud Native CAP

This guide explains how to use Alibaba Cloud's Cloud Native Application Platform (CAP), Function Compute, and the Bailian model service to build an end‑to‑end automated workflow that turns text into audio, subtitles, and images, and finally compiles them into a video audiobook.

AI · Cloud Native · Video Generation

0 likes · 6 min read

AIWalker

Jan 15, 2025 · Artificial Intelligence

Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation

Magic Mirror introduces a single‑stage, zero‑shot framework that fuses dual facial embeddings with a conditional adaptive normalization module inside a Video Diffusion Transformer, achieving superior identity consistency, natural dynamics, and high visual quality compared with existing video generation methods.

Diffusion Transformer · Video Generation · conditional adaptive normalization

0 likes · 16 min read
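Conditional adaptive normalization generally means normalizing hidden features and then modulating them with a scale and shift predicted from a conditioning vector, here standing in for facial‑identity embeddings. The NumPy sketch below shows that generic AdaLN pattern; the single linear projection, its zero initialization, and the `1 + scale` form are common conventions assumed for illustration, not details taken from Magic Mirror.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize each token's feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

class AdaptiveLayerNorm:
    """LN(x) * (1 + scale(c)) + shift(c), with scale and shift predicted from a condition c."""

    def __init__(self, cond_dim: int, hidden_dim: int):
        # Zero-initialized projection (assumed convention): at initialization the
        # module reduces to plain layer norm, so conditioning is learned gradually.
        self.w = np.zeros((cond_dim, 2 * hidden_dim))
        self.b = np.zeros(2 * hidden_dim)

    def __call__(self, x: np.ndarray, cond: np.ndarray) -> np.ndarray:
        # x: (tokens, hidden_dim); cond: (cond_dim,), e.g. identity embeddings.
        scale, shift = np.split(cond @ self.w + self.b, 2)
        return layer_norm(x) * (1.0 + scale) + shift
```

Injecting the condition through normalization statistics, rather than through extra attention tokens, is one lightweight way to steer every layer with the same identity signal.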

58UXD

Dec 18, 2024 · Artificial Intelligence

Transform Your Designs with AI: 5 Steps to Create Stunning Videos

Learn how designers can harness AI tools in five practical steps—from script generation and AI‑driven image creation to video synthesis, music production, and final editing—to craft compelling, high‑quality videos that boost creativity and efficiency.

AI tools · AI video · Video Generation

0 likes · 4 min read

php Courses

Dec 13, 2024 · Artificial Intelligence

OpenAI Releases Sora Video Generation Model: Three Key Implications and Core Features

OpenAI's new Sora model brings AI-powered video generation to creators, extends interaction beyond text, and marks a pivotal step toward AGI by enabling machines to understand and produce visual content; it ships with tools such as Explore, Storyboard, Remix, Loop, and Blend.

Artificial Intelligence · OpenAI · Sora

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Dec 4, 2024 · Artificial Intelligence

How EasyAnimate V5 Advances AI Video Generation with Multimodal Control

EasyAnimate V5, an Alibaba Cloud AI video generation framework, expands model size to 7B/12B, introduces multimodal control, token‑length based training, and inpaint‑based image‑to‑video strategies, while providing easy deployment via PAI, DSW, and local ComfyUI integration.

AI · LoRA · MMDiT

0 likes · 11 min read

Alipay Experience Technology

Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: High‑Quality Audio‑Driven Half‑Body Human Animation with Simple Inputs

EchoMimicV2 is an open‑source digital‑human framework that generates high‑quality half‑body animation videos from a single reference image, an audio clip, and a hand‑gesture sequence. It targets three challenges in audio‑driven animation: the restriction of earlier methods to facial portraits, complex condition injection, and inference latency.

AI research · Diffusion Models · Digital Human

0 likes · 18 min read

ZhongAn Tech Team

Nov 16, 2024 · Artificial Intelligence

Weekly AI Digest Issue 2: Video Generation, Large Models, AGI, and LoRA Fine‑Tuning

This weekly AI roundup discusses emerging video generation tools such as PixelDance and Vidu 1.5, debates about the scaling limits of large models, geopolitical considerations around AGI, and an MIT study comparing LoRA with full fine‑tuning for domain adaptation.

AGI · AI · Fine-tuning

0 likes · 8 min read

Baobao Algorithm Notes

Oct 17, 2024 · Artificial Intelligence

How Meta’s Movie Gen Pushes Text‑to‑Video Generation to New Heights

Meta’s newly released 92‑page Movie Gen paper introduces a cast of media foundation models that together cover text‑to‑image, text‑to‑video, personalized video, precise video editing, and audio generation, and details the dual‑model (video and audio) architecture, training pipeline, temporal autoencoder design, scaling strategies, evaluation benchmark, and ablation studies.

Deep Learning · Model Scaling · Multimodal LLM

0 likes · 34 min read

DataFunSummit

Oct 10, 2024 · Artificial Intelligence

AIGC‑Assisted Marketing Material Generation at Shujia Technology

This article describes Shujia Technology's use of artificial intelligence to generate marketing images and videos, outlining the background, challenges of high-volume content production, detailed solutions for image and video assets—including layout models, diffusion models, and digital human synthesis—and future research directions.

AIGC · Digital Human · Image Generation

0 likes · 12 min read

Volcano Engine Developer Services

Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision, covering background challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, while outlining future research directions.

Computer Vision · PixelLM · StoryDiffusion

0 likes · 22 min read

360 Tech Engineering

Aug 29, 2024 · Artificial Intelligence

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

FancyVideo is an open‑source UNet‑based video generation model that supports arbitrary resolutions, aspect ratios, styles, and motion dynamics by introducing a Cross‑frame Textual Guidance Module (CTGM) with temporal injectors, refiners, and boosters, achieving state‑of‑the‑art results on multiple benchmarks and enabling versatile applications such as video extension, backtracking, and frame interpolation.

AI research · UNet · Video Generation

0 likes · 6 min read

Tencent Advertising Technology

Jul 31, 2024 · Artificial Intelligence

MimicMotion: A Controllable Video Generation Framework for High-Quality Human Motion Synthesis

MimicMotion is a controllable video generation framework that produces smooth, high-quality human motion videos by leveraging skeletal action guidance, addressing challenges in video generation such as limited length, weak controllability, and lack of dynamic detail.

AI · Diffusion Models · MimicMotion

0 likes · 13 min read

Qunar Tech Salon

Jul 25, 2024 · Artificial Intelligence

AI-Generated Video Practices for International Hotels

At the WOT2024 conference, Qunar Travel’s CTO Zheng Jimin presented a comprehensive overview of AI-generated video production for international hotels, detailing challenges, AI-driven workflow automation, practical implementation steps, multilingual translation enhancements, and performance results, offering valuable insights for scaling high‑quality hotel video content.

AI · AIGC · Hotel Industry

0 likes · 11 min read

Baidu Geek Talk

Jul 24, 2024 · Artificial Intelligence

AI-Driven Fusion of Peking Opera Characters with Ink-Wash Painting Style Using PaddleGAN

Li Yilin’s AI project blends Peking Opera characters with traditional ink‑wash painting by using PaddleHub for style transfer and PaddleGAN’s First‑Order Motion model for facial motion, then adds music and Wav2Lip lip‑sync, producing videos that modernize Chinese heritage and gauge public cultural awareness.

AI · Computer Vision · Deep Learning

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Jul 15, 2024 · Artificial Intelligence

How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers

EasyAnimate v3, an open‑source video generation system from Alibaba Cloud AI Platform, introduces a Diffusion Transformer‑based architecture, a Hybrid Motion Module, and Slice VAE to support image‑to‑video, text‑to‑video, and unlimited‑length video generation, producing videos of up to 720p resolution and 144 frames on modest GPU memory.

AI · Computer Vision · Diffusion Transformer

0 likes · 5 min read

Kuaishou Large Model

Jun 27, 2024 · Artificial Intelligence

How I2V-Adapter Turns Images into Videos with Minimal Training

The article introduces I2V‑Adapter, a lightweight plug‑in for Stable Diffusion‑based video diffusion models that turns a single static image into a coherent video without altering the original T2V architecture, and details its design, frame‑similarity prior, experimental results, and real‑world applications.

AI · Computer Vision · Diffusion Models

0 likes · 9 min read

Kuaishou Tech

Jun 26, 2024 · Artificial Intelligence

I2V-Adapter: A Lightweight Image‑to‑Video Adapter for Stable Diffusion Video Diffusion Models

The I2V-Adapter paper introduces a plug‑and‑play lightweight module that enables static images to be converted into dynamic videos using Stable Diffusion‑based text‑to‑video diffusion models without altering the original architecture or pretrained parameters, achieving competitive quality with far less training cost.

AI · Computer Vision · Diffusion Models

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Jun 19, 2024 · Artificial Intelligence

Deploy and Fine‑Tune EasyAnimate for High‑Res Video Generation on Alibaba Cloud PAI

EasyAnimate is Alibaba Cloud PAI's DiT‑based framework for end‑to‑end HD video generation. This guide walks you through integrating it on PAI: setting up prerequisites, creating DSW instances, installing the model, running inference via code or the WebUI, fine‑tuning with LoRA, and calling the API.

Alibaba Cloud PAI · DSW · EasyAnimate

0 likes · 14 min read
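The LoRA fine‑tuning step in this guide rests on the standard low‑rank adaptation idea: the pretrained weight W stays frozen and a trainable update (alpha / r) · B A of rank r is added, with B zero‑initialized so training starts from the base model's exact behavior. The NumPy sketch below shows only that forward pass and the weight merge; the shapes, `r`, and `alpha` are illustrative assumptions, and EasyAnimate's actual training scripts and hyperparameters are not reproduced.

```python
import numpy as np

class LoRALinear:
    """y = x @ W.T + (alpha / r) * x @ A.T @ B.T, with W frozen and A, B trainable."""

    def __init__(self, weight: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(r, in_dim))   # small random init
        self.B = np.zeros((out_dim, r))                    # zero init: no change at start
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scaling * update

    def merged_weight(self) -> np.ndarray:
        # For inference the low-rank update can be folded into W (no extra latency).
        return self.weight + self.scaling * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size
```

For a 16 × 32 layer with r = 4, only 4·32 + 16·4 = 192 parameters are trained instead of 512, which is why LoRA adapters are small enough to swap in and out of a shared base model.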

Alibaba Cloud Big Data AI Platform

Jun 4, 2024 · Artificial Intelligence

EasyAnimate: High‑Resolution Video Generation via Diffusion Transformers

EasyAnimate, an open‑source DiT‑based video generation framework from Alibaba Cloud AI Platform PAI, offers a complete pipeline—including data preprocessing, VAE and DiT training, LoRA fine‑tuning, motion‑module integration, and scalable inference up to 768×768 resolution and 144 frames—leveraging Diffusion Transformers to produce longer, higher‑quality videos.

AI video · Diffusion Transformer · LoRA

0 likes · 14 min read

JD Cloud Developers

May 14, 2024 · Artificial Intelligence

Create Digital Avatars and Face Swaps with EasyPhoto on JD Cloud

Learn how to install and use the EasyPhoto plugin on JD Cloud’s Stable Diffusion WebUI to generate digital avatars, perform multi‑person face swaps, and create AI‑generated videos, with step‑by‑step instructions, screenshots, and tips for optimal settings and coupon usage.

AI Avatar · Video Generation · cloud-computing

0 likes · 6 min read

MoonWebTeam

May 14, 2024 · Frontend Development

Top 9 Front-End & AI Trends Shaping 2024: From Apple’s MM1 to Micro‑Frontends

This monthly roundup highlights nine cutting‑edge topics—from Apple’s multimodal MM1 model and the Signals standardization proposal to Stable Video Diffusion, digital humans, micro‑frontend frameworks, Monkey testing automation, Tango low‑code sandbox, and cross‑platform app frameworks—offering deep insights and practical takeaways for modern developers.

Artificial Intelligence · Micro Frontends · Video Generation

0 likes · 17 min read

DataFunTalk

May 3, 2024 · Artificial Intelligence

Advances, Challenges, and Industrial Practices in Text‑to‑Video Generation – From Diffusion Models to Sora

This article reviews the rapid progress of text‑to‑video generation, explains diffusion‑based video synthesis, outlines key technical challenges such as motion modeling, semantic alignment and quality, and presents Tencent’s solutions and real‑world applications, while also discussing future directions and the impact of OpenAI’s Sora model.

AI · Diffusion Models · Sora

0 likes · 23 min read
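Diffusion‑based video synthesis starts from a forward process that gradually adds Gaussian noise. In the common DDPM formulation a noisy sample at step t can be drawn in closed form as x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, where ᾱ_t is the cumulative product of (1 − β_t) and ε is standard Gaussian noise. The sketch applies this to a toy video tensor; the linear β schedule and the (frames, channels, H, W) shape are assumptions for illustration, and the production systems discussed in the article may use different schedules, latent spaces, or flow‑based formulations.

```python
import numpy as np

def linear_beta_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise  # a denoiser is typically trained to predict `noise` from (xt, t)

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
video = rng.uniform(-1.0, 1.0, size=(8, 3, 16, 16))  # toy clip: (frames, channels, H, W)
x_early, _ = q_sample(video, t=10, alpha_bar=alpha_bar, rng=rng)
x_late, _ = q_sample(video, t=999, alpha_bar=alpha_bar, rng=rng)
```

Early steps barely perturb the clip, while by the last step it is almost pure noise; generation learns to run this process in reverse, frame content and motion included.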

Architect

Apr 16, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build a 60‑Second Video Generator

This article dissects the possible architecture of OpenAI's Sora video model, tracing its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion backbone, long‑time consistency strategies, and training pipeline, while comparing alternatives such as MAGVIT‑v2, TECO, NaViT, and FDM to reveal why each design choice may have been made.

AI Architecture · Latent Diffusion · Sora

0 likes · 51 min read

Alimama Tech

Apr 10, 2024 · Artificial Intelligence

SizeCube: AI‑Driven Arbitrary‑Size Image and Video Outpainting for Advertising

SizeCube leverages Stable Diffusion‑based diffusion models and a sophisticated pipeline—including quality filtering, feature mining, latent‑space UNet denoising, super‑resolution, and temporal 3D‑U‑Net video processing—to automatically outpaint images and videos to any size, boosting Alibaba advertisers’ creative flexibility, click‑through rates, and asset adaptability across diverse ad placements.

AI · Advertising · Image Outpainting

0 likes · 14 min read

Architects' Tech Alliance

Apr 7, 2024 · Artificial Intelligence

How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model

Sora, the newly announced text‑to‑video large model, can generate one‑minute high‑fidelity videos from textual prompts or static images, handling complex scenes, expressive characters, and sophisticated camera motions while also supporting video extension and frame‑filling, positioning it at the forefront of multimodal AI research.

AI model · Multimodal · Sora

0 likes · 6 min read

Architect

Mar 28, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Architecture, Workflow, and Core Technologies

This article explains OpenAI's Sora video generation model, detailing its latent diffusion foundation, video compression network, spacetime patch representation, Diffusion Transformer processing, and decoding pipeline, while also reviewing related Stable Diffusion and Transformer concepts that enable high‑quality text‑to‑video synthesis.

AI · Deep Learning · Latent Diffusion

0 likes · 17 min read
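The spacetime‑patch step can be illustrated with a simple reshape: a compressed latent video of shape (T, H, W, C) is cut into non‑overlapping pt × ph × pw blocks, and each block is flattened into one token for the Diffusion Transformer. OpenAI has not published Sora's patch sizes or latent shapes, so the sizes used here are arbitrary assumptions; the NumPy sketch only shows the mechanics, including the inverse needed before decoding.

```python
import numpy as np

def patchify(latent: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """(T, H, W, C) latent -> (num_patches, pt*ph*pw*C) token matrix."""
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide patch size"
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Put the three patch-grid axes first, then the within-patch axes.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)

def unpatchify(tokens: np.ndarray, shape, pt: int, ph: int, pw: int) -> np.ndarray:
    """Inverse of patchify: token matrix -> (T, H, W, C) latent."""
    T, H, W, C = shape
    x = tokens.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    # Interleave grid and within-patch axes back into (T, H, W) order.
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)
```

With pt = 2 and ph = pw = 4, a 16‑frame 32 × 32 latent with 4 channels becomes 512 tokens of dimension 128. Because the token count depends only on the latent size, the same transformer can handle clips of different durations and resolutions.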

DevOps

Mar 26, 2024 · Artificial Intelligence

OpenAI’s Sora: A One‑Minute Text‑to‑Video Diffusion Transformer Model

OpenAI’s newly released Sora model demonstrates one‑minute text‑to‑video generation using a diffusion‑based transformer architecture that operates on spatiotemporal patches, compresses visual data into latent codes, and builds on a wide range of prior video generation research, while the article also advertises a DevOps certification program.

AI · OpenAI · Sora

0 likes · 8 min read

DaTaobao Tech

Mar 25, 2024 · Artificial Intelligence

Survey of AIGC Video Generation Algorithms

Since 2023, AI‑generated video research has expanded across six algorithmic categories (text‑to‑video, image‑to‑video, editing, style transfer, human motion, and long‑video generation), with representative works including CogVideo, Imagen Video, MagicVideo, ControlVideo, DCTNet, NUWA‑XL, and OpenAI’s Sora. The analysis finds that diffusion models excel at short clips, editing remains costly, style transfer is efficient, and truly long, temporally consistent video remains an open challenge.

AI · AIGC · Diffusion Models

0 likes · 13 min read
