Tagged articles
124 articles
Machine Heart
May 20, 2026 · Artificial Intelligence

How VChain Gives Video Generation a Visual Thought Chain for Explicit Spatiotemporal Planning

The VChain framework injects multimodal large‑model reasoning into video generation, using a three‑stage visual‑thought pipeline, sparse inference‑time adaptation, and guided sampling to produce physically consistent, logically coherent videos, as demonstrated by qualitative and quantitative experiments.

Multimodal Large Models · Sparse Fine‑tuning · Video Generation
8 min read
Machine Heart
May 17, 2026 · Artificial Intelligence

What Exactly Is a World Model? History, Technology, and the $10B Bet

The article traces the two parallel, decades‑long research lines that gave rise to video world models (dreaming agents in reinforcement learning, and learning physics from human video), explains how they converged in 2024‑2025, evaluates current capabilities and limitations, and analyzes the $10 billion investment landscape and the strategic moves of NVIDIA, OpenAI, and others.

AI research · Reinforcement Learning · Robotics
32 min read
Machine Heart
May 8, 2026 · Artificial Intelligence

How an 8B Video‑Language Model Beats GPT‑5 and Gemini‑3.1‑Pro at Cinematic Understanding

The CHAI framework from CMU and Harvard defines a structured video‑language annotation scheme, scalable human‑AI oversight, and a post‑training pipeline that together enable an 8B open‑source model to outperform the closed‑source GPT‑5 and Gemini‑3.1‑Pro at understanding professional cinematic techniques.

Multimodal AI · Qwen3-VL · Video Generation
11 min read
Machine Learning Algorithms & Natural Language Processing
May 7, 2026 · Artificial Intelligence

Low‑Prompt ‘Pang Goose AI’ Lets Anyone Generate Videos and Dashboards Without Learning Complex Prompts

The article argues that while modern LLMs such as ChatGPT and Gemini are powerful, the barrier to using them well keeps rising. It introduces ‘Pang Goose AI’, a low‑prompt AI agent that relies on a pre‑built SOP system to produce a one‑minute e‑commerce video or an interactive data dashboard from a single sentence, and argues that it outperforms generic models and removes the need for users to master prompt engineering.

AI agents · AI product review · SOP
12 min read
Machine Learning Algorithms & Natural Language Processing
May 4, 2026 · Artificial Intelligence

How CVPR 2026 Is Redefining Visual Model Defaults in Generative AI

A review of CVPR 2026 papers shows visual generative AI shifting from incremental performance gains within established frameworks to a systematic rewrite of default modeling assumptions, spanning new guidance mechanisms, video generation architectures, direct image prediction, fine‑grained motion control, and dense semantic correspondence.

Video Generation · diffusion · generative AI
13 min read
Machine Heart
Apr 29, 2026 · Artificial Intelligence

VEGA-3D: Unleashing Implicit 3D Priors in Video Generation for Scene Understanding

VEGA-3D extracts the implicit 3D priors embedded in large video generation models, fuses them with semantic features via token‑level adaptive gating, and achieves markedly higher multi‑view consistency and state‑of‑the‑art results on 3D scene‑understanding and embodied benchmarks such as ScanRefer, ScanQA, VSI‑Bench and LIBERO, all without any additional 3D annotations.
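Token-level adaptive gating of this kind can be sketched as follows. This is a minimal illustration and not VEGA-3D's published formulation. It assumes that each token carries a 3D-prior feature and a semantic feature of equal width. It also assumes that a hypothetical learned weight vector `w` and bias `b` map the concatenation of those two features to one scalar gate per token.

```python
import math
from typing import List

Vector = List[float]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gated_fusion(geo_tokens: List[Vector], sem_tokens: List[Vector],
                 w: Vector, b: float = 0.0) -> List[Vector]:
    """Fuse per-token 3D-prior features with semantic features.

    Token i gets its own scalar gate g_i = sigmoid(w . [geo_i; sem_i] + b),
    and the fused token is g_i * geo_i + (1 - g_i) * sem_i.
    `w` and `b` stand in for learned parameters (hypothetical here).
    """
    if len(geo_tokens) != len(sem_tokens):
        raise ValueError("token sequences must have equal length")
    fused = []
    for geo, sem in zip(geo_tokens, sem_tokens):
        if len(geo) != len(sem) or len(w) != 2 * len(geo):
            raise ValueError("dimension mismatch")
        concat = geo + sem  # list concatenation: [geo_i; sem_i]
        g = sigmoid(sum(wi * xi for wi, xi in zip(w, concat)) + b)
        fused.append([g * x + (1.0 - g) * y for x, y in zip(geo, sem)])
    return fused
```

Because the gate is computed per token, tokens where the geometric prior is informative can lean on it, while other tokens fall back to semantics. A trained model would learn `w` and `b`, and it could use vector-valued gates instead of scalars.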

Embodied AI · VEGA-3D · Video Generation
10 min read
Machine Heart
Apr 27, 2026 · Artificial Intelligence

Why Traditional Video Captions Fail and How MTSS Solves the Problem

The article introduces Multi-Stream Scene Script (MTSS), a structured, JSON‑based video description paradigm that replaces monolithic captions. It explains the design principles of MTSS, outlines its advantages over conventional captions, and presents experimental evidence of significant gains in both video understanding and video generation tasks.
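The following hypothetical multi-stream script makes the contrast with a monolithic caption concrete. The stream names (`camera`, `subjects`, `actions`, `audio`) and the segment fields are illustrative assumptions, not the MTSS schema. The point is that each stream carries its own time-stamped segments, so a consumer can query any aspect of the video at a given moment.

```python
import json

# Hypothetical multi-stream scene script; stream names and fields are
# illustrative only and are not taken from the MTSS specification.
scene_script = {
    "duration_s": 6.0,
    "streams": {
        "camera": [{"start": 0.0, "end": 6.0, "text": "slow dolly-in, eye level"}],
        "subjects": [{"start": 0.0, "end": 6.0, "text": "a woman in a red coat"}],
        "actions": [
            {"start": 0.0, "end": 3.0, "text": "walks toward a cafe door"},
            {"start": 3.0, "end": 6.0, "text": "pushes the door open"},
        ],
        "audio": [{"start": 2.5, "end": 6.0, "text": "door bell rings"}],
    },
}

def describe_at(script: dict, t: float) -> dict:
    """Per stream, return the text of every segment active at time t (start <= t < end)."""
    return {
        name: [seg["text"] for seg in segments if seg["start"] <= t < seg["end"]]
        for name, segments in script["streams"].items()
    }

# Serialized form, as it might be stored alongside a video clip.
serialized = json.dumps(scene_script, indent=2)
```

A flat caption would merge all four streams into one sentence. Keeping them separate lets a generator or evaluator address camera motion, subjects, actions, and sound independently.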

MTSS · Multimodal AI · Video Generation
8 min read
AI Explorer
Apr 24, 2026 · Artificial Intelligence

Open Generative AI: 200+ Open‑Source Models for Image, Video, and Lip‑Sync Creation

Open Generative AI is an open‑source, MIT‑licensed desktop suite that bundles over 200 cutting‑edge image, video, and lip‑sync models into four dedicated studios, offering unrestricted generation without content filters, subscription fees, or closed ecosystems, and provides online, desktop, and self‑hosted deployment options.

AI media generation · Image Generation · MIT license
6 min read
Geek Labs
Apr 23, 2026 · Artificial Intelligence

7 Must‑Watch Open‑Source Prompt Libraries for AI Image and Video Generation (2025‑2026)

Against the backdrop of the rapid rise of prompt engineering in 2025‑2026, this article reviews seven standout open‑source GitHub repositories. They cover Nano Banana Pro, GPT‑Image‑2, multi‑model prompts, and video generation. For each repository, the article details its star count, content structure, multilingual support, and ideal use cases for creators.

AI prompt engineering · GitHub · Image Generation
14 min read
SuanNi
Apr 21, 2026 · Artificial Intelligence

Why AI Video Generation Is Leaving the Silent Era: Architecture, Alignment, and Evaluation Insights

This article analyzes the rapid evolution of multimodal video generation models from separate visual and audio pipelines to unified diffusion Transformers, covering VAE compression, MoE scaling, cross‑modal alignment techniques, comprehensive evaluation metrics, real‑world applications, and the remaining technical challenges.

Diffusion Models · Evaluation Metrics · Multimodal AI
15 min read
Machine Heart
Apr 12, 2026 · Artificial Intelligence

CVPR 2026 WorldArena Challenge Launches with Amap’s Open‑Source High‑Performance World Model Baseline

The CVPR 2026 WorldArena Challenge, organized by leading academic institutions together with Amap, introduces a new evaluation framework that tests video world models for physical realism and functional utility. Amap also releases its high‑performance ABot‑PhysWorld model as an open‑source baseline, and its benchmark scores set a new state of the art.

ABot-PhysWorld · CVPR 2026 · Physical Consistency
9 min read
Machine Heart
Apr 10, 2026 · Artificial Intelligence

OneStory Enables Minute-Long, Ten-Shot Video Generation with Consistent Narrative

The OneStory paper presented at CVPR 2026 introduces an adaptive‑memory framework for coherent multi‑shot video generation, reformulating the task as next‑shot generation and using Frame Selection and Adaptive Conditioner modules to maintain long‑range context while supporting both text‑to‑multi‑shot and image‑to‑multi‑shot synthesis.

Adaptive Memory · Multi-shot Video · OneStory
8 min read
SuanNi
Apr 8, 2026 · Industry Insights

How HappyHorse‑1.0 Surpassed Seedance 2.0 in AI Video Generation Rankings

An anonymous model, HappyHorse‑1.0, quickly topped the Artificial Analysis leaderboard in both the text‑to‑video and image‑to‑video tracks, outscoring Seedance 2.0 by large margins and prompting intense community discussion about its origin, performance, and future stability.

AI · Artificial Intelligence · Competitive analysis
5 min read
Machine Heart
Apr 4, 2026 · Artificial Intelligence

Is AI Video Generation Shifting From Model Showcases to Integrated Workflows?

The article analyzes how AI video generation, after the launch of OpenAI's Sora, is moving from a focus on model performance to embedding video capabilities into existing platforms and business workflows, highlighting timeline shifts, key players, and emerging competitive criteria.

AI video · Market Trends · OpenAI Sora
7 min read
SuanNi
Mar 25, 2026 · Industry Insights

Why OpenAI Shut Down Sora: The Costly Rise and Fall of AI Video Generation

OpenAI abruptly discontinued its Sora video‑generation app after a brief period of explosive popularity, revealing massive GPU costs, unsustainable pricing, fierce competition from rivals like Gemini and Claude, and a strategic pivot toward enterprise‑focused AI services.

AI · OpenAI · Video Generation
10 min read
Amap Tech
Mar 20, 2026 · Artificial Intelligence

How ABot-PhysWorld Achieves Physical Consistency in Embodied Video Generation

ABot-PhysWorld introduces a physically consistent video generation framework for embodied AI, leveraging the PAI‑Bench benchmark, large‑scale multi‑modal data, DPO preference alignment, and dense action maps to surpass SOTA models in both visual quality and physical plausibility across diverse robotic tasks.
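DPO preference alignment rests on a simple pairwise objective. The sketch below shows the standard DPO loss for a single preferred/rejected pair. It takes the sequence log-likelihoods of both samples under the policy and under a frozen reference model. This summary does not say how ABot-PhysWorld adapts the loss to video diffusion. Diffusion variants, for example, substitute denoising-error terms for exact log-likelihoods. Treat the code as the generic form only.

```python
import math

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (w = preferred, l = rejected).

    loss = -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the loss is log 2. It falls below log 2 as the policy raises the relative likelihood of the preferred sample, and rises above log 2 when the policy favors the rejected one.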

Deep Learning · Embodied AI · Physical Consistency
15 min read
Java Tech Enthusiast
Mar 7, 2026 · Artificial Intelligence

Explore Cutting‑Edge Open‑Source AI Skills for Video, Docs, and Social Media Automation

This article introduces several open‑source AI Skills, including Remotion, YouTube‑clipper, skill‑from‑masters, NotebookLM, Markdown‑to‑X publisher, and Anthropic's Agent Skills. For developers seeking automation solutions, it details each one's purpose, core features, installation commands, and repository link.

Claude · Document Processing · Video Generation
7 min read
Old Meng AI Explorer
Mar 5, 2026 · Industry Insights

Three Must‑Try Open‑Source Tools: Local Tunneling, AI Short‑Video Creation, and Multi‑Model Switching

This article introduces three high‑impact open‑source projects. tunnelto gives local services instant public access. Toonflow‑app fully automates AI short‑video production from text. cc‑switch offers one‑click switching and unified configuration across multiple large‑model AI tools. The article highlights their key features, cross‑platform support, and GitHub repositories.

AI · Video Generation · development-tools
8 min read
HyperAI Super Neural
Feb 14, 2026 · Artificial Intelligence

Beyond Visual Realism: WorldArena Benchmark Reveals the Capability Gap in Embodied World Models

WorldArena introduces a unified benchmark that evaluates generated videos not only for visual fidelity but also for embodied task functionality across six dimensions, exposing a stark gap between visual realism and practical usefulness and providing a composite EWMScore to compare models.

Embodied AI · Evaluation Metrics · Physical Consistency
9 min read
AI Engineering
Feb 13, 2026 · Artificial Intelligence

ByteDance’s Open‑Source 12B‑Parameter Video Model “Alive” Runs on a Single RTX 3090/4090

ByteDance has open‑sourced the 12‑billion‑parameter video generation model Alive, which supports text‑to‑video/audio, image‑to‑video/audio, pure text‑to‑video and text‑to‑audio modes, runs on a 24 GB GPU, outperforms competitors in cross‑modal synchronization, and includes novel TA‑CrossAttn and UniTemp‑RoPE techniques.

Alive Model · ByteDance · Cross‑Modal Synchronization
5 min read
Bilibili Tech
Jan 28, 2026 · Artificial Intelligence

Boosting Video Generation Inference: Full Graph Compilation with torch.compile

This article examines the challenges of optimizing video generation model inference, moving from operator-level tweaks to full-graph compilation using torch.compile, and details systematic strategies to eliminate Graph Breaks, handle dynamic shapes, KV-Cache indexing, and Python-side caches, achieving a 47.6% speedup on a 14B model without accuracy loss.

AI · Inference Acceleration · Video Generation
14 min read
Old Meng AI Explorer
Jan 27, 2026 · Artificial Intelligence

Three Must‑Try Open‑Source AI Tools for Data Mining, PPT Creation, and Video Generation

Amid an abundance of AI utilities, this article highlights three recently popular open‑source projects. Spider_XHS handles comprehensive Xiaohongshu data collection and automated posting. PPTAgent generates multi‑scene PowerPoint decks in one click. Code2Video creates code‑driven, high‑quality videos. The article details their core features, deployment steps, and GitHub links.

AI tools · PPT automation · Video Generation
7 min read
Design Hub
Jan 13, 2026 · Artificial Intelligence

Three AI-Powered Design Tools That Boost Creativity

The article reviews three open‑source AI tools. Claude Cowork provides file‑based assistance. The LTX‑2 video generation model runs on 8 GB GPUs via Pinokio. SongGeneration Studio handles end‑to‑end music creation. For creators, the article details their features, performance benchmarks, and usage steps.

AI design · Claude · LTX-2
8 min read
Kuaishou Tech
Jan 8, 2026 · Artificial Intelligence

Top 12 Kuaishou Papers Accepted at AAAI 2026: Breakthroughs in Recommendation, Video Generation, and LLM Research

Kuaishou secured 12 papers at AAAI 2026, covering advances in search and recommendation systems, multi‑camera video generation, multimodal understanding, generative model fundamentals, video large language models, experimental design, and LLM latent‑space reasoning, with three papers highlighted as oral presentations.

AI · LLM · Video Generation
22 min read
DataFunSummit
Dec 20, 2025 · Artificial Intelligence

How AutoHome Built the Cangjie Large Model: From Training Architecture to Real-World AI Applications

This article details AutoHome's end‑to‑end development of the Cangjie large model. It covers the distributed training infrastructure (data, pipeline, and tensor parallelism) and core business use cases such as video script generation and multi‑tool Agent capabilities. It also covers inference optimizations through quantization and fast serving frameworks, and future directions for personalized automotive AI services.

Agent AI · Distributed Training · Video Generation
19 min read
HyperAI Super Neural
Dec 12, 2025 · Artificial Intelligence

AI Open‑Source Forum Recap: Video Generation, Vision, Vector DBs, AI‑Native Language

The AI Open‑Source Forum brought together researchers from Peking University, Tsinghua, Zilliz and MoonBit to share open‑source advances in audio‑synchronized video generation, vector database architecture, lightweight vision backbones, and an AI‑native programming language, highlighting datasets, system designs, and future collaborative directions.

AI · AI‑Native Programming · Video Generation
12 min read
Data Party THU
Dec 9, 2025 · Artificial Intelligence

Can Robots Learn Human Moves Directly from AI‑Generated Videos? The GenMimic Breakthrough

The GenMimic paper introduces a novel framework that enables humanoid robots to zero‑shot imitate human actions generated by AI video models, presenting a new dataset, a two‑stage 4D reconstruction pipeline, and a reinforcement‑learning strategy with weighted‑tracking and symmetry losses, validated in simulation and on a real 23‑DoF robot.

Humanoid Robots · Reinforcement Learning · Robotics
11 min read
Data Party THU
Dec 2, 2025 · Artificial Intelligence

FFGo: Turning the First Frame into a Conceptual Memory for Video Customization

FFGo shows that the first frame of text‑to‑video models acts as a conceptual memory buffer that stores visual entities. It trains a few‑shot LoRA on only 20‑50 curated examples and pairs it with a special transition prompt. This reliably activates multi‑object fusion and enables high‑quality, controllable video customization without changing the model architecture.

AI research · Video Generation · conceptual memory
9 min read
AI Frontier Lectures
Nov 28, 2025 · Artificial Intelligence

Can AI Generate the Next Step in a Video? Inside the VANS Model

Researchers from Kuaishou and City University of Hong Kong introduce VANS, a Video-as-Answer system that predicts and visualizes the next event in a video by jointly optimizing a vision‑language model and a video diffusion model, enabling personalized step‑by‑step guidance and future scenario generation.

Multimodal AI · Video Generation · future prediction
10 min read
HyperAI Super Neural
Nov 25, 2025 · Artificial Intelligence

LongCat‑Video: Meituan’s Model for Text‑to‑Video, Image‑to‑Video & Continuation

LongCat‑Video, an open‑source video generation model from Meituan, adopts a unified multi‑task architecture that handles text‑to‑video, image‑to‑video and video continuation. It delivers minute‑long, high‑quality clips through coarse‑to‑fine inference and achieves benchmark scores comparable to leading models such as Wan2.2. The article also provides a one‑click deployment tutorial on HyperAI.

LongCat-Video · Meituan · RLHF
6 min read
Kuaishou Tech
Nov 24, 2025 · Artificial Intelligence

How Human Feedback Supercharges Video Generation – The VideoAlign Pipeline Explained

This article details a new research pipeline that leverages large‑scale human preference data, a multi‑dimensional video reward model, and specialized alignment algorithms to dramatically improve video generation quality, motion fidelity, and text‑video consistency, with open‑source code and benchmarks for reproducibility.

AI Alignment · Human Feedback · RLHF
10 min read
AI Large Model Application Practice
Nov 24, 2025 · Artificial Intelligence

How to Turn Text into an AI‑Powered PPT Video: A Step‑by‑Step Guide

This article breaks down the end‑to‑end engineering pipeline that converts a knowledge source such as a URL or PDF into a narrated PPT‑style video, detailing six core stages—from knowledge extraction and script generation to image creation, voice synthesis, and final video stitching—while highlighting practical model choices, prompt design, and stability tricks.

Artificial Intelligence · LLM · Multimodal
16 min read
Wuming AI
Oct 16, 2025 · Industry Insights

Top AI Model Releases This Week: NanoChat, Ring‑1T, Qwen3‑VL, Veo 3.1, Claude Haiku 4.5

This week’s AI landscape brought several major releases. Karpathy open‑sourced NanoChat, a roughly 8,000‑line ChatGPT replica, and Ant Group unveiled the trillion‑parameter Ring‑1T model. Alibaba released the 4B and 8B Qwen3‑VL vision‑language models, which outperform Gemini 2.5 Flash Lite and GPT‑5 Nano. Google launched Veo 3.1 for high‑fidelity video generation. Anthropic announced Claude Haiku 4.5, a faster, cheaper LLM that performs strongly on SWE‑bench.

AI models · Large Language Models · Multimodal
7 min read
Amap Tech
Oct 3, 2025 · Artificial Intelligence

How FantasyHSI Enables Autonomous 3D Human Interaction in Any Scene

FantasyHSI introduces a graph‑based multi‑agent framework that combines visual‑language models and video‑generation diffusion to let digital humans perceive, plan, and interact autonomously in any 3D scene, producing physically plausible, long‑duration actions for animation creation and embodied‑AI simulation.

3D synthesis · Graph Modeling · Reinforcement Learning
12 min read
Amap Tech
Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Computer Vision · Diffusion Models
11 min read
AI2ML AI to Machine Learning
Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Diffusion Models · Latent Diffusion · Multimodal AI
7 min read
Mashang Consumer UXC
Sep 29, 2025 · Artificial Intelligence

Open-Source AI 3D, Video & Audio Models: Tencent, Vidu, Audio2Face and More

This article reviews the latest open‑source AI models released by major tech firms—including Tencent's 3D‑Omni and 3D‑Part, Shengshu Tech's Vidu Q2 for facial video, Nvidia's Audio2Face for real‑time facial animation, plus updates from Figma, Google, Alibaba and Kuaishou—highlighting their capabilities and potential applications in gaming, AR/VR, design and content creation.

3D modeling · AI · Deep Learning
8 min read
DataFunTalk
Sep 27, 2025 · Artificial Intelligence

How AI Is Redefining Filmmaking: From Festival Shorts to Feature Films

The article explores how AI models like Seedream and Seedance are reshaping cinema, from AI‑driven short films showcased at the Busan Film Festival to full‑length feature productions, highlighting technical breakthroughs, industry perspectives, and the emerging "AI +" versus "+ AI" production paradigms.

AI · AIGC · Cinema
11 min read
Kuaishou Tech
Sep 16, 2025 · Artificial Intelligence

How Kling-Avatar Generates Long, Emotionally Rich Digital Human Videos with Multimodal LLMs

Kuaishou's Kling-Avatar leverages a multimodal large‑language‑model‑driven two‑stage generation framework to produce minute‑long digital‑human videos that synchronize lip movements, facial expressions, and body gestures with audio, achieving high visual quality, identity consistency, and controllable storytelling across diverse scenarios.

AI Avatar · Digital Human · Multimodal LLM
9 min read
DataFunTalk
Sep 11, 2025 · Artificial Intelligence

How AI Dressing and Multimodal Models Transform Home Service Experiences

During a pre-conference interview, AI expert Wang Mingzhong details how multimodal AI dressing, video résumé creation, short‑video templates, and interactive digital‑human live streams are technically realized for 58 Home Services, highlighting model training, workflow optimization, and future fusion of template‑based and agent‑driven video generation.

AI · Digital Human · Domestic Service
11 min read
Data STUDIO
Sep 3, 2025 · Artificial Intelligence

Hands‑On Review of AiPy: Open‑Source AI‑Python Tool for Data Analysis, Reporting & Video Creation

The author evaluates the free open‑source AiPy tool through three real‑world cases—U.S. GDP trend analysis, Cambricon stock assessment, and short‑video production—showing its local, code‑free workflow for data processing, report generation, and multimedia creation while noting minor visual glitches and their fixes.

AI · AiPy · Tool Review
7 min read
Kuaishou Tech
Aug 25, 2025 · Artificial Intelligence

How Context-as-Memory Enables Scene‑Consistent Long Video Generation

This article introduces the Context-as-Memory approach, which treats previously generated video frames as memory to achieve scene‑consistent interactive long video generation, and details a camera‑trajectory‑based memory retrieval mechanism that dramatically improves efficiency and performance over existing state‑of‑the‑art methods.
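Below is a simplified sketch of camera-trajectory-based retrieval, assuming 2D camera positions and headings. It approximates field-of-view overlap with a heading-gap and distance test. The actual Context-as-Memory criterion may differ, and the thresholds used here (`fov_deg`, `max_dist`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class CameraPose:
    x: float
    y: float
    yaw_deg: float  # camera heading in degrees

def yaw_gap(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees (0..180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def retrieve_memory(history: List[CameraPose], current: CameraPose,
                    k: int = 2, fov_deg: float = 90.0,
                    max_dist: float = 5.0) -> List[int]:
    """Indices of up to k past frames whose cameras plausibly see the same region.

    Heuristic stand-in for FOV-overlap retrieval: a past frame qualifies if its
    heading differs from the current heading by less than the field of view and
    its camera lies within max_dist. Qualifying frames are ranked by heading gap,
    then by distance, and only those are passed on as context.
    """
    candidates = []
    for i, pose in enumerate(history):
        gap = yaw_gap(pose.yaw_deg, current.yaw_deg)
        dist = math.hypot(pose.x - current.x, pose.y - current.y)
        if gap < fov_deg and dist <= max_dist:
            candidates.append((gap, dist, i))
    candidates.sort()
    return [i for _, _, i in candidates[:k]]
```

Conditioning on a few relevant frames, rather than on the whole history, keeps the context size bounded as the video grows. This is the efficiency argument the summary alludes to.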

AI · Video Generation · context memory
7 min read
Amap Tech
Aug 18, 2025 · Artificial Intelligence

How Omni-Effects Enables Spatially Controllable Multi‑VFX Generation with LoRA‑MoE

Omni-Effects introduces a unified framework that combines a LoRA‑based mixture of experts (LoRA‑MoE) with spatially aware prompts to generate multiple, precisely placed visual effects in a video. The framework is supported by the new Omni‑VFX dataset and evaluation suite. It demonstrates better spatial control and diversity than prior single‑effect methods.
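The LoRA-MoE idea can be illustrated with a single linear layer: a frozen base weight plus several low-rank experts whose contributions a router mixes. This sketch assumes a plain softmax router over the input vector. It does not reproduce Omni-Effects' spatially aware routing, and all shapes are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z: Vector) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - top) for x in z]
    s = sum(e)
    return [x / s for x in e]

def lora_moe_forward(x: Vector, W: Matrix, experts: List[Tuple[Matrix, Matrix]],
                     router: Matrix, alpha: float, rank: int) -> Vector:
    """y = W x + sum_i g_i(x) * (alpha / rank) * B_i (A_i x), with g = softmax(router x).

    W is the frozen base weight (d_out x d_in). Each expert is a pair (A_i, B_i)
    with A_i of shape rank x d_in and B_i of shape d_out x rank. The router
    (n_experts x d_in) is a hypothetical learned matrix.
    """
    y = matvec(W, x)
    gates = softmax(matvec(router, x))
    scale = alpha / rank
    for g, (A, B) in zip(gates, experts):
        delta = matvec(B, matvec(A, x))  # low-rank update B_i A_i x
        y = [yi + g * scale * di for yi, di in zip(y, delta)]
    return y
```

Only the small `A_i`, `B_i` and router parameters would be trained, so each effect can live in its own expert without touching the base model.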

AI · LoRA · Video Generation
8 min read
AI Frontier Lectures
Jul 30, 2025 · Artificial Intelligence

DualReal: Seamless Identity and Motion Customization for Video Generation

DualReal introduces a novel adaptive joint training framework that simultaneously customizes subject identity and motion dynamics in video generation, overcoming the conflicts of traditional isolated approaches by using a dual-domain perception adapter and stage-fusion controller, achieving up to 31.8% improvement on CLIP‑I and DINO‑I metrics.
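CLIP-I and DINO-I are commonly computed as the mean cosine similarity between embeddings of generated frames and embeddings of the reference subject image. The sketch assumes the embeddings have already been extracted by a CLIP or DINO image encoder. It also assumes the reported 31.8% figure is a relative improvement. Both are assumptions for illustration.

```python
import math
from typing import List

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def identity_score(frame_embs: List[List[float]], ref_emb: List[float]) -> float:
    """Mean cosine similarity between each generated frame and the reference subject.

    With CLIP image embeddings this corresponds to CLIP-I; with DINO embeddings,
    to DINO-I. Embedding extraction itself is out of scope here.
    """
    return sum(cosine(f, ref_emb) for f in frame_embs) / len(frame_embs)

def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old
```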

Diffusion Models · Video Generation · dual-domain adaptation
13 min read
Amap Tech
Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Diffusion Models · Image Restoration · Video Generation
14 min read
Amap Tech
Jul 9, 2025 · Artificial Intelligence

Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS

This article introduces VMBench, a perception‑aligned video motion generation benchmark with a five‑dimensional metric suite and meta‑guided prompt generation, and LD‑RPS, a zero‑shot unified image restoration framework using latent diffusion and recurrent posterior sampling, detailing their motivations, innovations, experiments, and future directions.

AI research · Diffusion Models · Image Restoration
14 min read
Kuaishou Large Model
Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s Kling team, uses test‑time scaling to substantially improve diffusion‑based image and video generation without any training. It runs an evolutionary search along the denoising trajectory and achieves state‑of‑the‑art results on SD2.1, Flux‑1‑dev, and other models.
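The core loop of an evolutionary test-time search can be sketched as below. It is a simplification, not EvoSearch itself. It evolves only the initial noise, whereas EvoSearch also operates on intermediate denoising states. A hypothetical `reward` callable stands in for "generate, then score". Latents are mutated with a variance-preserving mix of the parent and fresh Gaussian noise.

```python
import math
import random
from typing import Callable, List

Noise = List[float]

def mutate(x: Noise, sigma: float, rng: random.Random) -> Noise:
    """Perturb a standard-normal latent so the child is still standard normal."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must be in [0, 1]")
    keep = math.sqrt(1.0 - sigma * sigma)
    return [keep * xi + sigma * rng.gauss(0.0, 1.0) for xi in x]

def evolve_noise(reward: Callable[[Noise], float], dim: int, pop_size: int = 16,
                 n_elite: int = 4, generations: int = 10, sigma: float = 0.3,
                 seed: int = 0) -> Noise:
    """Elitist evolutionary search over initial noise for a frozen generator.

    In practice `reward` would wrap "denoise this latent, then score the sample";
    here it is any callable. Elites survive unchanged, and the rest of each
    generation consists of mutated copies of randomly chosen elites.
    """
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=reward, reverse=True)
        elites = pop[:n_elite]
        children = [mutate(rng.choice(elites), sigma, rng)
                    for _ in range(pop_size - n_elite)]
        pop = elites + children
    return max(pop, key=reward)
```

Because the elites are carried over, the best reward never decreases across generations. Extra compute (more generations or a larger population) buys better samples without touching model weights, which is the test-time scaling trade-off described above.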

Diffusion Models · Image Generation · Test-Time Scaling
8 min read
Kuaishou Tech
Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Diffusion Models · Image Generation
8 min read
AntTech
Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Computer Vision · Video Generation
20 min read
Kuaishou Tech
Jun 10, 2025 · Artificial Intelligence

Top 12 Cutting-Edge Video Generation Papers from Kuaishou at CVPR 2025

The article highlights CVPR 2025’s acceptance statistics and showcases twelve cutting‑edge video‑generation papers from Kuaishou, spanning datasets, quality assessment, style control, scaling laws, 4D simulation, interleaved image‑text data, vision‑language acceleration, high‑fidelity avatars, patch‑wise super‑resolution, narrative‑driven benchmarks, sketch‑based editing, and spatio‑temporal diffusion, each with links and abstracts.

CVPR2025 · Computer Vision · Kuaishou
20 min read
DataFunTalk
Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · Autoregressive Models · Diffusion Models
0 likes · 70 min read
AIWalker
May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Diffusion Models · Video Generation · autoregressive modeling
0 likes · 16 min read
Baidu MEUX
Apr 28, 2025 · Artificial Intelligence

Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans

This article surveys recent AI breakthroughs, covering GPT-4o's native image generation in ChatGPT, Runway's Gen-4 video model, Midjourney V7, AnimeGamer's infinite anime simulation, the JiMeng 3.0 poster creator, the ComfyUI-Copilot workflow assistant, DomoAI's voice-and-image digital humans, the Ready AI web builder, DeepSeek-V3, and Alibaba's ultra-realistic 3D digital human model.

AI · Image Generation · Video Generation
0 likes · 8 min read
Alibaba Cloud Developer
Apr 18, 2025 · Artificial Intelligence

How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model can create high‑quality 720p videos that seamlessly connect user‑provided start and end images, offering controllable, personalized video generation with prompt‑driven camera motions and easy access via its website, GitHub, Hugging Face, and ModelScope.

AI model · Computer Vision · Deep Learning
0 likes · 5 min read
AIWalker
Mar 31, 2025 · Artificial Intelligence

VBench-2.0: A Next‑Generation Benchmark for Intrinsic Faithfulness in AI Video Generation

VBench-2.0 extends the original VBench suite with five broad dimensions (Human Fidelity, Controllability, Creativity, Physics, and Commonsense), each broken down into fine-grained capabilities. It evaluates not only the visual quality of generated videos but also their intrinsic faithfulness to physical laws, common sense, and narrative coherence, and it provides open-source tools, prompts, and human-aligned metrics for the research community.

AI Evaluation · Intrinsic Faithfulness · Multimodal
0 likes · 12 min read
AI Frontier Lectures
Mar 30, 2025 · Artificial Intelligence

How NOVA Generates High‑Quality Video Autoregressively Without Vector Quantization

This article provides an in‑depth analysis of the NOVA model, a non‑quantized autoregressive video generation framework that combines frame‑by‑frame temporal prediction with set‑by‑set spatial prediction, uses diffusion loss for token estimation, and achieves state‑of‑the‑art results on multiple video and image benchmarks.

AI research · Autoregressive Model · NOVA
0 likes · 15 min read
AI Frontier Lectures
Mar 14, 2025 · Artificial Intelligence

Open-Sora 2.0: How an 11B Open-Source Model Beats Closed-Source Video AI at 720p

Open-Sora 2.0, an open-source 11-billion-parameter video generation model, delivers 720p, 24 fps videos with visual quality and text alignment comparable to leading models such as HunyuanVideo and Step-Video, while cutting training costs to $200K on only 224 GPUs. The release includes full code, weights, and a Gradio demo.

3D autoencoder · AI · MMDiT
0 likes · 7 min read
NewBeeNLP
Mar 14, 2025 · Artificial Intelligence

How Open‑Sora 2.0 Achieves SOTA Video Generation with Only $200K Training Cost

Open‑Sora 2.0 is an open‑source 11B‑parameter video generation model that matches commercial SOTA performance while being trained on 224 GPUs for just $200,000, thanks to a 3D auto‑encoder, MMDiT architecture, aggressive data filtering, low‑resolution pre‑training, and highly optimized parallel training techniques.

AI model · MMDiT · Open-Sora
0 likes · 9 min read
DataFunTalk
Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference IP for video generation models that leverages spatiotemporal activation sparsity, mixed-precision DSP58 extensions, and adaptive scheduling to achieve up to 1.30× performance and 4.49× energy-efficiency gains over an NVIDIA RTX 3090 GPU while preserving model accuracy.

AI · FPGA · Hardware acceleration
0 likes · 11 min read
DaTaobao Tech
Feb 24, 2025 · Artificial Intelligence

AIGC Video Generation Techniques for E‑commerce: Lip‑Sync, Head/Body Driving, and Business Applications

The article surveys recent AIGC video generation advances for Taobao e‑commerce, detailing lip‑sync models like Wav2Lip and MuseTalk, head‑driven systems such as Hallo and EchoMimic, body‑driven pipelines including AnimateAnyone and Tango, and a four‑stage production workflow that boosts click‑through rates and enables virtual try‑on.

AIGC · Deep Learning · Multimodal AI
0 likes · 21 min read
Infra Learning Club
Feb 21, 2025 · Artificial Intelligence

5 Must‑Try Open‑Source AI Projects You Can Start Using Today

This article introduces five open‑source AI tools—a PPT generator, an LLM app development platform, a cloud‑agnostic AI runner, a curated collection of LLM applications, and a one‑click HD video creator—detailing their key features, usage links, and sample configurations.

AI · Dify · LLM
0 likes · 8 min read
AIWalker
Feb 13, 2025 · Artificial Intelligence

How FlashVideo Turns Low‑Res Clips into 4K Video with Minimal Compute

FlashVideo introduces a two‑stage framework that first generates low‑resolution videos with strong prompt fidelity and then uses flow‑matching ODE trajectories to upscale to 4K quality in just four function evaluations, achieving top VBench‑Long scores while cutting generation time by up to five‑fold.

AI · FlashVideo · Video Generation
0 likes · 26 min read
AIWalker
Feb 12, 2025 · Artificial Intelligence

Goku: How HKU and ByteDance’s New Model Sets New Benchmarks in Commercial Image and Video Generation

The paper presents Goku, a rectified‑flow transformer that jointly generates high‑quality images and videos at commercial scale, detailing its novel architecture, massive high‑quality data pipeline, efficient large‑scale training tricks, and state‑of‑the‑art results on GenEval, DPG‑Bench, VBench and UCF‑101.

Image Generation · Large-Scale Training · Multimodal AI
0 likes · 29 min read
AIWalker
Feb 10, 2025 · Artificial Intelligence

FlashVideo Sets New SOTA for Faster, High‑Fidelity High‑Resolution Video Generation

FlashVideo introduces a two‑stage diffusion framework that first ensures prompt fidelity at low resolution with a 5‑billion‑parameter DiT, then efficiently adds fine details at high resolution using flow matching, achieving state‑of‑the‑art quality with dramatically lower compute cost.

AI · Diffusion Models · FlashVideo
0 likes · 21 min read
ZhongAn Tech Team
Jan 19, 2025 · Artificial Intelligence

Weekly AI Digest Issue 11: Recommendation Algorithms, Video Generation Advances, and AGI Research

This issue of the weekly AI digest explores Xiaohongshu’s NoteLLM recommendation system, compares Chinese text generation in video AI across major platforms, highlights Alibaba’s Tongyi Wanxiang breakthroughs, discusses Keras founder François Chollet’s new AGI-focused lab, and reviews Google’s Veo 2 and Imagen 3 advancements.

AGI · AI · Recommendation Systems
0 likes · 11 min read
AIWalker
Jan 15, 2025 · Artificial Intelligence

Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation

Magic Mirror introduces a single‑stage, zero‑shot framework that fuses dual facial embeddings with a conditional adaptive normalization module inside a Video Diffusion Transformer, achieving superior identity consistency, natural dynamics, and high visual quality compared with existing video generation methods.

Diffusion Transformer · Video Generation · conditional adaptive normalization
0 likes · 16 min read
58UXD
Dec 18, 2024 · Artificial Intelligence

Transform Your Designs with AI: 5 Steps to Create Stunning Videos

Learn how designers can harness AI tools in five practical steps—from script generation and AI‑driven image creation to video synthesis, music production, and final editing—to craft compelling, high‑quality videos that boost creativity and efficiency.

AI tools · AI video · Video Generation
0 likes · 4 min read
php Courses
Dec 13, 2024 · Artificial Intelligence

OpenAI Releases Sora Video Generation Model: Three Key Implications and Core Features

OpenAI's new Sora model brings AI-powered video generation to creators. The article argues that it expands interaction beyond text and marks a pivotal step toward AGI by enabling machines to understand and produce visual content, and it walks through Sora's suite of tools, including Explore, Storyboard, Remix, Loop, and Blend.

Artificial Intelligence · OpenAI · Sora
0 likes · 4 min read
Alipay Experience Technology
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: High‑Quality Audio‑Driven Half‑Body Human Animation with Simple Inputs

EchoMimicV2 is an open-source digital-human framework that generates high-quality half-body animation videos from a single reference image, an audio clip, and a hand-gesture sequence. It addresses three challenges in audio-driven animation: the restriction of earlier methods to facial portraits, the complexity of injecting multiple conditions, and inference latency.

AI research · Diffusion Models · Digital Human
0 likes · 18 min read
Baobao Algorithm Notes
Oct 17, 2024 · Artificial Intelligence

How Meta’s Movie Gen Pushes Text‑to‑Video Generation to New Heights

Meta’s newly released 92-page Movie Gen paper introduces a family of media foundation models that unifies text-to-image, text-to-video, personalized video, precise video editing, and audio generation, detailing its dual-model architecture, training pipeline, temporal autoencoder design, scaling strategies, evaluation benchmark, and ablation studies.

Deep Learning · Model Scaling · Multimodal LLM
0 likes · 34 min read
DataFunSummit
Oct 10, 2024 · Artificial Intelligence

AIGC‑Assisted Marketing Material Generation at Shujia Technology

This article describes Shujia Technology's use of artificial intelligence to generate marketing images and videos, outlining the background, challenges of high-volume content production, detailed solutions for image and video assets—including layout models, diffusion models, and digital human synthesis—and future research directions.

AIGC · Digital Human · Image Generation
0 likes · 12 min read
Volcano Engine Developer Services
Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision, covering background challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, while outlining future research directions.

Computer Vision · PixelLM · StoryDiffusion
0 likes · 22 min read
360 Tech Engineering
Aug 29, 2024 · Artificial Intelligence

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

FancyVideo is an open‑source UNet‑based video generation model that supports arbitrary resolutions, aspect ratios, styles, and motion dynamics by introducing a Cross‑frame Textual Guidance Module (CTGM) with temporal injectors, refiners, and boosters, achieving state‑of‑the‑art results on multiple benchmarks and enabling versatile applications such as video extension, backtracking, and frame interpolation.

AI research · UNet · Video Generation
0 likes · 6 min read
Qunar Tech Salon
Jul 25, 2024 · Artificial Intelligence

AI-Generated Video Practices for International Hotels

At the WOT2024 conference, Qunar Travel’s CTO Zheng Jimin presented a comprehensive overview of AI-generated video production for international hotels, detailing challenges, AI-driven workflow automation, practical implementation steps, multilingual translation enhancements, and performance results, offering valuable insights for scaling high‑quality hotel video content.

AI · AIGC · Hotel Industry
0 likes · 11 min read
Baidu Geek Talk
Jul 24, 2024 · Artificial Intelligence

AI-Driven Fusion of Peking Opera Characters with Ink-Wash Painting Style Using PaddleGAN

Li Yilin’s AI project blends Peking Opera characters with traditional ink‑wash painting by using PaddleHub for style transfer and PaddleGAN’s First‑Order Motion model for facial motion, then adds music and Wav2Lip lip‑sync, producing videos that modernize Chinese heritage and gauge public cultural awareness.

AI · Computer Vision · Deep Learning
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Jul 15, 2024 · Artificial Intelligence

How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers

EasyAnimate v3, an open-source video generation system from Alibaba Cloud AI Platform, introduces a Diffusion Transformer-based architecture, a Hybrid Motion Module, and Slice VAE to enable image-to-video, text-to-video, and unlimited-length video creation at up to 720p resolution and 144 frames on modest GPU memory.

AI · Computer Vision · Diffusion Transformer
0 likes · 5 min read
Kuaishou Large Model
Jun 27, 2024 · Artificial Intelligence

How I2V-Adapter Turns Images into Videos with Minimal Training

This article introduces I2V-Adapter, a lightweight plug-in for Stable Diffusion-based video diffusion models that converts a single static image into a coherent video without altering the original T2V architecture, and details its design, frame-similarity prior, experimental results, and real-world applications.

AI · Computer Vision · Diffusion Models
0 likes · 9 min read
Kuaishou Tech
Jun 26, 2024 · Artificial Intelligence

I2V-Adapter: A Lightweight Image‑to‑Video Adapter for Stable Diffusion Video Diffusion Models

The I2V-Adapter paper introduces a plug‑and‑play lightweight module that enables static images to be converted into dynamic videos using Stable Diffusion‑based text‑to‑video diffusion models without altering the original architecture or pretrained parameters, achieving competitive quality with far less training cost.

AI · Computer Vision · Diffusion Models
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Jun 19, 2024 · Artificial Intelligence

Deploy and Fine‑Tune EasyAnimate for High‑Res Video Generation on Alibaba Cloud PAI

EasyAnimate is Alibaba Cloud PAI's DiT-based video generation framework, offering a complete HD video generation solution. This guide walks you through integrating EasyAnimate on PAI: setting up prerequisites, creating DSW instances, installing the model, running inference via code or the WebUI, fine-tuning with LoRA, and calling the API.

Alibaba Cloud PAI · DSW · EasyAnimate
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Jun 4, 2024 · Artificial Intelligence

EasyAnimate: High‑Resolution Video Generation via Diffusion Transformers

EasyAnimate, an open‑source DiT‑based video generation framework from Alibaba Cloud AI Platform PAI, offers a complete pipeline—including data preprocessing, VAE and DiT training, LoRA fine‑tuning, motion‑module integration, and scalable inference up to 768×768 resolution and 144 frames—leveraging Diffusion Transformers to produce longer, higher‑quality videos.

AI video · Diffusion Transformer · LoRA
0 likes · 14 min read
JD Cloud Developers
May 14, 2024 · Artificial Intelligence

Create Digital Avatars and Face Swaps with EasyPhoto on JD Cloud

Learn how to install and use the EasyPhoto plugin on JD Cloud’s Stable Diffusion WebUI to generate digital avatars, perform multi‑person face swaps, and create AI‑generated videos, with step‑by‑step instructions, screenshots, and tips for optimal settings and coupon usage.

AI Avatar · Video Generation · cloud-computing
0 likes · 6 min read
MoonWebTeam
May 14, 2024 · Frontend Development

Top 9 Front-End & AI Trends Shaping 2024: From Apple’s MM1 to Micro‑Frontends

This monthly roundup highlights nine cutting‑edge topics—from Apple’s multimodal MM1 model and the Signals standardization proposal to Stable Video Diffusion, digital humans, micro‑frontend frameworks, Monkey testing automation, Tango low‑code sandbox, and cross‑platform app frameworks—offering deep insights and practical takeaways for modern developers.

Artificial Intelligence · Micro Frontends · Video Generation
0 likes · 17 min read
DataFunTalk
May 3, 2024 · Artificial Intelligence

Advances, Challenges, and Industrial Practices in Text‑to‑Video Generation – From Diffusion Models to Sora

This article reviews the rapid progress of text‑to‑video generation, explains diffusion‑based video synthesis, outlines key technical challenges such as motion modeling, semantic alignment and quality, and presents Tencent’s solutions and real‑world applications, while also discussing future directions and the impact of OpenAI’s Sora model.

AI · Diffusion Models · Sora
0 likes · 23 min read
Architect
Apr 16, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build a 60‑Second Video Generator

This article dissects the possible architecture of OpenAI's Sora video model, tracing its visual encoder-decoder, Spacetime Latent Patch, transformer-based diffusion backbone, long-range temporal consistency strategies, and training pipeline, while comparing alternatives such as MAGVIT-v2, TECO, NaViT, and FDM to explain why each design choice may have been made.

AI Architecture · Latent Diffusion · Sora
0 likes · 51 min read
Alimama Tech
Apr 10, 2024 · Artificial Intelligence

SizeCube: AI‑Driven Arbitrary‑Size Image and Video Outpainting for Advertising

SizeCube leverages Stable Diffusion‑based diffusion models and a sophisticated pipeline—including quality filtering, feature mining, latent‑space UNet denoising, super‑resolution, and temporal 3D‑U‑Net video processing—to automatically outpaint images and videos to any size, boosting Alibaba advertisers’ creative flexibility, click‑through rates, and asset adaptability across diverse ad placements.

AI · Advertising · Image Outpainting
0 likes · 14 min read
Architects' Tech Alliance
Apr 7, 2024 · Artificial Intelligence

How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model

Sora, OpenAI's newly announced large text-to-video model, can generate one-minute high-fidelity videos from text prompts or still images, handling complex scenes, expressive characters, and sophisticated camera motion. It also supports extending existing videos and filling in missing frames, positioning it at the forefront of multimodal AI research.

AI model · Multimodal · Sora
0 likes · 6 min read
Architect
Mar 28, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Architecture, Workflow, and Core Technologies

This article explains OpenAI's Sora video generation model, detailing its latent diffusion foundation, video compression network, spacetime patch representation, Diffusion Transformer processing, and decoding pipeline, while also reviewing related Stable Diffusion and Transformer concepts that enable high‑quality text‑to‑video synthesis.

AI · Deep Learning · Latent Diffusion
0 likes · 17 min read
DevOps
Mar 26, 2024 · Artificial Intelligence

OpenAI’s Sora: A One‑Minute Text‑to‑Video Diffusion Transformer Model

OpenAI’s newly released Sora model demonstrates one‑minute text‑to‑video generation using a diffusion‑based transformer architecture that operates on spatiotemporal patches, compresses visual data into latent codes, and builds on a wide range of prior video generation research, while the article also advertises a DevOps certification program.

AI · OpenAI · Sora
0 likes · 8 min read
DaTaobao Tech
Mar 25, 2024 · Artificial Intelligence

Survey of AIGC Video Generation Algorithms

Since 2023, AI-generated video research has expanded across six algorithmic categories: text-to-video, image-to-video, video editing, style transfer, human motion, and long-video generation. The survey highlights works such as CogVideo, Imagen Video, MagicVideo, ControlVideo, DCTNet, NUWA-XL, and OpenAI’s Sora, and its analysis finds that diffusion models excel at short clips, editing remains costly, style transfer is efficient, and truly long, temporally consistent video remains an open challenge.

AI · AIGC · Diffusion Models
0 likes · 13 min read