Tagged articles

video generation

153 articles · Page 1 of 2
Data Party THU
Jun 30, 2026 · Artificial Intelligence

Do Video Generation Models Really Reason? A 303‑Question Benchmark Exposes Their Reasoning Gaps

The article introduces the MME‑CoF‑Pro benchmark, which uses 303 carefully crafted video‑reasoning samples across 16 categories to evaluate seven leading video generation models, revealing that current models lack true reasoning ability, that prompting can both help and hurt coherence, and that the new Reasoning Score aligns well with human judgments.

Artificial Intelligence · Benchmark · Evaluation
0 likes · 11 min read
IT Services Circle
Jun 28, 2026 · Artificial Intelligence

Top 10 Trending GitHub Open‑Source Projects This Week

This article reviews ten noteworthy GitHub open‑source projects released this week, covering AI‑driven website cloning, PDF manipulation, voice cloning, parallel agent orchestration, safe git pushes, AI resume scoring, agent‑native integration, AWS tooling, and automated video production, with key features, usage examples, and repository links.

AI · Automation · GitHub
0 likes · 9 min read
Machine Heart
Jun 27, 2026 · Artificial Intelligence

Do Video Generation Models Really Reason? A 303‑Question Benchmark Exposes Their Reasoning Gaps

The paper introduces the Reasoning Coherence metric and the MME‑CoF‑Pro benchmark—303 image‑text‑video samples across 16 reasoning categories—to evaluate seven leading video generation models, revealing that reasoning ability is largely independent of visual quality, that textual prompts often induce hallucinations, and that the new Reasoning Score aligns well with human judgments.

AI evaluation · Benchmark · MME-CoF-Pro
0 likes · 10 min read
PaperAgent
Jun 26, 2026 · Artificial Intelligence

13 Must-Read Agent Papers from Meituan for ICML'26

This article presents a curated list of thirteen recent research papers on generalist agents—covering visual memory, environment synthesis, value modeling, self‑verification, robustness benchmarks, high‑resolution video generation, long‑horizon world models, and alignment fine‑tuning—along with brief abstracts and links to the PDFs for the upcoming Meituan ICML'26 sharing sessions.

AI · Agent · Benchmark
0 likes · 16 min read
Machine Heart
Jun 25, 2026 · Artificial Intelligence

From Finding to Generating Videos: How Kuaishou’s RaG Transforms Recommendation Systems

Kuaishou’s new Recommendation-as-Generation (RaG) framework replaces traditional retrieve-and-rank with a generative pipeline that predicts user interests, creates personalized video content, and closes the loop with feedback, delivering a 1.87% ad‑revenue lift for over 400 million daily users.

A/B testing · Generative AI · Large-Scale Deployment
0 likes · 14 min read
Data Party THU
Jun 21, 2026 · Artificial Intelligence

Lance: A Lightweight 3B Multimodal AI Model that Handles Vision, Video, Generation, and Editing

Lance, an open‑source 3‑billion‑parameter multimodal model from ByteDance, unifies image and video understanding, generation, and editing in a single architecture, achieves top scores on VBench (85.11), MVBench (62.0), GenEval (0.90) and GEdit‑Bench (7.30), and demonstrates emergent cross‑task generalization.

Lance · MaPE · Multimodal AI
0 likes · 9 min read
Machine Heart
Jun 15, 2026 · Artificial Intelligence

How Close Is Video Generation to Being Beautiful, Useful, Accurate? 1080‑Prompt, 7‑Model KIVI Benchmark

Researchers introduce KIVI, a knowledge‑intensive video generation benchmark with 1080 real‑world prompts, evaluating seven models using new FactP and HelpS metrics, revealing systematic errors such as entity mis‑depiction, procedural mistakes, and component misplacement, and showing a gap between human‑crafted and AI‑generated videos.

Benchmark · FactP · HelpS
0 likes · 9 min read
Top Architect
Jun 13, 2026 · Artificial Intelligence

Gemini Omni Review: Transform Sketches into Cinematic Videos with a Single Prompt

Google unveiled Gemini Omni, a new multimodal world model that combines reasoning and generation to create realistic videos, edit them conversationally, and demonstrate emergent abilities like style transfer and scene continuation, while introducing safety measures such as avatar registration and forced watermarks.

AI safety · Gemini Omni · Multimodal AI
0 likes · 10 min read
Machine Heart
Jun 11, 2026 · Artificial Intelligence

Agent‑Driven Newton Toolbox: A New Paradigm for Grounded Video Generation

NEWTON introduces an Agent‑centric framework that augments existing video generators with a planner, physics‑aware tools, and a verification loop, enabling multi‑round refinement and significantly improving physical consistency on benchmarks without retraining the underlying generator.

agentic AI · benchmark evaluation · physics grounding
0 likes · 8 min read
Machine Heart
Jun 11, 2026 · Artificial Intelligence

MBench: Tsinghua and Tencent Define Long-Term Memory for Video World Models

MBench, a new benchmark from Tsinghua University and Tencent, systematically evaluates the long‑term memory ability of streaming video generation models across entity, environment, and causal consistency, introduces a trigger‑conditioned scoring scheme, and reveals that memory remains a major bottleneck for current SOTA models.

AI · Benchmark · long-term consistency
0 likes · 8 min read
Top Architect
Jun 10, 2026 · Artificial Intelligence

Gemini Omni Review: Transform Sketches into Cinematic Videos with a Single Prompt

Gemini Omni, Google DeepMind’s new multimodal world model, extends AI from text prediction to full‑scene video generation and editing, offering physics‑aware visuals, on‑the‑fly style transfer, digital avatars, and built‑in watermarks, while its training approach and emergent capabilities signal a step change toward AGI.

AI emergence · AI safety · Gemini Omni
0 likes · 9 min read
Top Architect
Jun 9, 2026 · Artificial Intelligence

Gemini Omni Unveiled: One Prompt Turns Sketches into Cinematic Videos

Google DeepMind’s Gemini Omni, announced at I/O, combines large‑language reasoning with multimodal generation to let users edit and create realistic videos by simply describing a change, while introducing digital avatars, layered training objectives, emergent capabilities, and built‑in safety watermarks.

AI emergence · Gemini Omni · Google DeepMind
0 likes · 10 min read
Top Architect
Jun 8, 2026 · Artificial Intelligence

Gemini Omni Tested: One Prompt Turns Sketches into Cinematic Videos

Google’s Gemini Omni, unveiled at I/O, is a multimodal world model that combines reasoning and generation to enable conversational video editing, digital avatars, emergent style‑transfer and scene‑continuation capabilities, marking a step‑change from previous text‑to‑video systems like Veo.

AI video editing · Gemini Omni · Google DeepMind
0 likes · 10 min read
Top Architect
Jun 7, 2026 · Artificial Intelligence

Can Gemini Omni Turn Sketches into Blockbuster Videos with a Single Prompt?

Google unveiled Gemini Omni at I/O, a multimodal world model that combines reasoning and generation to produce realistic videos, edit them conversationally, create digital avatars, and demonstrate emergent abilities like style transfer and scene continuation, while also introducing safety measures such as forced watermarks.

AI emergence · Avatar Flow · Gemini Omni
0 likes · 10 min read
Top Architect
Jun 6, 2026 · Artificial Intelligence

How Gemini Omni Turns a Sketch into a Blockbuster Video with a Single Prompt

Gemini Omni, Google DeepMind’s new world model, combines multimodal reasoning and generation to enable conversational video editing, digital avatars, and emergent capabilities such as style transfer and scene continuation, while introducing safety measures like Avatar Flow and dual watermarks, marking a step toward true AI‑generated worlds.

AI emergent behavior · AI safety · Gemini Omni
0 likes · 10 min read
Top Architect
Jun 5, 2026 · Artificial Intelligence

Gemini Omni Turns Sketches into Blockbuster Videos with a Single Prompt

Google’s Gemini Omni, unveiled at I/O, is a multimodal world model that can generate realistic video, edit it conversationally, and understand physics, offering a step‑change over previous text‑to‑video systems and raising new safety and strategic questions for AI development.

AI safety · AI video editing · Gemini Omni
0 likes · 9 min read
PaperAgent
Jun 5, 2026 · Artificial Intelligence

Tongji’s “Boundless” World Model Wins Open‑Source #1 and Overall #2 in WorldArena

The Tongji University “Boundless” world model achieved the top open‑source score (64.54) and the second‑overall rank (67.87) on WorldArena’s Track‑1, demonstrating high‑quality video generation, stable long‑sequence physics, and embodied interaction across six evaluation dimensions, while using data‑efficient training and a hybrid open/closed‑source strategy.

Boundless · Embodied AI · Open-source
0 likes · 9 min read
Machine Learning Algorithms & Natural Language Processing
Jun 4, 2026 · Artificial Intelligence

World Models Explained: A Comprehensive AI Overview and Technical Roadmap

This article provides a detailed, popular‑science overview of world models, contrasting them with LLMs, defining their formalism, highlighting three core values (sample efficiency, planning, safety), tracing their 80‑year history, reviewing major architectures such as Dreamer, MuZero, STORM, Diamond, V‑JEPA 2 and DreamDojo, discussing current industry debates, and linking to an open‑source learning resource.

AI safety · Dreamer · Multimodal AI
0 likes · 24 min read
Top Architect
Jun 4, 2026 · Artificial Intelligence

Testing Gemini Omni: Turn Sketches into Cinematic Videos with One Prompt

Google unveiled Gemini Omni at I/O, a multimodal world model that lets users edit videos by speaking a single sentence, turning simple sketches into cinematic clips, while offering conversational editing, digital‑twin avatars, emergent style‑transfer and scene‑continuation capabilities, all backed by a new multimodal training objective.

AI video editing · Gemini Omni · Google DeepMind
0 likes · 10 min read
ShiZhen AI
Jun 3, 2026 · Artificial Intelligence

Will Free Multimodal APIs Redefine AI Development Costs?

Agnes AI is offering its text, image, and video model APIs for unlimited free use, prompting a shift in AI application development where high‑frequency, multi‑step workflows—such as agents, content editing, and short‑video generation—can be prototyped and iterated without the token‑cost barriers that previously limited small teams.

Agent workflow · Free API · Multimodal AI
0 likes · 16 min read
Top Architect
Jun 1, 2026 · Artificial Intelligence

Gemini Omni Review: Turn Sketches into Cinematic Videos with a Single Prompt

Google DeepMind's Gemini Omni introduces a multimodal world model that can generate realistic video, edit it conversationally, and demonstrate emergent capabilities such as style transfer and scene continuation, marking a step‑change in AI video technology.

AI emergence · Gemini Omni · Google DeepMind
0 likes · 11 min read
Machine Learning Algorithms & Natural Language Processing
May 29, 2026 · Artificial Intelligence

WBench: 20 Cutting‑Edge World Models Face a Comprehensive Interactive Benchmark

WBench, a new benchmark created by Meituan LongCat and Fudan University, evaluates 20 state‑of‑the‑art video and world‑model systems across 289 test cases and 1,058 interaction rounds, measuring video quality, setting adherence, interaction fidelity, consistency and physical compliance, and reveals that no model yet excels in all five dimensions.

Interactive Benchmark · Multimodal Evaluation · WBench
0 likes · 10 min read
SuanNi
May 24, 2026 · Artificial Intelligence

Meituan’s Open‑Source Digital Human Model Delivers Real‑World Performance Across MV, E‑Commerce, and More

Meituan’s LongCat‑Video‑Avatar 1.5 replaces its audio encoder with Whisper‑Large, cuts inference to eight steps, and, after a 770‑person, 13,240‑rating evaluation, outperforms competing models in lip‑sync, style generalization, multi‑person scenes, and overall visual fidelity.

AI · Benchmark · LongCat-Video-Avatar
0 likes · 7 min read
Meituan Technology Team
May 22, 2026 · Artificial Intelligence

From High-Fidelity to Real-World Use: LongCat Video Avatar 1.5 Open‑Source Release

LongCat Video Avatar 1.5 is now open‑source, delivering commercial‑grade lip sync, physical realism, long‑video stability, multi‑person interaction and 15× faster inference through Whisper‑large audio encoding, DMD 8‑step distillation and LoRA adapters, and it outperforms leading closed‑source models in extensive human‑rated benchmarks.

AI · Benchmark · Distillation
0 likes · 9 min read
SuanNi
May 22, 2026 · Artificial Intelligence

All‑In‑One Image & Video: ByteDance’s Deployable Native Multimodal Model Lance

Lance, ByteDance’s newly open‑sourced 3‑billion‑parameter multimodal model, runs on a single 40 GB GPU, tops HuggingFace trend charts, and achieves leading scores on DPG Bench, GenEval, and video generation benchmarks while surpassing several state‑of‑the‑art single‑modal models.

AI research · ByteDance · Lance
0 likes · 3 min read
Machine Heart
May 20, 2026 · Artificial Intelligence

How VChain Gives Video Generation a Visual Thought Chain for Explicit Spatiotemporal Planning

The VChain framework injects multimodal large‑model reasoning into video generation, using a three‑stage visual‑thought pipeline, sparse inference‑time adaptation, and guided sampling to produce physically consistent, logically coherent videos, as demonstrated by qualitative and quantitative experiments.

Multimodal Large Models · Sparse Fine‑tuning · Visual Reasoning
0 likes · 8 min read
Machine Heart
May 17, 2026 · Artificial Intelligence

What Exactly Is a World Model? History, Technology, and the $10 B Bet

The article traces the two decades‑long, parallel research lines that birthed video world models—dreaming agents in reinforcement learning and learning physics from human video—explains how they converged in 2024‑2025, evaluates current capabilities and limitations, and analyzes the $10 billion investment landscape and strategic moves by NVIDIA, OpenAI, and others.

AI research · Simulation · reinforcement learning
0 likes · 32 min read
Machine Heart
May 8, 2026 · Artificial Intelligence

How an 8B Video‑Language Model Beats GPT‑5 and Gemini‑3.1‑Pro at Cinematic Understanding

The CHAI framework introduced by CMU and Harvard defines a structured video‑language annotation scheme, scalable human‑AI oversight, and a post‑training pipeline that enables an 8B open‑source model to outperform closed‑source GPT‑5 and Gemini‑3.1‑Pro on professional cinematic techniques.

Annotation · Multimodal AI · Qwen3-VL
0 likes · 11 min read
Machine Learning Algorithms & Natural Language Processing
May 7, 2026 · Artificial Intelligence

Low‑Prompt ‘Pang Goose AI’ Lets Anyone Generate Videos and Dashboards Without Learning Complex Prompts

The article argues that while modern LLMs like ChatGPT and Gemini are powerful, their usage barriers are rising, and introduces ‘Pang Goose AI’, a low‑prompt AI agent that, through a pre‑built SOP system, can produce a one‑minute e‑commerce video or an interactive data‑dashboard with a single sentence, outperforming generic models and eliminating the need for users to master prompt engineering.

AI agents · AI product review · SOP
0 likes · 12 min read
Machine Learning Algorithms & Natural Language Processing
May 4, 2026 · Artificial Intelligence

How CVPR 2026 Is Redefining Visual Model Defaults in Generative AI

A review of CVPR 2026 papers shows a shift in visual generative AI from incremental performance gains within established frameworks to a systematic rewrite of default modeling assumptions, covering new guidance mechanisms, video generation architectures, direct image prediction, fine‑grained motion control, and dense semantic correspondence.

Generative AI · diffusion · human motion
0 likes · 13 min read
Machine Heart
Apr 29, 2026 · Artificial Intelligence

VEGA-3D: Unleashing Implicit 3D Priors in Video Generation for Scene Understanding

VEGA-3D extracts the hidden 3D priors embedded in large video generation models, fuses them with semantic features via token‑level adaptive gating, and demonstrates dramatically higher multi‑view consistency and state‑of‑the‑art results on 3D scene‑understanding benchmarks such as ScanRefer, ScanQA, VSI‑Bench and LIBERO—all without any additional 3D annotations.

Embodied AI · Scene Understanding · VEGA-3D
0 likes · 10 min read
Machine Heart
Apr 27, 2026 · Artificial Intelligence

Why Traditional Video Captions Fail and How MTSS Solves the Problem

The article introduces Multi-Stream Scene Script (MTSS), a structured JSON‑based video description paradigm that replaces monolithic captions, explains its design principles, compares its advantages, and presents experimental evidence showing significant gains in both video understanding and generation tasks.

MTSS · Multimodal AI · structured video description
0 likes · 8 min read
AI Explorer
Apr 24, 2026 · Artificial Intelligence

Open Generative AI: 200+ Open‑Source Models for Image, Video, and Lip‑Sync Creation

Open Generative AI is an open‑source, MIT‑licensed desktop suite that bundles over 200 cutting‑edge image, video, and lip‑sync models into four dedicated studios, offering unrestricted generation without content filters, subscription fees, or closed ecosystems, and provides online, desktop, and self‑hosted deployment options.

AI media generation · MIT license · Open Generative AI
0 likes · 6 min read
Geek Labs
Apr 23, 2026 · Artificial Intelligence

7 Must‑Watch Open‑Source Prompt Libraries for AI Image and Video Generation (2025‑2026)

Amid the rapid rise of prompt engineering in 2025‑2026, this article reviews seven standout open‑source GitHub repositories (covering Nano Banana Pro, GPT‑Image‑2, multi‑model prompts, and video generation), detailing their stars, content structure, multilingual support, and ideal use cases for creators.

AI Prompt Engineering · GitHub · Nano Banana Pro
0 likes · 14 min read
SuanNi
Apr 21, 2026 · Artificial Intelligence

Why AI Video Generation Is Leaving the Silent Era: Architecture, Alignment, and Evaluation Insights

This article analyzes the rapid evolution of multimodal video generation models from separated visual‑audio pipelines to unified diffusion Transformers, detailing VAE compression, MoE scaling, cross‑modal alignment techniques, comprehensive evaluation metrics, real‑world applications, and the remaining technical challenges.

Diffusion Models · Evaluation Metrics · Multimodal AI
0 likes · 15 min read
Machine Heart
Apr 12, 2026 · Artificial Intelligence

CVPR 2026 WorldArena Challenge Launches with Amap’s Open‑Source High‑Performance World Model Baseline

The CVPR 2026 WorldArena Challenge, organized by top academic institutions and Amap, introduces a new evaluation framework that tests video world models for physical realism and functional utility, while Amap releases its high‑performance ABot‑PhysWorld model and benchmark scores that set a new state‑of‑the‑art.

ABot-PhysWorld · Benchmark · CVPR 2026
0 likes · 9 min read
Machine Heart
Apr 10, 2026 · Artificial Intelligence

OneStory Enables Minute-Long, Ten-Shot Video Generation with Consistent Narrative

The OneStory paper presented at CVPR 2026 introduces an adaptive‑memory framework for coherent multi‑shot video generation, reformulating the task as next‑shot generation and using Frame Selection and Adaptive Conditioner modules to maintain long‑range context while supporting both text‑to‑multi‑shot and image‑to‑multi‑shot synthesis.

Adaptive Memory · Multi-shot Video · OneStory
0 likes · 8 min read
SuanNi
Apr 8, 2026 · Industry Insights

How HappyHorse‑1.0 Surpassed Seedance 2.0 in AI Video Generation Rankings

An anonymous model, HappyHorse‑1.0, quickly topped the Artificial Analysis leaderboard for both text‑to‑video and image‑to‑video tracks, outscoring Seedance 2.0 by large margins and prompting intense community discussion about its origin, performance, and future stability.

AI · Artificial Intelligence · Competitive Analysis
0 likes · 5 min read
Machine Heart
Apr 4, 2026 · Artificial Intelligence

Is AI Video Generation Shifting From Model Showcases to Integrated Workflows?

The article analyzes how AI video generation, after the launch of OpenAI's Sora, is moving from a focus on model performance to embedding video capabilities into existing platforms and business workflows, highlighting timeline shifts, key players, and emerging competitive criteria.

AI video · Generative AI · Market Trends
0 likes · 7 min read
SuanNi
Mar 25, 2026 · Industry Insights

Why OpenAI Shut Down Sora: The Costly Rise and Fall of AI Video Generation

OpenAI abruptly discontinued its Sora video‑generation app after a brief period of explosive popularity, revealing massive GPU costs, unsustainable pricing, fierce competition from rivals like Gemini and Claude, and a strategic pivot toward enterprise‑focused AI services.

AI · Market Analysis · OpenAI
0 likes · 10 min read
Amap Tech
Mar 20, 2026 · Artificial Intelligence

How ABot-PhysWorld Achieves Physical Consistency in Embodied Video Generation

ABot-PhysWorld introduces a physically consistent video generation framework for embodied AI, leveraging the PAI‑Bench benchmark, large‑scale multi‑modal data, DPO preference alignment, and dense action maps to surpass SOTA models in both visual quality and physical plausibility across diverse robotic tasks.

Benchmark · Deep Learning · Embodied AI
0 likes · 15 min read
Java Tech Enthusiast
Mar 7, 2026 · Artificial Intelligence

Explore Cutting‑Edge Open‑Source AI Skills for Video, Docs, and Social Media Automation

This article introduces several open‑source AI Skills—including Remotion, YouTube‑clipper, skill‑from‑masters, NotebookLM, Markdown‑to‑X publisher, and Anthropic's Agent Skills—detailing their purpose, core features, installation commands, and repository links for developers seeking automation solutions.

Claude · Document processing · Open-source
0 likes · 7 min read
Old Meng AI Explorer
Mar 5, 2026 · Industry Insights

Three Must‑Try Open‑Source Tools: Local Tunneling, AI Short‑Video Creation, and Multi‑Model Switching

This article introduces three high‑impact open‑source projects—tunnelto for instant public access to local services, Toonflow‑app for fully automated AI short‑video production from text, and cc‑switch for one‑click switching and unified configuration of multiple large‑model AI tools—highlighting their key features, cross‑platform support, and GitHub repositories.

AI · development tools · local tunneling
0 likes · 8 min read
Software Engineering 3.0 Era
Feb 16, 2026 · Artificial Intelligence

Three Years of AI Evolution: From Incremental Gains to Unlimited Capability Frontiers

The article analyzes how, over the past three years, rapid growth in compute, data, and model architecture has turned incremental advances in large language models into qualitative leaps—spanning emergent abilities, world‑model video generation, and agentic AI—suggesting an effectively unbounded frontier for AI capabilities.

AI agents · AI capability boundaries · Large Language Models
0 likes · 18 min read
HyperAI Super Neural
Feb 14, 2026 · Artificial Intelligence

Beyond Visual Realism: WorldArena Benchmark Reveals the Capability Gap in Embodied World Models

WorldArena introduces a unified benchmark that evaluates generated videos not only for visual fidelity but also for embodied task functionality across six dimensions, exposing a stark gap between visual realism and practical usefulness and providing a composite EWMScore to compare models.

Benchmark · Embodied AI · Evaluation Metrics
0 likes · 9 min read
AI Engineering
Feb 13, 2026 · Artificial Intelligence

ByteDance’s Open‑Source 12B‑Parameter Video Model “Alive” Runs on a Single RTX 3090/4090

ByteDance has open‑sourced the 12‑billion‑parameter video generation model Alive, which supports text‑to‑video/audio, image‑to‑video/audio, pure text‑to‑video and text‑to‑audio modes, runs on a 24 GB GPU, outperforms competitors in cross‑modal synchronization, and includes novel TA‑CrossAttn and UniTemp‑RoPE techniques.

Alive Model · ByteDance · Cross‑Modal Synchronization
0 likes · 5 min read
Bilibili Tech
Jan 28, 2026 · Artificial Intelligence

Boosting Video Generation Inference: Full Graph Compilation with torch.compile

This article examines the challenges of optimizing video generation model inference, moving from operator-level tweaks to full-graph compilation using torch.compile, and details systematic strategies to eliminate Graph Breaks, handle dynamic shapes, KV-Cache indexing, and Python-side caches, achieving a 47.6% speedup on a 14B model without accuracy loss.

AI · graph optimization · inference acceleration
0 likes · 14 min read
Old Meng AI Explorer
Jan 27, 2026 · Artificial Intelligence

Three Must‑Try Open‑Source AI Tools for Data Mining, PPT Creation, and Video Generation

In the era of abundant AI utilities, this article highlights three recently popular open‑source projects—Spider_XHS for comprehensive Xiaohongshu data collection and automated posting, PPTAgent for one‑click, multi‑scene PowerPoint generation, and Code2Video for code‑driven, high‑quality video creation—detailing their core features, deployment steps, and GitHub links.

AI tools · Data Scraping · Open-source
0 likes · 7 min read
Design Hub
Jan 13, 2026 · Artificial Intelligence

Three AI-Powered Design Tools That Boost Creativity

The article reviews three open‑source AI tools—Claude Cowork for file‑based assistance, the LTX‑2 video generation model runnable on 8 GB GPUs via Pinokio, and SongGeneration Studio for end‑to‑end music creation—detailing their features, performance benchmarks, and usage steps for creators.

AI design · Claude · LTX-2
0 likes · 8 min read
Kuaishou Tech
Jan 8, 2026 · Artificial Intelligence

Top 12 Kuaishou Papers Accepted at AAAI 2026: Breakthroughs in Recommendation, Video Generation, and LLM Research

Kuaishou secured 12 papers at AAAI 2026, covering advances in search and recommendation systems, multi‑camera video generation, multimodal understanding, generative model fundamentals, video large language models, experimental design, and LLM latent‑space reasoning, with three papers highlighted as oral presentations.

AI · LLM · diffusion
0 likes · 22 min read
DataFunSummit
Dec 20, 2025 · Artificial Intelligence

How AutoHome Built the Cangjie Large Model: From Training Architecture to Real-World AI Applications

This article details AutoHome's end‑to‑end development of the Cangjie large model, covering the training infrastructure with distributed data, pipeline and tensor parallelism, core business use cases such as video script generation and multi‑tool Agent capabilities, inference optimizations through quantization and fast serving frameworks, and future directions for personalized automotive AI services.

Agent AI · Quantization · distributed training
0 likes · 19 min read
HyperAI Super Neural
Dec 12, 2025 · Artificial Intelligence

AI Open‑Source Forum Recap: Video Generation, Vision, Vector DBs, AI‑Native Language

The AI Open‑Source Forum brought together researchers from Peking University, Tsinghua, Zilliz and MoonBit to share open‑source advances in audio‑synchronized video generation, vector database architecture, lightweight vision backbones, and an AI‑native programming language, highlighting datasets, system designs, and future collaborative directions.

AI · AI‑Native Programming · Open-source
0 likes · 12 min read
AI Open‑Source Forum Recap: Video Generation, Vision, Vector DBs, AI‑Native Language
Data Party THU
Data Party THU
Dec 9, 2025 · Artificial Intelligence

Can Robots Learn Human Moves Directly from AI‑Generated Videos? The GenMimic Breakthrough

The GenMimic paper introduces a novel framework that enables humanoid robots to zero‑shot imitate human actions generated by AI video models, presenting a new dataset, a two‑stage 4D reconstruction pipeline, and a reinforcement‑learning strategy with weighted‑tracking and symmetry losses, validated in simulation and on a real 23‑DoF robot.

humanoid robots · reinforcement learning · robotics
0 likes · 11 min read
Can Robots Learn Human Moves Directly from AI‑Generated Videos? The GenMimic Breakthrough
Data Party THU
Data Party THU
Dec 2, 2025 · Artificial Intelligence

FFGo: Turning the First Frame into a Conceptual Memory for Video Customization

FFGo reveals that the first frame of text‑to‑video models acts as a conceptual memory buffer storing visual entities, and by using a few‑shot LoRA trained on only 20‑50 curated examples with a special transition prompt, it reliably activates multi‑object fusion, enabling high‑quality, controllable video customization without model architecture changes.

AI research · conceptual memory · few-shot LoRA
0 likes · 9 min read
FFGo: Turning the First Frame into a Conceptual Memory for Video Customization
AI Frontier Lectures
AI Frontier Lectures
Nov 28, 2025 · Artificial Intelligence

Can AI Generate the Next Step in a Video? Inside the VANS Model

Researchers from Kuaishou and City University of Hong Kong introduce VANS, a novel Video-as-Answer system that predicts and visualizes the next event in a video by jointly optimizing a visual language model and a video diffusion model, enabling personalized step‑by‑step guidance and future scenario generation.

Multimodal AI · future prediction · joint optimization
0 likes · 10 min read
Can AI Generate the Next Step in a Video? Inside the VANS Model
HyperAI Super Neural
HyperAI Super Neural
Nov 25, 2025 · Artificial Intelligence

LongCat‑Video: Meituan’s Model for Text‑to‑Video, Image‑to‑Video & Continuation

LongCat‑Video, an open‑source video generation model from Meituan, adopts a unified multi‑task architecture to handle text‑to‑video, image‑to‑video and video‑continuation, delivers minute‑long high‑quality clips with coarse‑to‑fine inference, achieves benchmark scores comparable to leading models like Wan2.2, and provides a one‑click deployment tutorial on HyperAI.

Benchmark · LongCat-Video · Meituan
0 likes · 6 min read
LongCat‑Video: Meituan’s Model for Text‑to‑Video, Image‑to‑Video & Continuation
Kuaishou Tech
Kuaishou Tech
Nov 24, 2025 · Artificial Intelligence

How Human Feedback Supercharges Video Generation – The VideoAlign Pipeline Explained

This article details a new research pipeline that leverages large‑scale human preference data, a multi‑dimensional video reward model, and specialized alignment algorithms to dramatically improve video generation quality, motion fidelity, and text‑video consistency, with open‑source code and benchmarks for reproducibility.

AI alignment · Benchmark · Human Feedback
0 likes · 10 min read
How Human Feedback Supercharges Video Generation – The VideoAlign Pipeline Explained
AI Large Model Application Practice
AI Large Model Application Practice
Nov 24, 2025 · Artificial Intelligence

How to Turn Text into an AI‑Powered PPT Video: A Step‑by‑Step Guide

This article breaks down the end‑to‑end engineering pipeline that converts a knowledge source such as a URL or PDF into a narrated PPT‑style video, detailing six core stages—from knowledge extraction and script generation to image creation, voice synthesis, and final video stitching—while highlighting practical model choices, prompt design, and stability tricks.

Artificial Intelligence · LLM · Multimodal
0 likes · 16 min read
How to Turn Text into an AI‑Powered PPT Video: A Step‑by‑Step Guide
Wuming AI
Wuming AI
Oct 16, 2025 · Industry Insights

Top AI Model Releases This Week: NanoChat, Ring‑1T, Qwen3‑VL, Veo 3.1, Claude Haiku 4.5

This week’s AI landscape saw Karpathy’s NanoChat open‑sourcing an 8‑K‑line ChatGPT replica, Ant Group unveiling a trillion‑parameter Ring‑1T model, Alibaba releasing the 4B/8B Qwen3‑VL visual language models that outperform Gemini 2.5 Flash Lite and GPT‑5 Nano, Google launching Veo 3.1 for high‑fidelity video generation, and Anthropic announcing Claude Haiku 4.5, a faster and cheaper LLM that excels on SWE‑bench benchmarks.

AI models · Large Language Models · Multimodal
0 likes · 7 min read
Top AI Model Releases This Week: NanoChat, Ring‑1T, Qwen3‑VL, Veo 3.1, Claude Haiku 4.5
Amap Tech
Amap Tech
Oct 3, 2025 · Artificial Intelligence

How FantasyHSI Enables Autonomous 3D Human Interaction in Any Scene

FantasyHSI introduces a graph‑based multi‑agent framework that combines visual‑language models and video‑generation diffusion to let digital humans perceive, plan, and interact autonomously in any 3D scene, producing physically plausible, long‑duration actions for animation creation and embodied‑AI simulation.

3D synthesis · Graph Modeling · human-scene interaction
0 likes · 12 min read
How FantasyHSI Enables Autonomous 3D Human Interaction in Any Scene
Amap Tech
Amap Tech
Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Diffusion Models · Multimodal AI
0 likes · 11 min read
How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds
AI2ML AI to Machine Learning
AI2ML AI to Machine Learning
Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Diffusion Models · Multimodal AI · Stable Diffusion
0 likes · 7 min read
Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality
Mashang Consumer UXC
Mashang Consumer UXC
Sep 29, 2025 · Artificial Intelligence

Open-Source AI 3D, Video & Audio Models: Tencent, Vidu, Audio2Face and More

This article reviews the latest open‑source AI models released by major tech firms—including Tencent's 3D‑Omni and 3D‑Part, Shengshu Tech's Vidu Q2 for facial video, Nvidia's Audio2Face for real‑time facial animation, plus updates from Figma, Google, Alibaba and Kuaishou—highlighting their capabilities and potential applications in gaming, AR/VR, design and content creation.

3D modeling · AI · Deep Learning
0 likes · 8 min read
Open-Source AI 3D, Video & Audio Models: Tencent, Vidu, Audio2Face and More
DataFunTalk
DataFunTalk
Sep 27, 2025 · Artificial Intelligence

How AI Is Redefining Filmmaking: From Festival Shorts to Feature Films

The article explores how AI models like Seedream and Seedance are reshaping cinema, from AI‑driven short films showcased at the Busan Film Festival to full‑length feature productions, highlighting technical breakthroughs, industry perspectives, and the emerging "AI +" versus "+ AI" production paradigms.

AI · AIGC · Cinema
0 likes · 11 min read
How AI Is Redefining Filmmaking: From Festival Shorts to Feature Films
Kuaishou Tech
Kuaishou Tech
Sep 16, 2025 · Artificial Intelligence

How Kling-Avatar Generates Long, Emotionally Rich Digital Human Videos with Multimodal LLMs

Kuaishou's Kling-Avatar leverages a multimodal large‑language‑model‑driven two‑stage generation framework to produce minute‑long digital‑human videos that synchronize lip movements, facial expressions, and body gestures with audio, achieving high visual quality, identity consistency, and controllable storytelling across diverse scenarios.

AI Avatar · digital human · long video synthesis
0 likes · 9 min read
How Kling-Avatar Generates Long, Emotionally Rich Digital Human Videos with Multimodal LLMs
DataFunTalk
DataFunTalk
Sep 11, 2025 · Artificial Intelligence

How AI Dressing and Multimodal Models Transform Home Service Experiences

During a pre-conference interview, AI expert Wang Mingzhong details how multimodal AI dressing, video résumé creation, short‑video templates, and interactive digital‑human live streams are technically realized for 58 Home Services, highlighting model training, workflow optimization, and future fusion of template‑based and agent‑driven video generation.

AI · Domestic Service · Multimodal
0 likes · 11 min read
How AI Dressing and Multimodal Models Transform Home Service Experiences
Data STUDIO
Data STUDIO
Sep 3, 2025 · Artificial Intelligence

Hands‑On Review of AiPy: Open‑Source AI‑Python Tool for Data Analysis, Reporting & Video Creation

The author evaluates the free open‑source AiPy tool through three real‑world cases—U.S. GDP trend analysis, Cambricon stock assessment, and short‑video production—showing its local, code‑free workflow for data processing, report generation, and multimedia creation while noting minor visual glitches and their fixes.

AI · AiPy · Tool Review
0 likes · 7 min read
Hands‑On Review of AiPy: Open‑Source AI‑Python Tool for Data Analysis, Reporting & Video Creation
Kuaishou Tech
Kuaishou Tech
Aug 25, 2025 · Artificial Intelligence

How Context-as-Memory Enables Scene‑Consistent Long Video Generation

This article introduces the Context-as-Memory approach, which treats previously generated video frames as memory to achieve scene‑consistent interactive long video generation, and details a camera‑trajectory‑based memory retrieval mechanism that dramatically improves efficiency and performance over existing state‑of‑the‑art methods.

AI · Memory Retrieval · context memory
0 likes · 7 min read
How Context-as-Memory Enables Scene‑Consistent Long Video Generation
Amap Tech
Amap Tech
Aug 18, 2025 · Artificial Intelligence

How Omni-Effects Enables Spatially Controllable Multi‑VFX Generation with LoRA‑MoE

Omni-Effects introduces a unified framework that combines LoRA‑based expert mixture models and spatially aware prompts to generate multiple, precisely placed visual effects in video, supported by the new Omni‑VFX dataset and evaluation suite, demonstrating superior spatial control and diversity over prior single‑effect methods.

AI · LoRA · computer graphics
0 likes · 8 min read
How Omni-Effects Enables Spatially Controllable Multi‑VFX Generation with LoRA‑MoE
AI Frontier Lectures
AI Frontier Lectures
Jul 30, 2025 · Artificial Intelligence

DualReal: Seamless Identity and Motion Customization for Video Generation

DualReal introduces a novel adaptive joint training framework that simultaneously customizes subject identity and motion dynamics in video generation, overcoming the conflicts of traditional isolated approaches by using a dual-domain perception adapter and stage-fusion controller, achieving up to 31.8% improvement on CLIP‑I and DINO‑I metrics.

Diffusion Models · dual-domain adaptation · identity preservation
0 likes · 13 min read
DualReal: Seamless Identity and Motion Customization for Video Generation
Amap Tech
Amap Tech
Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Benchmark · Diffusion Models · image restoration
0 likes · 14 min read
VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration
Amap Tech
Amap Tech
Jul 9, 2025 · Artificial Intelligence

Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS

This article introduces VMBench, a perception‑aligned video motion generation benchmark with a five‑dimensional metric suite and meta‑guided prompt generation, and LD‑RPS, a zero‑shot unified image restoration framework using latent diffusion and recurrent posterior sampling, detailing their motivations, innovations, experiments, and future directions.

AI research · Diffusion Models · image restoration
0 likes · 14 min read
Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS
Kuaishou Large Model
Kuaishou Large Model
Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s Kling team, leverages test‑time scaling to dramatically improve diffusion‑based image and video generation without additional training, using evolutionary search along the denoising trajectory and achieving state‑of‑the‑art results on SD2.1, Flux‑1‑dev and other models.

Diffusion Models · Evolutionary Search · image generation
0 likes · 8 min read
How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search
Kuaishou Tech
Kuaishou Tech
Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Diffusion Models · Evolutionary Search
0 likes · 8 min read
How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search
AntTech
AntTech
Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Generative AI · computer vision
0 likes · 20 min read
21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs
Kuaishou Tech
Kuaishou Tech
Jun 10, 2025 · Artificial Intelligence

Top 12 Cutting-Edge Video Generation Papers from Kuaishou at CVPR 2025

The article highlights CVPR 2025’s acceptance statistics and showcases twelve cutting‑edge video‑generation papers from Kuaishou, spanning datasets, quality assessment, style control, scaling laws, 4D simulation, interleaved image‑text data, vision‑language acceleration, high‑fidelity avatars, patch‑wise super‑resolution, narrative‑driven benchmarks, sketch‑based editing, and spatio‑temporal diffusion, each with links and abstracts.

CVPR2025 · Kuaishou · Multimodal AI
0 likes · 20 min read
Top 12 Cutting-Edge Video Generation Papers from Kuaishou at CVPR 2025
DataFunTalk
DataFunTalk
Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · Autoregressive Models · Diffusion Models
0 likes · 70 min read
Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches
AIWalker
AIWalker
May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Diffusion Models · autoregressive modeling · causal attention
0 likes · 16 min read
GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework
Baidu MEUX
Baidu MEUX
Apr 28, 2025 · Artificial Intelligence

Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans

This article surveys the latest AI breakthroughs, covering ChatGPT‑4o's native image generation, Runway's Gen‑4 video model, Midjourney V7, AnimeGamer's infinite anime simulation, JiMeng 3.0 poster creator, ComfyUI‑Copilot workflow assistant, DomoAI's voice‑image digital humans, Ready AI web builder, DeepSeek‑V3, and Alibaba's ultra‑realistic 3D digital human model.

AI · digital humans · image generation
0 likes · 8 min read
Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans
Alibaba Cloud Developer
Alibaba Cloud Developer
Apr 18, 2025 · Artificial Intelligence

How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model can create high‑quality 720p videos that seamlessly connect user‑provided start and end images, offering controllable, personalized video generation with prompt‑driven camera motions and easy access via its website, GitHub, Hugging Face, and ModelScope.

AI model · Deep Learning · Open-source
0 likes · 5 min read
How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images
AIWalker
AIWalker
Mar 31, 2025 · Artificial Intelligence

VBench-2.0: A Next‑Generation Benchmark for Intrinsic Faithfulness in AI Video Generation

VBench-2.0 expands the original VBench suite by introducing six fine‑grained dimensions—Human Fidelity, Controllability, Creativity, Physics, Commonsense, and more—to evaluate not only the visual quality of generated videos but also their intrinsic faithfulness to physical laws, common sense, and narrative coherence, providing open‑source tools, prompts, and human‑aligned metrics for the research community.

AI evaluation · Benchmark · Intrinsic Faithfulness
0 likes · 12 min read
VBench-2.0: A Next‑Generation Benchmark for Intrinsic Faithfulness in AI Video Generation
AI Frontier Lectures
AI Frontier Lectures
Mar 30, 2025 · Artificial Intelligence

How NOVA Generates High‑Quality Video Autoregressively Without Vector Quantization

This article provides an in‑depth analysis of the NOVA model, a non‑quantized autoregressive video generation framework that combines frame‑by‑frame temporal prediction with set‑by‑set spatial prediction, uses diffusion loss for token estimation, and achieves state‑of‑the‑art results on multiple video and image benchmarks.

AI research · Autoregressive Model · Nova
0 likes · 15 min read
How NOVA Generates High‑Quality Video Autoregressively Without Vector Quantization
AI Frontier Lectures
AI Frontier Lectures
Mar 14, 2025 · Artificial Intelligence

Open-Sora 2.0: How an 11B Open-Source Model Beats Closed-Source Video AI at 720p

Open‑Sora 2.0, an open‑source 11‑billion‑parameter video generation model, delivers 720p 24 fps videos with visual quality and text‑image alignment comparable to proprietary systems like HunyuanVideo and Step‑Video, while cutting training costs to $200 k using only 224 GPUs, and the release includes full code, weights, and a Gradio demo.

3D autoencoder · AI · MMDiT
0 likes · 7 min read
Open-Sora 2.0: How an 11B Open-Source Model Beats Closed-Source Video AI at 720p
NewBeeNLP
NewBeeNLP
Mar 14, 2025 · Artificial Intelligence

How Open‑Sora 2.0 Achieves SOTA Video Generation with Only $200K Training Cost

Open‑Sora 2.0 is an open‑source 11B‑parameter video generation model that matches commercial SOTA performance while being trained on 224 GPUs for just $200,000, thanks to a 3D auto‑encoder, MMDiT architecture, aggressive data filtering, low‑resolution pre‑training, and highly optimized parallel training techniques.

AI model · MMDiT · Open-Sora
0 likes · 9 min read
How Open‑Sora 2.0 Achieves SOTA Video Generation with Only $200K Training Cost
DataFunTalk
DataFunTalk
Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference IP for video generation models that leverages time‑space activation sparsity, mixed‑precision DSP58 extensions, and adaptive scheduling to achieve up to 1.30× performance and 4.49× energy‑efficiency gains over an NVIDIA 3090 GPU while preserving model accuracy.

AI · FPGA · hardware acceleration
0 likes · 11 min read
FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025
DaTaobao Tech
DaTaobao Tech
Feb 24, 2025 · Artificial Intelligence

AIGC Video Generation Techniques for E‑commerce: Lip‑Sync, Head/Body Driving, and Business Applications

The article surveys recent AIGC video generation advances for Taobao e‑commerce, detailing lip‑sync models like Wav2Lip and MuseTalk, head‑driven systems such as Hallo and EchoMimic, body‑driven pipelines including AnimateAnyone and Tango, and a four‑stage production workflow that boosts click‑through rates and enables virtual try‑on.

AIGC · Deep Learning · Multimodal AI
0 likes · 21 min read
AIGC Video Generation Techniques for E‑commerce: Lip‑Sync, Head/Body Driving, and Business Applications
Infra Learning Club
Infra Learning Club
Feb 21, 2025 · Artificial Intelligence

5 Must‑Try Open‑Source AI Projects You Can Start Using Today

This article introduces five open‑source AI tools—a PPT generator, an LLM app development platform, a cloud‑agnostic AI runner, a curated collection of LLM applications, and a one‑click HD video creator—detailing their key features, usage links, and sample configurations.

AI · Dify · LLM
0 likes · 8 min read
5 Must‑Try Open‑Source AI Projects You Can Start Using Today
AIWalker
AIWalker
Feb 13, 2025 · Artificial Intelligence

How FlashVideo Turns Low‑Res Clips into 4K Video with Minimal Compute

FlashVideo introduces a two‑stage framework that first generates low‑resolution videos with strong prompt fidelity and then uses flow‑matching ODE trajectories to upscale to 4K quality in just four function evaluations, achieving top VBench‑Long scores while cutting generation time by up to five‑fold.

AI · Efficiency · FlashVideo
0 likes · 26 min read
How FlashVideo Turns Low‑Res Clips into 4K Video with Minimal Compute
AIWalker
AIWalker
Feb 12, 2025 · Artificial Intelligence

Goku: How HKU and ByteDance’s New Model Sets New Benchmarks in Commercial Image and Video Generation

The paper presents Goku, a rectified‑flow transformer that jointly generates high‑quality images and videos at commercial scale, detailing its novel architecture, massive high‑quality data pipeline, efficient large‑scale training tricks, and state‑of‑the‑art results on GenEval, DPG‑Bench, VBench and UCF‑101.

Large‑Scale Training · Multimodal AI · flow-based models
0 likes · 29 min read
Goku: How HKU and ByteDance’s New Model Sets New Benchmarks in Commercial Image and Video Generation
AIWalker
AIWalker
Feb 10, 2025 · Artificial Intelligence

FlashVideo Sets New SOTA for Faster, High‑Fidelity High‑Resolution Video Generation

FlashVideo introduces a two‑stage diffusion framework that first ensures prompt fidelity at low resolution with a 5‑billion‑parameter DiT, then efficiently adds fine details at high resolution using flow matching, achieving state‑of‑the‑art quality with dramatically lower compute cost.

AI · Diffusion Models · FlashVideo
0 likes · 21 min read
FlashVideo Sets New SOTA for Faster, High‑Fidelity High‑Resolution Video Generation