Tagged articles

text-to-video

28 articles

Apr 8, 2026 · Artificial Intelligence

HappyHorse 1.0 Tops AI Video Ranking, Leaving Seedance 2.0 74 Points Behind

A mysterious model dubbed HappyHorse‑1.0 surged to the top of the Artificial Analysis video‑AI leaderboard with an Elo score of 1347, outpacing the previously dominant Seedance 2.0 by 74 points, sparking intense community debate over its origin and scoring methodology.

AI-leaderboard · HappyHorse · Seedance

0 likes · 7 min read

HyperAI Super Neural

Nov 25, 2025 · Artificial Intelligence

LongCat‑Video: Meituan’s Model for Text‑to‑Video, Image‑to‑Video & Continuation

LongCat‑Video, an open‑source video generation model from Meituan, adopts a unified multi‑task architecture to handle text‑to‑video, image‑to‑video and video‑continuation, delivers minute‑long high‑quality clips with coarse‑to‑fine inference, achieves benchmark scores comparable to leading models like Wan2.2, and provides a one‑click deployment tutorial on HyperAI.

LongCat-Video · Meituan · RLHF

0 likes · 6 min read

Instant Consumer Technology Team

Jul 16, 2025 · Artificial Intelligence

How to Build a Text‑to‑Video Workflow in Dify Using LLMs

This guide walks you through creating a Dify workflow that turns user prompts into videos by chaining LLM‑generated descriptions with a Text‑to‑Video model, covering workflow types, system variables, model setup, node configuration, plugin installation, and final testing steps.

AI · Dify · LLM

0 likes · 14 min read

Kuaishou Tech

May 26, 2025 · Artificial Intelligence

CineMaster: A 3D‑Aware and Controllable Framework for Cinematic Text‑to‑Video Generation

Researchers introduce CineMaster, a SIGGRAPH‑2025 paper presenting a 3D‑aware, controllable text‑to‑video generation framework that lets users define target objects and camera motions via an interactive workflow, enabling cinematic video creation with high‑quality, user‑directed results.

3D-aware · AI video · CineMaster

0 likes · 6 min read

Alibaba Cloud Developer

Apr 18, 2025 · Artificial Intelligence

How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model can create high‑quality 720p videos that seamlessly connect user‑provided start and end images, offering controllable, personalized video generation with prompt‑driven camera motions and easy access via its website, GitHub, Hugging Face, and ModelScope.

AI model · computer vision · deep learning

0 likes · 5 min read

AIWalker

Apr 6, 2025 · Artificial Intelligence

NOVA: Redefining Autoregressive Visual Modeling Without Vector Quantization

NOVA introduces a highly efficient autoregressive video generation framework that eliminates vector quantization, combines frame‑by‑frame causal prediction with set‑by‑set spatial attention, and achieves state‑of‑the‑art quality on VBench and GenEval while offering strong zero‑shot generalization across text‑to‑image and text‑to‑video tasks.

Nova · autoregressive video generation · benchmark results

0 likes · 14 min read

Bilibili Tech

Mar 4, 2025 · Artificial Intelligence

Engineering Practices and Optimizations for Text‑to‑Video Generation Models (OpenSora, CogVideoX) by the Bilibili TTV Team

The Bilibili TTV team optimized OpenSora and CogVideoX text‑to‑video models by redesigning data storage with Alluxio, parallelizing VAE encoding, applying dynamic sequence‑parallel and DeepSpeed‑Ulysses attention, adapting GPU code for NPU execution, leveraging profiling‑driven kernel fusion, FlashAttention, and expandable memory to dramatically increase training efficiency and frame throughput, while outlining future pipeline‑parallel and ZeRO‑3 scaling plans.

FlashAttention · NPU · data pipeline

0 likes · 26 min read

Alibaba Cloud Big Data AI Platform

Feb 18, 2025 · Artificial Intelligence

One-Click Deployment of Cutting-Edge Text-to-Video and Voice Interaction Models

This article introduces the state‑of‑the‑art Step‑Video‑T2V text‑to‑video model and the Step‑Audio‑Chat voice interaction model, outlines their technical specifications and benchmark results, and provides a detailed step‑by‑step guide for deploying both models with a single click using Alibaba Cloud's PAI Model Gallery.

AI model deployment · PAI Model Gallery · state-of-the-art

0 likes · 9 min read

Baobao Algorithm Notes

Oct 17, 2024 · Artificial Intelligence

How Meta’s Movie Gen Pushes Text‑to‑Video Generation to New Heights

Meta’s newly released 92‑page Movie Gen paper introduces a multimodal LLM that unifies text‑to‑image, text‑to‑video, personalized video, precise video editing, and audio generation, detailing its dual‑model architecture, training pipeline, temporal auto‑encoder design, scaling strategies, evaluation benchmark, and ablation studies.

Evaluation · Model Scaling · deep learning

0 likes · 34 min read

Refining Core Development Skills

Aug 8, 2024 · Artificial Intelligence

Getting Started with CodeVideoX API for Text‑to‑Video Generation Using Diffusion Transformers

This guide introduces CodeVideoX, a diffusion‑transformer based video generation model, explains its training and inference pipelines, and provides step‑by‑step instructions with API endpoints, required parameters, and example cURL commands for creating short AI‑generated videos.

AIGC · API · CodeVideoX

0 likes · 8 min read

DataFunTalk

May 3, 2024 · Artificial Intelligence

Advances, Challenges, and Industrial Practices in Text‑to‑Video Generation – From Diffusion Models to Sora

This article reviews the rapid progress of text‑to‑video generation, explains diffusion‑based video synthesis, outlines key technical challenges such as motion modeling, semantic alignment and quality, and presents Tencent’s solutions and real‑world applications, while also discussing future directions and the impact of OpenAI’s Sora model.

AI · Diffusion Models · Sora

0 likes · 23 min read

21CTO

Apr 17, 2024 · Artificial Intelligence

How Sora Generates High‑Quality Text‑to‑Video: A Deep Dive into Its Architecture

This article breaks down OpenAI's Sora text‑to‑video model, exploring its overall structure, visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion, long‑time consistency strategies, training techniques, and the technical choices that enable variable resolution, aspect ratios, and up to 60‑second video generation.

AI video generation · Sora · Transformer

0 likes · 50 min read

Open Source Linux

Apr 16, 2024 · Artificial Intelligence

How Sora’s Text-to-Video Model Is Redefining AI‑Generated Video

Sora, a new text‑to‑video AI model, can create one‑minute videos from textual prompts or static images, delivering industry‑leading fidelity, resolution, and coherent motion by using spatial‑temporal patches inspired by ViViT, and shows emergent capabilities that hint at universal physical simulation.

Multimodal AI · Sora model · ViViT

0 likes · 4 min read

Architects' Tech Alliance

Apr 7, 2024 · Artificial Intelligence

How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model

Sora, the newly announced text‑to‑video large model, can generate one‑minute high‑fidelity videos from textual prompts or static images, handling complex scenes, expressive characters, and sophisticated camera motions while also supporting video extension and frame‑filling, positioning it at the forefront of multimodal AI research.

AI model · Multimodal · Sora

0 likes · 6 min read

Alipay Experience Technology

Mar 28, 2024 · Artificial Intelligence

How OpenAI’s Sora Revolutionizes Text‑to‑Video Generation: Capabilities & Comparisons

This article introduces OpenAI’s Sora video‑generation model, compares it with other leading solutions, explains its underlying diffusion‑based architecture, showcases sample outputs, outlines its diverse generation abilities, and discusses current limitations and future implications for AI‑driven video creation.

AI video generation · OpenAI · Sora

0 likes · 13 min read

DevOps

Mar 26, 2024 · Artificial Intelligence

OpenAI’s Sora: A One‑Minute Text‑to‑Video Diffusion Transformer Model

OpenAI’s newly released Sora model demonstrates one‑minute text‑to‑video generation using a diffusion‑based transformer architecture that operates on spatiotemporal patches, compresses visual data into latent codes, and builds on a wide range of prior video generation research, while the article also advertises a DevOps certification program.

AI · OpenAI · Sora

0 likes · 8 min read

DaTaobao Tech

Mar 25, 2024 · Artificial Intelligence

Survey of AIGC Video Generation Algorithms

Since 2023, AI‑generated video research has expanded across six algorithmic categories—text‑to‑video, image‑to‑video, editing, style transfer, human motion, and long‑video generation—highlighting works such as CogVideo, Imagen Video, MagicVideo, ControlVideo, DCTNet, NUWA‑XL and OpenAI’s Sora, while analysis shows short‑clip diffusion models excel, editing remains costly, style transfer is efficient, and truly long, temporally consistent videos remain an open challenge.

AI · AIGC · Diffusion Models

0 likes · 13 min read

NewBeeNLP

Mar 22, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build Its Text‑to‑Video Engine

This article provides a step‑by‑step technical analysis of OpenAI’s Sora model, examining its possible overall architecture, video encoder‑decoder design, Spacetime Latent Patch mechanism, transformer‑based diffusion process, training strategies, and long‑term consistency techniques, while grounding each speculation in publicly available reports and related research.

AI analysis · Sora · Transformer

0 likes · 50 min read

Baobao Algorithm Notes

Mar 22, 2024 · Artificial Intelligence

Unveiling Sora: How OpenAI Might Build Its Groundbreaking Text‑to‑Video Model

This article provides a detailed, step‑by‑step technical analysis of OpenAI's Sora text‑to‑video system, exploring its overall architecture, visual encoder‑decoder choices, Spacetime Latent Patch design, transformer‑based diffusion model, training strategies, and long‑time consistency mechanisms while referencing relevant research papers and open‑source techniques.

AI · Sora · diffusion

0 likes · 50 min read

58UXD

Feb 27, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Redefining AI‑Generated Video Creation

OpenAI’s newly released Sora model, built on the DALL‑E 3 foundation, can generate up to 60‑second high‑quality videos from text prompts, offering features such as multi‑character scenes, seamless video synthesis, image‑to‑video animation, physical world simulation, and prompting guidance for designers, while raising ethical and creative challenges.

AI video generation · OpenAI · Prompt engineering

0 likes · 9 min read

Architect

Feb 22, 2024 · Artificial Intelligence

Sora: OpenAI’s Text‑to‑Video Model – Principles, Impact, and Outlook

The article provides a comprehensive technical overview of OpenAI’s Sora text‑to‑video model, explaining its background, underlying diffusion‑Transformer architecture, key breakthroughs, potential industry impacts, success factors, limitations, and future prospects for AI‑generated video content.

AI · Diffusion Models · OpenAI

0 likes · 15 min read

High Availability Architecture

Feb 22, 2024 · Artificial Intelligence

Understanding OpenAI’s Sora: A Breakthrough Text-to-Video Model

OpenAI’s newly released Sora text‑to‑video model demonstrates unprecedented high‑resolution, long‑duration video generation by encoding videos into latent space, applying diffusion with a transformer conditioned on text, and decoding back to pixels, marking a major leap in AI video synthesis and its potential applications.

AI video generation · Sora · diffusion model

0 likes · 14 min read

Architects' Tech Alliance

Feb 22, 2024 · Artificial Intelligence

OpenAI’s Sora: A Breakthrough Text‑to‑Video Generation Model – Capabilities, Architecture, and Research Insights

OpenAI’s Sora model demonstrates unprecedented text‑to‑video generation with up to 60‑second high‑fidelity clips, consistent multi‑character scenes, multi‑camera motion, and world‑simulation abilities, backed by a diffusion‑transformer trained on compressed latent video patches and detailed technical analysis from its accompanying research paper.

AI video generation · OpenAI · Sora

0 likes · 11 min read

Tencent Cloud Developer

Feb 21, 2024 · Artificial Intelligence

OpenAI Sora: Technical Principles and Industry Impact Analysis

OpenAI’s Sora, a text‑to‑video model released during Chinese New Year, combines a VAE encoder, latent diffusion with a DiT transformer, and a VAE decoder to generate videos from prompts. It supports flexible durations and resolutions as well as language understanding, with uses in creation, editing, and entertainment, though it struggles with physical consistency and long‑term coherence. Its debut is reshaping the short‑form video, digital‑human, gaming, and graphics industries.

AI video generation · OpenAI · Sora

0 likes · 14 min read

DevOps

Feb 18, 2024 · Artificial Intelligence

OpenAI's Sora: In‑Depth Analysis of the First Text‑to‑Video Model and Its Technical Foundations

OpenAI's Sora, the company's first text‑to‑video model, demonstrates unprecedented video quality and length by leveraging massive high‑quality training data, novel video‑patch representations, diffusion‑based transformer architecture, and precise subtitle generation, reshaping both AI research and media production.

OpenAI · Sora · diffusion model

0 likes · 9 min read

21CTO

Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Turns Text into Realistic 60‑Second Videos

OpenAI’s newly unveiled Sora system can generate 60‑second, high‑quality videos from plain text prompts, leveraging a data‑driven physical engine trained on synthetic data from Unreal Engine 5, with contributions from researchers like Tim Brooks and Bill Peebles, marking a major AI video‑generation breakthrough.

Generative AI · OpenAI · deep learning

0 likes · 6 min read

Architect

Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into up‑to‑one‑minute high‑definition videos, showcasing advanced diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI safety · Diffusion Models · Generative AI

0 likes · 12 min read

Tencent Cloud Developer

Nov 1, 2022 · Artificial Intelligence

The Rise of AI-Generated Content: Technologies, Applications, and Risks

The article surveys the evolution of AI‑generated content from early art programs to modern diffusion‑based text‑to‑image and text‑to‑video models, outlines key milestones such as Stable Diffusion and DALL‑E 2, explores gaming applications, and highlights limitations, ethical concerns, and copyright risks of open‑source generative AI.

AI generation · creative AI · text-to-image

0 likes · 22 min read