Tagged articles
68 articles
Machine Heart
May 6, 2026 · Artificial Intelligence

ICLR 2026: How LiveMoments Restores Live Photo Cover Frames Without Blur

The paper "LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference‑guided Diffusion" introduces a new task and a diffusion‑based method that uses the original high‑resolution cover frame to dramatically improve the visual quality of reselected cover frames in Live Photos, outperforming existing reference‑based super‑resolution and single‑frame approaches.

ICLR 2026 · Image Restoration · Live Photo
0 likes · 8 min read
Meituan Technology Team
Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · audio generation · diffusion model
0 likes · 12 min read
Old Zhang's AI Learning
Apr 14, 2026 · Artificial Intelligence

Qwen3.5-27B-DFlash Delivers Up to 5× Faster Inference Without Quality Loss

The DFlash approach replaces speculative decoding’s autoregressive drafter with a block diffusion model and injects target‑model hidden features into every KV‑cache layer, achieving up to 5× speed‑up for Qwen3.5‑27B on a single GPU and 1.5–1.9× on high‑concurrency workloads while preserving output quality.

DFlash · Inference Acceleration · SGLang
0 likes · 12 min read
AI Explorer
Apr 11, 2026 · Artificial Intelligence

VoxCPM2: Tokenizer‑Free Multilingual TTS that Creates New Voices from Text

VoxCPM2, an open‑source 2‑billion‑parameter TTS model from OpenBMB, eliminates tokenizers and uses a diffusion‑autoregressive architecture to generate high‑fidelity, controllable speech in 30 languages, supporting voice design from natural‑language prompts and high‑quality voice cloning with just a short reference clip.

AudioVAE · TTS · VoxCPM2
0 likes · 8 min read
Machine Heart
Apr 3, 2026 · Artificial Intelligence

Capture Character Animation from Any Object Using Just a Phone – CHI 2026 Best Paper Nominee

DancingBox demonstrates that a single RGB camera, a flat calibration board, and any handheld object can be used to capture realistic character animation by first estimating coarse 3D bounding‑box motion with visual foundation models and then refining it with a diffusion‑based motion generation model, validated by a user study.

AI · DancingBox · Human-Computer Interaction
0 likes · 9 min read
Bighead's Algorithm Notes
Apr 2, 2026 · Artificial Intelligence

Diffolio: A Diffusion‑Model Framework for Risk‑Aware Portfolio Optimization

Diffolio introduces a diffusion‑model‑based approach that directly learns a pseudo‑optimal portfolio distribution conditioned on user risk preferences, generating diverse high‑quality portfolios and outperforming classic and recent baselines on six real‑world market datasets, with annualized returns improving up to 12.1 percentage points.

Financial AI · Generative Modeling · Quantitative Finance
0 likes · 22 min read
Bighead's Algorithm Notes
Mar 14, 2026 · Artificial Intelligence

Quantitative Finance Paper Digest: AI‑Driven Market Prediction Studies (Mar 7‑13 2026)

This digest summarizes four recent research papers that apply advanced AI techniques—node‑transformer graphs with BERT sentiment analysis, a quantum‑classical LSTM‑Born machine hybrid, large‑language‑model benchmarking for portfolio optimization, and a conditional diffusion model—to improve stock market prediction, volatility forecasting, and investment decision making, providing detailed experimental results and statistical validation.

BERT · Quantum Computing · Transformer
0 likes · 10 min read
Amap Tech
Feb 11, 2026 · Artificial Intelligence

Can Diffusion Models Turn Noisy GPS into Sub‑Meter Visual Localization?

The DiffVL framework redefines visual localization as a diffusion‑based GPS denoising task, using BEV‑conditioned visual cues and standard SD maps to achieve sub‑meter accuracy without high‑definition maps, and demonstrates its superiority through extensive autonomous‑driving experiments.

BEV · GPS denoising · SD map
0 likes · 11 min read
HyperAI Super Neural
Feb 9, 2026 · Artificial Intelligence

MIT and Partners Use 23k+ Recipes and Diffusion Models to Create Zeolites with Si/Al = 19

The study introduces DiffSyn, a generative diffusion model trained on 23,961 zeolite synthesis recipes spanning over 50 years, which outperforms regression and other generative baselines, accurately predicts synthesis routes, and experimentally validates a novel UFI zeolite with a record Si/Al ratio of 19.

Chemical Guidance · Materials Synthesis · Zeolites
0 likes · 17 min read
Network Intelligence Research Center (NIRC)
Jan 17, 2026 · Artificial Intelligence

DiffNBR: A Spatiotemporal Diffusion and Information‑Bottleneck Approach for Next‑Basket Recommendation

DiffNBR introduces a dual‑path diffusion framework combined with an information‑bottleneck mechanism to jointly model spatial co‑occurrence and temporal evolution in next‑basket recommendation, achieving state‑of‑the‑art performance and effectively disentangling repetitive and exploratory purchase patterns.

DiffNBR · diffusion model · information bottleneck
0 likes · 8 min read
HyperAI Super Neural
Nov 10, 2025 · Artificial Intelligence

Columbia & Stanford Launch Squidiff: Diffusion Model for Transcriptome Simulation

Squidiff, a conditional diffusion framework co‑developed by Columbia and Stanford, predicts transcriptional responses across cell differentiation, gene and drug perturbations, and radiation exposure, outperforming prior models and enabling more precise and spatially aware biomedical research.

AI for Biology · Perturbation Prediction · Single-Cell Transcriptomics
0 likes · 16 min read
HyperAI Super Neural
Oct 27, 2025 · Artificial Intelligence

MIT’s Open‑Source BoltzGen Achieves nM‑Level Affinity for 66% of Targets Across Molecular Types

BoltzGen, an all‑atom generative model released by MIT and collaborators, unifies protein folding and binder design with a geometric continuous representation and a flexible design language, training on multimodal datasets and demonstrating nM‑level affinity for 66% of 26 diverse targets including proteins, nanobodies, peptides and small molecules.

BoltzGen · Multimodal Training · diffusion model
0 likes · 12 min read
Data Party THU
Oct 24, 2025 · Artificial Intelligence

BREEZE: Enhancing Zero‑Shot Reinforcement Learning with Behavioral Regularization

The paper introduces BREEZE, a behavior‑regularized zero‑shot RL framework that improves stability, policy extraction, and representation quality by combining in‑sample learning, task‑conditioned diffusion models, and expressive attention‑based architectures, achieving near‑state‑of‑the‑art performance on benchmarks like ExORL and D4RL Kitchen.

behavioral regularization · diffusion model · offline RL
0 likes · 3 min read
Kuaishou Tech
Sep 23, 2025 · Artificial Intelligence

How Generative Reinforcement Learning is Revolutionizing Real-Time Bidding

This article explains the core challenges of real‑time bidding, traces the evolution from PID to MPC to reinforcement learning, and details how generative reinforcement‑learning techniques such as GAVE and CBD combine diffusion models, value‑guided exploration, and score‑based return‑to‑go to dramatically improve ad‑bid efficiency and revenue.

CBD · GAVE · advertising algorithms
0 likes · 15 min read
Bilibili Tech
Sep 19, 2025 · Artificial Intelligence

How TextFlux Enables OCR‑Free Multi‑Language Scene Text Editing with Diffusion Models

TextFlux introduces an OCR‑free diffusion‑based framework that seamlessly inserts multilingual text into real‑world images using only glyph images and minimal training data, offering high visual fidelity, zero‑shot character rendering, and efficient multi‑line and single‑line generation on consumer GPUs.

OCR-free · Text Editing · diffusion model
0 likes · 8 min read
AI Frontier Lectures
Sep 8, 2025 · Artificial Intelligence

How DynamicFace Achieves High‑Quality, Consistent Face Swaps in Images and Video

DynamicFace introduces a novel face‑swapping framework that combines diffusion models with composable 3D facial priors, explicitly decoupling identity, pose, expression, lighting and background, achieving superior identity preservation and motion consistency across images and videos, as demonstrated by extensive qualitative and quantitative comparisons with SOTA methods.

3D facial priors · diffusion model · face swapping
0 likes · 10 min read
Architects Research Society
Sep 4, 2025 · Artificial Intelligence

Choosing the Right Generative AI Model: Transformers, Diffusion, GANs & RNNs Explained

This article outlines the four dominant generative AI architectures—Transformers, diffusion models, GANs, and RNNs—explaining their core mechanisms, key capabilities, and typical application domains such as chatbots, image creation, deep‑fake media, and time‑series analysis, helping readers choose the right model for their needs.

AI applications · GAN · RNN
0 likes · 3 min read
JD Cloud Developers
Aug 27, 2025 · Artificial Intelligence

How AI Virtual Try‑On Boosted Fashion Sales by 80%: A Technical Deep‑Dive

This article details how JD.com’s AI‑driven virtual fitting solution, integrated with an A/B testing platform, transformed fashion e‑commerce by generating realistic model images and videos, cutting production costs to zero, accelerating design cycles, and increasing conversion rates by over 80% during major sales events.

A/B testing · AI · Fashion E‑commerce
0 likes · 14 min read
Xiaohongshu Tech REDtech
Aug 18, 2025 · Artificial Intelligence

DynamicFace: Composable 3D Facial Priors for High‑Quality, Consistent Face Swaps

DynamicFace introduces a controllable face‑swapping framework that leverages composable 3D facial priors, dual‑stream identity injection, and a FusionTVO module to achieve superior image and video quality, identity preservation, and temporal consistency, outperforming existing state‑of‑the‑art methods on benchmark datasets.

3D facial priors · AI · Controllable Generation
0 likes · 13 min read
Data Party THU
Aug 17, 2025 · Artificial Intelligence

How BioEmu Generates Protein Conformational Ensembles Faster Than MD

Microsoft Research’s AI for Science team released the open‑source BioEmu model, a generative diffusion architecture that leverages AlphaFold’s Evoformer and extensive MD and stability data to efficiently sample protein conformational ensembles, achieving near‑MD accuracy in free‑energy and mutation stability predictions while dramatically reducing computational cost.

AlphaFold · bioinformatics · diffusion model
0 likes · 6 min read
AIWalker
Aug 13, 2025 · Artificial Intelligence

One‑Model‑For‑All: Inception‑Level AI Try‑On/Off with Arbitrary Poses and No Masks

The paper presents OMFA, a diffusion‑based unified framework for virtual try‑on and try‑off that removes the need for garment templates, segmentation masks, and fixed poses by leveraging a novel partial‑diffusion mechanism and SMPL‑X pose conditioning, achieving state‑of‑the‑art results on VITON‑HD and DeepFashion‑MultiModal datasets.

AI try-on · Computer Vision · SMPL-X
0 likes · 15 min read
Data Party THU
Jul 31, 2025 · Artificial Intelligence

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

The LaVin-DiT paper introduces a large‑scale vision diffusion transformer that combines a spatiotemporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.

3D RoPE · Computer Vision · Vision Transformer
0 likes · 11 min read
Kuaishou Tech
Jul 9, 2025 · Artificial Intelligence

How ResULIC Achieves Ultra‑Low‑Rate Image Compression with Semantic Residual Coding and Diffusion

The paper introduces ResULIC, a residual‑guided ultra‑low‑bitrate image compression framework that combines semantic residual coding, a compression‑aware diffusion model, and perceptual fidelity optimization to dramatically improve visual quality and outperform prior diffusion‑based methods on standard benchmarks.

ResULIC · diffusion model · image compression
0 likes · 12 min read
JD Cloud Developers
Jul 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try-On Boosted Fashion Sales by 80%+

This article details JD Retail's AI virtual try‑on system, its technical challenges, innovations such as a large‑scale diffusion model and adaptive masking, and how the solution dramatically cut costs, accelerated image production, and increased conversion rates for fashion e‑commerce during a major promotion.

AI · Fashion E‑commerce · diffusion model
0 likes · 14 min read
Huolala Tech
Jul 2, 2025 · Artificial Intelligence

Can Diffusion Models Revolutionize Salient Object Detection?

This article introduces a diffusion‑based framework for salient object detection, discusses its background, challenges, and motivations, details the model architecture and training, presents extensive experiments and ablation studies, and outlines limitations and future research directions.

Computer Vision · Deep Learning · diffusion model
0 likes · 11 min read
AI Frontier Lectures
Mar 25, 2025 · Artificial Intelligence

Can Mixed‑Modality Graphs Unlock Precise 3D Indoor Scene Generation?

MMGDreamer introduces a mixed‑modality graph and a dual‑branch diffusion model that jointly enhance geometric control and realism in 3D indoor scene synthesis, outperforming state‑of‑the‑art methods across multiple quantitative and qualitative benchmarks.

3D scene generation · AI · Computer Vision
0 likes · 12 min read
AIWalker
Feb 23, 2025 · Artificial Intelligence

U‑ViT: How a ViT‑Based Diffusion Model Beats DiT and Redefines Image Generation

U‑ViT replaces the convolutional U‑Net backbone of diffusion models with a Vision Transformer, treats time, condition and noisy patches as tokens, adds long skip connections and a lightweight 3×3 convolution, and through extensive ablations and scaling studies achieves state‑of‑the‑art FID scores on unconditional, class‑conditional and text‑to‑image generation tasks.

AdaLN · FID · Long Skip Connections
0 likes · 16 min read
Kuaishou Tech
Feb 20, 2025 · Artificial Intelligence

Second Short-Form Video Quality Assessment and Enhancement Challenge (CVPR NTIRE 2025)

The second short-form video quality assessment and enhancement challenge, co‑organized by Kuaishou's audio‑video team and the Intelligent Media Computing Lab, invites global researchers to develop efficient quality assessment models and diffusion‑based super‑resolution methods using the new KwaiSR dataset, with prize money and potential CVPR workshop paper invitations.

AI competition · CVPR NTIRE · Dataset
0 likes · 9 min read
AntTech
Dec 19, 2024 · Artificial Intelligence

Framer: Interactive Video Frame Interpolation Using Diffusion Models

Framer is an interactive video frame interpolation method that leverages large pretrained video diffusion models, allowing users to define custom motion trajectories or use an automatic mode, and demonstrates strong performance in image deformation, video generation, and cartoon‑to‑video applications.

AI · Framer · diffusion model
0 likes · 4 min read
DaTaobao Tech
Nov 27, 2024 · Artificial Intelligence

FuseAnyPart: Diffusion‑Driven Facial Parts Swapping via Multiple Reference Images

FuseAnyPart is a diffusion‑model‑based facial part swapping technique that fuses features from multiple reference images via mask‑based fusion and additive injection modules, delivering high‑fidelity, consistent face edits with lower computational cost, outperforming prior methods on CelebA‑HQ and FaceForensics++ and already boosting commercial AIGC applications.

Computer Vision · diffusion model · facial part swapping
0 likes · 9 min read
360 Tech Engineering
Oct 31, 2024 · Artificial Intelligence

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-Image Generation

The paper introduces HiCo, a hierarchical controllable diffusion model that enables precise layout‑to‑image generation by decoupling object and background features through weight‑shared branches and a fusion module, achieving high‑quality results and efficient inference as demonstrated on the HiCo‑7K benchmark.

AI painting · HiCo · NeurIPS2024
0 likes · 9 min read
Alimama Tech
Oct 17, 2024 · Artificial Intelligence

FLUX ControlNet Inpainting and 8-Step Turbo Acceleration Models

Alibaba’s Alimama Intelligent Creation team has open‑sourced a FLUX‑based ControlNet inpainting model that leverages a DiT‑backed Interleave design for superior repair quality, and an 8‑step LoRA‑Turbo model that cuts inference time three‑fold while preserving near‑original image fidelity, both now available on Hugging Face and ModelScope.

AI · ControlNet · Flux
0 likes · 9 min read
Kuaishou Tech
Sep 27, 2024 · Artificial Intelligence

XPSR: Cross‑modal Priors for Diffusion‑based Image Super‑Resolution

The paper introduces XPSR, a diffusion‑based image super‑resolution method that incorporates cross‑modal semantic priors from a large multimodal language model, achieving state‑of‑the‑art performance on both reference and no‑reference quality metrics across synthetic and real‑world video restoration tasks.

AI research · ECCV2024 · cross‑modal priors
0 likes · 8 min read
Ops Development & AI Practice
Jul 9, 2024 · Artificial Intelligence

How AnimateDiff-Lightning Elevates Open-Source AI Animation

AnimateDiff-Lightning, an open‑source diffusion model released by ByteDance on Hugging Face, delivers high‑resolution image and video generation with versatile integration, showcasing how community‑driven AI tools can accelerate creative and commercial applications.

AI · ByteDance · Hugging Face
0 likes · 5 min read
Baidu Tech Salon
May 24, 2024 · Artificial Intelligence

HelixDock: A Large-Scale Pretrained Full-Atom Diffusion Model for Protein–Small Molecule Docking

HelixDock, a full‑atom diffusion model pretrained on a billion‑scale simulated docking dataset covering ~200,000 protein targets, delivers state‑of‑the‑art docking accuracy—85.6% success on PoseBusters and strong generalization on cross‑docking benchmarks—showing that massive data and model scaling dramatically improve AI‑driven drug discovery, and its code and data are fully open‑source.

AI for drug discovery · Deep Learning · HelixDock
0 likes · 6 min read
21CTO
Apr 17, 2024 · Artificial Intelligence

How Sora Generates High‑Quality Text‑to‑Video: A Deep Dive into Its Architecture

This article breaks down OpenAI's Sora text‑to‑video model, exploring its overall structure, visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion, long‑time consistency strategies, training techniques, and the technical choices that enable variable resolution, aspect ratios, and up to 60‑second video generation.

AI video generation · Latent Diffusion · Sora
0 likes · 50 min read
360 Tech Engineering
Apr 17, 2024 · Artificial Intelligence

HiCo: A Hierarchical Controllable Diffusion Model for Layout‑to‑Image Generation

The 360 AI Research Institute introduces HiCo, a hierarchical controllable diffusion model that enables fine‑grained layout control across up to eight image regions, integrates seamlessly with existing Stable Diffusion ecosystems, and demonstrates superior performance on the GRIT‑VAL benchmark for layout‑aware image synthesis.

AI drawing · Controllable Generation · HiCo
0 likes · 8 min read
Architect
Apr 16, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build a 60‑Second Video Generator

This article dissects the possible architecture of OpenAI's Sora video model, tracing its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion backbone, long‑time consistency strategies, and training pipeline, while comparing alternatives such as MAGVIT‑v2, TECO, NaViT, and FDM to reveal why each design choice may have been made.

AI Architecture · Latent Diffusion · Sora
0 likes · 51 min read
JD Retail Technology
Apr 10, 2024 · Artificial Intelligence

AI-Generated E-commerce Advertising Images: Relationship-Aware Diffusion Models for Layout, Background, and Poster Generation

This article analyzes the challenges of manual e‑commerce ad image creation and presents JD's innovative AI solutions—including a relationship‑aware diffusion model for poster layout, a category‑common and personalized background generator, and an end‑to‑end planning‑and‑rendering framework—that achieve high‑quality automatic ad creative generation and boost advertising revenue.

AI · diffusion model · image generation
0 likes · 21 min read
Architect
Mar 28, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Architecture, Workflow, and Core Technologies

This article explains OpenAI's Sora video generation model, detailing its latent diffusion foundation, video compression network, spacetime patch representation, Diffusion Transformer processing, and decoding pipeline, while also reviewing related Stable Diffusion and Transformer concepts that enable high‑quality text‑to‑video synthesis.

AI · Deep Learning · Latent Diffusion
0 likes · 17 min read
Alipay Experience Technology
Mar 28, 2024 · Artificial Intelligence

How OpenAI’s Sora Revolutionizes Text‑to‑Video Generation: Capabilities & Comparisons

This article introduces OpenAI’s Sora video‑generation model, compares it with other leading solutions, explains its underlying diffusion‑based architecture, showcases sample outputs, outlines its diverse generation abilities, and discusses current limitations and future implications for AI‑driven video creation.

AI video generation · OpenAI · Sora
0 likes · 13 min read
DaTaobao Tech
Mar 27, 2024 · Artificial Intelligence

Building a Simple Diffusion Model with Python

This tutorial walks through implementing a basic Denoising Diffusion Probabilistic Model in Python, explaining the forward noise schedule, reverse denoising training, and providing complete code for noise schedules, diffusion functions, residual and attention blocks, a UNet architecture, loss computation, and a training loop.

DDPM · Python · U-Net
0 likes · 26 min read
DevOps
Mar 26, 2024 · Artificial Intelligence

OpenAI’s Sora: A One‑Minute Text‑to‑Video Diffusion Transformer Model

OpenAI’s newly released Sora model demonstrates one‑minute text‑to‑video generation using a diffusion‑based transformer architecture that operates on spatiotemporal patches, compresses visual data into latent codes, and builds on a wide range of prior video generation research, while the article also advertises a DevOps certification program.

AI · OpenAI · Sora
0 likes · 8 min read
NewBeeNLP
Mar 22, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build Its Text‑to‑Video Engine

This article provides a step‑by‑step technical analysis of OpenAI’s Sora model, examining its possible overall architecture, video encoder‑decoder design, Spacetime Latent Patch mechanism, transformer‑based diffusion process, training strategies, and long‑term consistency techniques, while grounding each speculation in publicly available reports and related research.

AI analysis · Sora · Transformer
0 likes · 50 min read
DataFunTalk
Mar 21, 2024 · Artificial Intelligence

A Detailed Technical Analysis of Sora: Architecture, Key Components, and Potential Implementation

This article provides a comprehensive, easy‑to‑understand breakdown of Sora’s possible architecture—including its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion model, long‑time consistency strategies, training techniques, and how it supports variable resolution and duration video generation.

AI Architecture · Sora · Spacetime Patch
0 likes · 49 min read
DataFunTalk
Mar 18, 2024 · Artificial Intelligence

High-Fidelity Image-to-Video Generation for E‑commerce Product Motion with AtomoVideo and Noise Rectification

This article presents Alibaba's research on using diffusion‑based AIGC techniques, including a training‑free Noise Rectification module and the AtomoVideo model, to automatically convert static product images into high‑quality, detail‑preserving video motions for e‑commerce advertising.

AIGC · AtomoVideo · Noise Rectification
0 likes · 15 min read
Alimama Tech
Mar 14, 2024 · Artificial Intelligence

High-Fidelity Image-to-Video Generation for E-commerce with AtomoVideo and Noise Rectification

Alibaba’s AI team introduced AtomoVideo, a diffusion‑based image‑to‑video generator enhanced by a training‑free Noise Rectification module that adds and corrects controlled noise to eliminate first‑frame errors, enabling merchants to automatically create high‑fidelity 4‑second 720p product videos with strong temporal consistency for e‑commerce advertising.

AI · AIGC · Video Generation
0 likes · 10 min read
Xiaohongshu Tech REDtech
Feb 27, 2024 · Artificial Intelligence

InstantID: Zero-shot Identity-Preserving Generation in Seconds

InstantID, an open‑source tool released by Xiaohongshu in early 2024, generates multiple stylized portraits that preserve a person’s facial identity from a single reference photo in seconds, eliminating fine‑tuning, large storage needs, and multi‑image requirements while seamlessly working with popular diffusion models like Stable Diffusion 1.5 and SDXL.

AI · InstantID · diffusion model
0 likes · 6 min read
High Availability Architecture
Feb 22, 2024 · Artificial Intelligence

Understanding OpenAI’s Sora: A Breakthrough Text-to-Video Model

OpenAI’s newly released Sora text‑to‑video model demonstrates unprecedented high‑resolution, long‑duration video generation by encoding videos into latent space, applying diffusion with a transformer conditioned on text, and decoding back to pixels, marking a major leap in AI video synthesis and its potential applications.

AI video generation · Latent Diffusion · Sora
0 likes · 14 min read
Architects' Tech Alliance
Feb 22, 2024 · Artificial Intelligence

OpenAI’s Sora: A Breakthrough Text‑to‑Video Generation Model – Capabilities, Architecture, and Research Insights

OpenAI’s Sora model demonstrates unprecedented text‑to‑video generation with up to 60‑second high‑fidelity clips, consistent multi‑character scenes, multi‑camera motion, and world‑simulation abilities, backed by a diffusion‑transformer trained on compressed latent video patches and detailed technical analysis from its accompanying research paper.

AI video generation · OpenAI · Sora
0 likes · 11 min read
CSS Magic
Feb 20, 2024 · Artificial Intelligence

OpenAI’s Sora Video Model Is Hyped—But Here Are the Flaws OpenAI Itself Acknowledges

The article walks through OpenAI’s own admission of Sora’s shortcomings—such as unrealistic physics, misplaced spatial details, and erratic object behavior—by showcasing concrete demo failures, additional observations, and technical notes about its diffusion‑based, transformer architecture and metadata embedding.

AI limitations · OpenAI · Sora
0 likes · 7 min read
DevOps
Feb 18, 2024 · Artificial Intelligence

OpenAI's Sora: In‑Depth Analysis of the First Text‑to‑Video Model and Its Technical Foundations

Sora, OpenAI's first text‑to‑video model, demonstrates unprecedented video quality and length by leveraging massive high‑quality training data, novel video‑patch representations, a diffusion‑based transformer architecture, and precise caption generation for its training videos, reshaping both AI research and media production.

OpenAI · Sora · diffusion model
0 likes · 9 min read
Architects' Tech Alliance
Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Redefines Video Generation with 3‑D Consistency and World Simulation

OpenAI’s Sora model introduces a diffusion‑transformer approach that generates high‑fidelity, 60‑second videos with consistent 3‑D camera motion, long‑term object persistence, and the ability to simulate interactive digital worlds, backed by a detailed technical report and research paper.

Computer Vision · OpenAI · Sora
0 likes · 9 min read
Ximalaya Technology Team
Feb 1, 2024 · Artificial Intelligence

Understanding AI Image Generation: Diffusion Models, CLIP, and Control Techniques

This guide explains how AI image generators such as Stable Diffusion and DALL·E 3 turn text prompts into pictures by using diffusion models, CLIP‑aligned embeddings, and optional controls like negative prompts, fine‑tuned LoRA checkpoints and ControlNet conditioning, highlighting their differences, workflow, and practical customization.

AI image generation · CLIP · ControlNet
0 likes · 18 min read
Xiaohongshu Tech REDtech
Jan 23, 2024 · Artificial Intelligence

Controllable Mind Visual Diffusion Model (CMVDM) for Reconstructing Visual Stimuli from fMRI Signals

The Controllable Mind Visual Diffusion Model (CMVDM) decodes fMRI signals into semantic vectors and silhouette maps, feeds them into a latent diffusion framework with a ControlNet‑style encoder, and reconstructs high‑fidelity images that surpass existing baselines in both structural similarity and semantic accuracy across multiple brain‑imaging datasets.

AI · brain decoding · diffusion model
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Jul 13, 2023 · Artificial Intelligence

Rapid Diffusion: Fast, Domain‑Specific Text‑to‑Image Generation for Chinese

Rapid Diffusion introduces a knowledge‑enhanced, high‑speed Chinese text‑to‑image diffusion model with one‑click deployment, achieving superior image quality and up to 1.73× faster inference through FlashAttention and BladeDISC optimizations, and demonstrates strong performance across e‑commerce, traditional painting, and food datasets.

Chinese NLP · Knowledge Enhancement · diffusion model
0 likes · 12 min read
Tencent Cloud Developer
May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC is Tencent’s two‑stage text‑to‑image model, which separates a CLIP‑based Prior mapping from a Stable Diffusion Decoder. It uses T5‑enhanced text embeddings and a suite of efficiency techniques, including FP16, flash attention, ZeRO, and GPU‑RDMA, to train models with more than 2B parameters on 64 GPUs. The model achieves state‑of‑the‑art FID and CLIP scores, supports image variation, semantic img2img, precise CLIP‑vector edits, and unsafe‑content filtering, and now powers the company’s Magic Painting Room.

AI painting · CLIP embedding · Training Acceleration
0 likes · 12 min read
Nightwalker Tech
Apr 13, 2023 · Artificial Intelligence

Fundamentals of AI‑Generated Image Creation: Diffusion Models and Stable Diffusion

This article provides a comprehensive overview of AI‑generated content (AIGC) for image creation, explaining the role of diffusion models, the architecture of Stable Diffusion—including CLIP, UNet, and VAE—and the underlying mathematical concepts such as Markov chains, Langevin dynamics, and Gaussian distributions.

AI · AIGC · Stable Diffusion
0 likes · 31 min read
58UXD
Mar 7, 2023 · Artificial Intelligence

How Diffusion Models Power AI Image Generation: From Prompts to Pictures

This article explains how modern AI image generators like Midjourney and Stable Diffusion use diffusion models, large training datasets, deep learning, latent spaces, and CLIP to transform textual prompts into high‑quality images, while also discussing the impact on designers and future collaboration opportunities.

CLIP · Latent Space · Midjourney
0 likes · 7 min read