Tagged articles

Diffusion Models

171 articles · Page 1 of 2

Jul 2, 2026 · Artificial Intelligence

EMCES: How Episodic Memory Guides Controllable Sample Synthesis to Boost Reinforcement Learning

The paper introduces EMCES, a method that injects episodic memory into controllable diffusion models and uses a hash‑based state representation to generate high‑value synthetic samples, dramatically improving sample efficiency and downstream reinforcement‑learning performance while cutting storage and time costs.

Diffusion Models · Episodic Memory · Hashing

0 likes · 14 min read

Machine Heart

Jun 29, 2026 · Artificial Intelligence

Control Humanoid Robot Motion with a Sentence or Music via the OMG Framework

OMG introduces a hierarchical “generation brain + tracking cerebellum” framework that leverages a large multimodal dataset and a diffusion‑based OMG‑DiT network to let humanoid robots synthesize full‑body motions from a single sentence, music clip, or pose, achieving state‑of‑the‑art performance across text, audio, and motion benchmarks.

AI generation · Diffusion Models · OMG framework

0 likes · 11 min read

Data Party THU

Jun 23, 2026 · Artificial Intelligence

How Diffusion Models Achieve Generalization: Insights from a CVPR 2026 Tutorial

Diffusion models have set the state‑of‑the‑art in image, video, and audio generation, yet their training objective admits a unique closed‑form solution that merely memorizes training data; this tutorial examines why they still generalize by exploring score smoothing, architectural inductive bias, training dynamics, and data geometry, all illustrated with hands‑on Jupyter notebooks.

CVPR 2026 · Diffusion Models · data geometry

0 likes · 2 min read

AI Architecture Hub

Jun 23, 2026 · Artificial Intelligence

Top AI Papers This Week (June 14‑21): SpatialClaw, SkillWeaver, PreAct, and More

This article reviews seven recent AI research papers, detailing how SpatialClaw enables code‑based spatial reasoning for vision‑language models, SkillWeaver introduces compositional skill routing, PreAct compiles agent actions into reusable state‑machines, and other works advance world‑model inference, self‑designing RL environments, collective skill‑tree search, and process‑aligned reinforcement learning for diffusion LLMs.

Diffusion Models · agent reasoning · large language models

0 likes · 15 min read

Machine Learning Algorithms & Natural Language Processing

Jun 18, 2026 · Artificial Intelligence

UniRL: Tencent Hunyuan’s Open‑Source Framework Unifying Multimodal RL Training

UniRL is an open‑source, distributed reinforcement‑learning post‑training framework that consolidates fragmented pipelines for image, video, and language‑vision models, offering a unified rollout‑reward‑advantage‑train‑sync contract, extensive model support, built‑in algorithms, and multi‑modal reward components to lower engineering barriers in AIGC research.

Diffusion Models · LLM · Multimodal RL

0 likes · 10 min read

Machine Heart

Jun 13, 2026 · Artificial Intelligence

World Labs Unveils Three 3D Generation Papers While Co‑Founder Announces Departure

World Labs released three technically detailed papers—World Tracing, Modality Forcing, and Flex4DHuman—each extending 2D diffusion models to 3D generation, while co‑founder Christoph Lassner announced his departure due to injury, marking a notable milestone for the spatial‑AI startup.

3D generation · Diffusion Models · World Labs

0 likes · 14 min read

Machine Heart

Jun 10, 2026 · Artificial Intelligence

DRDD: Turning Diffusion Noise into a Domain Harmonizer for Image Translation

The paper introduces Decoupled Residual Denoising Diffusion (DRDD), which reinterprets Gaussian noise as a domain harmonizer and separates residual removal from denoising, enabling more data‑efficient, multi‑task image‑to‑image translation and achieving state‑of‑the‑art results on benchmarks such as All‑in‑One‑5 with limited paired data.

DRDD · Data Efficiency · Diffusion Models

0 likes · 14 min read

Machine Heart

May 29, 2026 · Artificial Intelligence

WaDi: One‑Step Image Generation with LoRA Meets RoPE

This work analyzes weight‑direction changes in diffusion‑model distillation, proposes a low‑rank rotation adapter (LoRaD) to model those changes, and integrates it into Variational Score Distillation as WaDi, achieving state‑of‑the‑art FID on COCO with only ~10% trainable parameters while generalizing to multiple downstream tasks.

Diffusion Models · LoRA · RoPE

0 likes · 20 min read

Machine Heart

May 29, 2026 · Artificial Intelligence

DiffusionOPD: A New Online Policy Distillation Paradigm for Multi‑Task Diffusion Models

DiffusionOPD introduces a unified on‑policy distillation framework for diffusion models that decouples single‑task online policy exploration from multi‑task capability integration, training expert teachers per task and distilling their skills into a single student model, achieving faster convergence and higher performance across composition, OCR, and aesthetic tasks.

Diffusion Models · KL divergence · Multi-Task Learning

0 likes · 8 min read

Machine Heart

May 25, 2026 · Artificial Intelligence

VeRL-Omni: Universal RL Post‑Training for Diffusion and Multimodal Models

VeRL-Omni is an open‑source RL post‑training framework built on verl and vLLM‑Omni that enables efficient, high‑throughput rollout and flexible reward computation for diffusion, AR‑DiT, and unified multimodal generation models, supporting diverse hardware, modular trainers, and demonstrating up to 14% latency reduction and high training throughput in benchmark experiments.

Diffusion Models · FlowGRPO · Multimodal Generation

0 likes · 9 min read

Machine Heart

May 25, 2026 · Artificial Intelligence

Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation

Flow‑OPD introduces on‑policy distillation into flow‑matching diffusion models, using a multi‑teacher online rollout framework and manifold‑anchor regularization to resolve the seesaw effect of single and mixed rewards, achieving superior multi‑task performance and surpassing specialist models in image generation.

Diffusion Models · Flow-OPD · Manifold Anchor Regularization

0 likes · 9 min read

Machine Heart

May 24, 2026 · Artificial Intelligence

How Hallo‑Live Achieves Real‑Time Streaming Text‑Driven Audio‑Video Avatar Generation

Hallo‑Live introduces an asynchronous dual‑stream diffusion framework combined with human‑centric preference‑guided distillation, enabling text‑driven audio‑video avatars to run at 20.38 FPS with 0.94 s latency—over 16× faster and 99.3 % lower latency than the teacher Ovi model while preserving visual quality and lip‑sync.

Diffusion Models · Hallo-Live · NVIDIA H200

0 likes · 9 min read

Machine Heart

May 21, 2026 · Artificial Intelligence

RAEv2: How a Simple Extra Operation Makes Image Generation Train Ten Times Faster

The RAEv2 framework replaces traditional VAEs by summing multiple layers of pretrained vision encoders, combines RAE with REPA for complementary semantic and spatial gains, and leverages free guidance, achieving up to ten‑fold faster convergence, higher image quality, and lower compute on ImageNet‑256 diffusion training.

Diffusion Models · RAEv2 · Representation Autoencoder

0 likes · 11 min read

Machine Heart

May 15, 2026 · Artificial Intelligence

D-OPSD: On‑Policy Self‑Distillation Lets Few‑Step Diffusion Models Learn While Running

D-OPSD presents the first online self‑distillation framework for step‑distilled diffusion models, allowing them to continuously fine‑tune with only image‑text pairs, retain their fast few‑step sampling, and acquire new concepts, styles, or domain preferences without reward models.

Diffusion Models · LoRA · Self‑Distillation

0 likes · 10 min read

Machine Heart

May 11, 2026 · Artificial Intelligence

UniVidX Sets New SOTA on Multiple Video Tasks – A Unified Multimodal Framework Presented at SIGGRAPH 2026

UniVidX, a unified multimodal framework for video generation and understanding accepted at SIGGRAPH 2026, reformulates diverse video graphics tasks as conditional generation, achieving or surpassing state‑of‑the‑art performance while demonstrating strong data efficiency and cross‑domain generalization.

Data Efficiency · Diffusion Models · SIGGRAPH 2026

0 likes · 10 min read

Machine Heart

May 8, 2026 · Artificial Intelligence

Omni2Sound Beats the Multi-Modal Audio ‘Generalist’ Dilemma via Data Alignment

Omni2Sound tackles the long‑standing “generalist” dilemma of unified audio generation by constructing a high‑quality V‑T‑A dataset (SoundAtlas), employing a three‑stage progressive training pipeline, and using a simple Diffusion Transformer backbone, ultimately achieving state‑of‑the‑art performance on T2A, V2A and VT2A tasks and strong robustness on off‑screen scenarios.

Data Alignment · Diffusion Models · Multimodal Learning

0 likes · 16 min read

Machine Learning Algorithms & Natural Language Processing

Apr 27, 2026 · Artificial Intelligence

From Parameter Tuning to Control: CFG‑Ctrl Boosts Stability and Precision in Text‑to‑Image Generation

The paper introduces CFG‑Ctrl, a control‑theoretic redesign of classifier‑free diffusion guidance that treats the generation process as a dynamic system, achieving more stable and accurate text‑to‑image results across multiple model scales and evaluation metrics.

CFG-Ctrl · Classifier-Free Guidance · Control Theory

0 likes · 15 min read

Kuaishou Tech

Apr 24, 2026 · Artificial Intelligence

ICLR 2026: Kuaishou Tech Team’s Cutting‑Edge AI Research Highlights

This article reviews eight Kuaishou‑authored papers accepted at ICLR 2026, summarizing their problem statements, novel methods such as front‑door causal attribution, visual table retrieval, denoising rerankers, difficulty‑adaptive reasoning, diffusion code infilling, generative ordinal regression, multimodal video retrieval, e‑commerce dialogue benchmarks, and a new LLM creativity evaluator, together with reported experimental gains.

Diffusion Models · ICLR 2026 · Kuaishou

0 likes · 19 min read

SuanNi

Apr 21, 2026 · Artificial Intelligence

Why AI Video Generation Is Leaving the Silent Era: Architecture, Alignment, and Evaluation Insights

This article analyzes the rapid evolution of multimodal video generation models from separate visual and audio pipelines to unified diffusion Transformers, detailing VAE compression, MoE scaling, cross‑modal alignment techniques, comprehensive evaluation metrics, real‑world applications, and the remaining technical challenges.

Diffusion Models · Evaluation Metrics · Multimodal AI

0 likes · 15 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

AI Tech Daily: Top AI Research and Industry Updates on April 16 2026

This roundup highlights recent AI breakthroughs such as NVIDIA‑MIT’s Sol‑RL framework for faster diffusion model training, Peking University’s CPL++ visual localization improvement, DeepMind’s TIPSv2 for image recognition, Boston Dynamics Spot’s AI upgrade, Anthropic’s safety paper, a major MCP protocol vulnerability, OpenAI’s GPT‑5.4 release, and the shifting AI video landscape.

AI · AI safety · Diffusion Models

0 likes · 5 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

How NVIDIA, HKU, and MIT’s Sol‑RL Framework Supercharges Diffusion Model Training

NVIDIA, the University of Hong Kong (HKU), and MIT introduced the Sol‑RL framework, which uses reinforcement‑learning‑guided sampling to cut diffusion model training time severalfold without sacrificing image quality, potentially lowering entry barriers for small teams and shifting the AIGC industry toward efficiency‑driven competition.

AIGC · Diffusion Models · NVIDIA

0 likes · 6 min read

Machine Heart

Apr 16, 2026 · Artificial Intelligence

Achieving 4.6× Faster Diffusion Model Training with FP4‑BF16 Dual‑Track Parallelism (Sol‑RL)

Sol‑RL, a framework from NVIDIA, the University of Hong Kong (HKU), and MIT, integrates NVFP4 inference for large‑scale rollout exploration and BF16 precision for high‑fidelity regeneration, delivering up to 4.64× faster convergence at equivalent reward levels while preserving BF16 training fidelity across SANA, FLUX.1 and SD3.5‑L models.

BF16 · Diffusion Models · FP4

0 likes · 9 min read

SuanNi

Apr 12, 2026 · Artificial Intelligence

How TDM‑R1 Achieves 4‑Step Image Generation that Beats 80‑Step Models

Researchers from HKUST, CUHK and XiaoHongShu introduced TDM‑R1, a reinforcement‑learning‑based method that enables 4‑step diffusion image generation to surpass 80‑step models in speed, fidelity, and complex instruction adherence, as demonstrated on the GenEval benchmark and multiple quality metrics.

AI image synthesis · Benchmarking · Diffusion Models

0 likes · 9 min read

Bighead's Algorithm Notes

Apr 9, 2026 · Artificial Intelligence

WSDM2026 Quantitative Research Papers: Summaries and Insights

This article presents concise summaries of three recent AI‑driven finance papers—Diffolio’s diffusion‑based risk‑aware portfolio optimization, STORM’s dual‑vector‑quantized VAE factor model, and AutoHypo‑Fin’s autonomous web‑mined hypothesis generation—highlighting their motivations, methods, and experimental gains.

AI for finance · Diffusion Models · VQ-VAE

0 likes · 9 min read

HyperAI Super Neural

Apr 7, 2026 · Artificial Intelligence

MIT’s DRiffusion Achieves 1.4–3.7× Faster Diffusion Sampling via Draft‑and‑Refine Parallelism

MIT researchers introduce DRiffusion, a draft‑and‑refine parallel framework that uncovers intrinsic parallelism in diffusion models, delivering 1.4–3.7× speedup on three GPUs while preserving near‑lossless image quality across Stable Diffusion 2.1, SDXL and SD3 evaluated on MS‑COCO.

AI acceleration · DRiffusion · Diffusion Models

0 likes · 14 min read

vivo Internet Technology

Apr 1, 2026 · Artificial Intelligence

Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation

This article introduces C²FG, a training‑free, plug‑and‑play time‑adaptive exponential control function that replaces the fixed classifier‑free guidance scale, theoretically justifies its superiority with score discrepancy bounds, and demonstrates significant FID and IS improvements across multiple diffusion architectures on ImageNet.

CVPR 2026 · Classifier-Free Guidance · Diffusion Models

0 likes · 7 min read

Bighead's Algorithm Notes

Mar 31, 2026 · Artificial Intelligence

Top AI-Driven Quantitative Finance Papers from AAAI 2026

This article curates and summarizes recent AI research papers presented at AAAI 2026 that advance quantitative finance, covering controllable market generation, LLM‑powered alpha factor mining, risk‑aware multi‑agent portfolio management, foundation models for market data, and reinforcement‑learning trading policies.

AI · Diffusion Models · Financial Market Simulation

0 likes · 12 min read

PaperAgent

Mar 28, 2026 · Artificial Intelligence

How ACCORD Breaks Concept Coupling in Custom Text‑to‑Image Generation

The ACCORD framework formalizes the concept‑coupling issue in text‑to‑image diffusion models as a statistical dependency problem and resolves it with two plug‑and‑play regularization losses, dramatically improving fidelity and text control without altering model architecture.

ACCORD · AI research · Diffusion Models

0 likes · 7 min read

HyperAI Super Neural

Mar 25, 2026 · Artificial Intelligence

Low‑Barrier Deployment of NVIDIA’s Latest Physical AI Models for Humanoid Robots, Motion Generation, and Diffusion Fine‑Tuning

The article introduces NVIDIA’s Physical AI suite announced at GTC 2026—including Isaac GR00T, SOMA‑X, Kimodo, and FDFO—explains each model’s architecture and purpose, and provides one‑click online tutorials that let developers experiment with humanoid robotics, human‑body modeling, motion generation, and diffusion model fine‑tuning at minimal cost.

Diffusion Models · Embodied AI · FDFO

0 likes · 8 min read

Bighead's Algorithm Notes

Mar 22, 2026 · Artificial Intelligence

DigMA: Controllable Generation of Financial Market Data – A Deep Dive

This article reviews the DigMA model, which uses a diffusion‑guided meta‑agent to generate high‑fidelity, controllable order‑flow data for financial markets, details its problem formulation, architecture, training on Chinese stock datasets, extensive experiments—including reinforcement‑learning‑based high‑frequency trading evaluation—and demonstrates its superior accuracy and ultra‑low latency generation.

Diffusion Models · Financial Market Simulation · Meta‑Agent

0 likes · 16 min read

Bighead's Algorithm Notes

Mar 13, 2026 · Artificial Intelligence

Paper Reading: STABLE – A Robust Portfolio Allocation Method Using Conditional Diffusion Estimates

The STABLE framework integrates a conditional diffusion generator with a Black‑Litterman mean‑variance optimizer to produce style‑aware return forecasts and risk‑aware portfolio weights, achieving up to a 122.9% Sharpe‑ratio boost, lower drawdowns, and a 15.7% MSE reduction across major equity markets.

Black-Litterman · Diffusion Models · conditional diffusion

0 likes · 17 min read

AIWalker

Mar 10, 2026 · Artificial Intelligence

MIGM-Shortcut: Learning Controlled Latent Dynamics to Speed Up Masked Image Generation

The paper introduces MIGM-Shortcut, a self‑supervised method that learns controlled latent‑state dynamics to bypass redundant bidirectional attention in Masked Image Generation Models, achieving over 4× speed‑up on state‑of‑the‑art multimodal diffusion models like Lumina‑DiMOO while preserving image quality.

AI · Diffusion Models · MIGM

0 likes · 8 min read

AI Explorer

Mar 9, 2026 · Industry Insights

AI Daily Highlights March 9 2026: Breakthrough Math Solver, Embodied AGI, Chip Hacks, and New Models

On March 9 2026, AI breakthroughs ranged from Claude Opus solving a 30‑year math problem and Tesla unveiling embodied AGI to Apple’s M4 chip limit being cracked, a new 30B open‑source model surpassing Gemini, and advances in diffusion and multimodal research, reflecting rapid industry evolution.

AI · Apple M4 · Claude Opus

0 likes · 6 min read

SuanNi

Feb 23, 2026 · Artificial Intelligence

How FireRed-Image-Edit Sets New Standards for AI-Powered Image Editing

FireRed-Image-Edit, an open‑source instruction‑driven diffusion model, combines massive high‑quality data, a dual‑stream multimodal architecture, progressive training, and a comprehensive multi‑dimensional benchmark to achieve unprecedented pixel‑level control and human‑like editing performance across diverse visual tasks.

AI · Data Engineering · Diffusion Models

0 likes · 12 min read

Machine Learning Algorithms & Natural Language Processing

Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei‑Fei Li’s team reorders the diffusion trajectory, first generating a latent structural sketch and then refining pixel details, which restores efficiency of latent‑space models while preserving 100 % pixel fidelity, achieving state‑of‑the‑art FID scores on ImageNet‑256.

AI research · Diffusion Models · ImageNet

0 likes · 6 min read

AI Frontier Lectures

Feb 3, 2026 · Artificial Intelligence

Pixel Mean Flow: One‑Step Diffusion Beats Multi‑Step Models on ImageNet

The Pixel Mean Flow (pMF) method eliminates multi‑step sampling and latent‑space encoding, generating high‑quality images in a single step and achieving state‑of‑the‑art FID scores on ImageNet while drastically reducing computational cost.

Diffusion Models · ImageNet · perceptual loss

0 likes · 7 min read

Design Hub

Jan 17, 2026 · Artificial Intelligence

FLUX.2 Klein Generates Images in Under a Second and Unlocks Midjourney‑Style Prompts

The article reviews Black Forest Labs' FLUX.2 Klein model, highlighting its sub‑second 1024×1024 image generation, low‑VRAM requirements, four‑step inference speedups, and competitive quality versus SD3 and Midjourney V6, while also sharing Midjourney‑style prompt examples for creative design.

AI image generation · Diffusion Models · FLUX.2

0 likes · 8 min read

Design Hub

Dec 22, 2025 · Artificial Intelligence

Open‑Source AI Photoshop: Alibaba’s Qwen‑Image‑Layered Enables One‑Click Smart Layering

Alibaba’s Qwen‑Image‑Layered model, now fully open‑source, automatically separates a single image into editable RGBA layers using diffusion, offering Photoshop‑level editing, prompt‑controlled layer counts, and deep decomposition, with applications ranging from slide‑deck (PPT) deconstruction to game asset extraction; the article also notes the model's limitations on realistic photos.

AI image segmentation · ComfyUI · Diffusion Models

0 likes · 8 min read

Data Party THU

Dec 18, 2025 · Artificial Intelligence

How Diffusion Models and Transformers Power the Next Generation of AI Video Generation

AI video generation now turns textual prompts into high‑quality clips using diffusion models and transformer‑based architectures; this article explains the underlying mathematics, training objectives, spatio‑temporal encoding, breakthroughs like consistent motion and physical realism, and discusses the technology’s opportunities and inherent risks.

AI video generation · Diffusion Models · Spatio-temporal modeling

0 likes · 11 min read

Alibaba Cloud Developer

Dec 18, 2025 · Artificial Intelligence

How to Build a Real‑Time AI‑Powered Anime‑Style Video Generator for Social Apps

This technical report details the end‑to‑end workflow for integrating an AIGC video generation module into a social app, covering requirement analysis, model and hardware selection, dataset construction, LoRA and full‑parameter training, multiple acceleration techniques such as Sage Attention, TeaCache, XDiT, gradient‑checkpointing offload, tiled VAE, and quantization, followed by extensive performance evaluation and metric‑based ranking of the final models.

AI video generation · Diffusion Models · LoRA fine-tuning

0 likes · 38 min read

Kuaishou Tech

Dec 3, 2025 · Artificial Intelligence

Can Diffusion Models Be Their Own Reward Model? Latent Reward Modeling & Step-Level Preference Optimization

This article presents a novel paradigm—Latent Reward Model (LRM) and Latent Preference Optimization (LPO)—that repurposes diffusion models as noise‑aware latent reward models for step‑level preference optimization, addressing the shortcomings of pixel‑level reward models, introducing multi‑preference consistent filtering, and demonstrating significant performance and efficiency gains on benchmarks such as PickScore and T2I‑CompBench++.

AI alignment · Diffusion Models · Preference Optimization

0 likes · 9 min read

HyperAI Super Neural

Nov 19, 2025 · Artificial Intelligence

LocDiff: Achieving Global-Scale Precise Image Geolocation Without Grids or Reference Libraries

The LocDiff framework introduces a spherical‑harmonics Dirac‑delta encoding and a conditional Siren‑UNet diffusion model that enables accurate worldwide image geolocation without relying on predefined grids or external image libraries, outperforming prior methods in precision, generalization, and computational efficiency.

AI research · Diffusion Models · LocDiff

0 likes · 16 min read

Bighead's Algorithm Notes

Nov 15, 2025 · Artificial Intelligence

Quantitative Finance Paper Digest: Nov 8‑14 2025 Highlights

This article summarizes five recent arXiv papers that apply advanced AI techniques such as diffusion models, hierarchical attention, and stochastic differential equations to multivariate financial time‑series forecasting, portfolio selection, volatility surface generation, and gold‑futures alpha strategies, presenting their core methods and experimental results.

Diffusion Models · equilibrium portfolio · financial time series

0 likes · 10 min read

Kuaishou Tech

Nov 13, 2025 · Artificial Intelligence

Unlocking Unusual Concept Combinations in Generative AI with IMBA Loss

The paper identifies imbalanced concept distributions as the main obstacle to arbitrary concept‑combination in text‑to‑image/video generation, proposes the token‑level IMBA Distance and a lightweight IMBA Loss that adaptively re‑weights training tokens, and demonstrates through extensive experiments and a new Inert‑CompBench benchmark that this loss dramatically improves compositional ability without extra data.

Diffusion Models · Generative AI · IMBA Loss

0 likes · 9 min read

AI Frontier Lectures

Nov 4, 2025 · Artificial Intelligence

How DiffPathV2 Achieves Zero‑Shot Image Anomaly Detection with 94.9% AUROC

This article breaks down the ICCV 2025 paper "Zero‑Shot Image Anomaly Detection Using Generative Foundation Models," explaining how DiffPathV2 leverages diffusion model denoising trajectories, six‑dimensional score errors, and SSIM weighting to detect out‑of‑distribution images without any task‑specific training, achieving state‑of‑the‑art AUROC scores across multiple benchmarks.

AUROC · DiffPathV2 · Diffusion Models

0 likes · 10 min read

HyperAI Super Neural

Oct 20, 2025 · Artificial Intelligence

How NVIDIA’s ERDM Model Beats EDM in Long‑Term Weather Forecasting (NeurIPS 2025)

The paper introduces ERDM, an enhanced rolling diffusion model that integrates progressive noise scheduling and time‑loss weighting from EDM, demonstrates superior CRPS scores on Navier‑Stokes and ERA5 mid‑term weather forecasts, and achieves comparable accuracy with far lower computational cost.

AI · Diffusion Models · ERDM

0 likes · 14 min read

Data Party THU

Oct 15, 2025 · Artificial Intelligence

Designing Safe, Sample-Efficient, and Robust Reinforcement Learning for Ranking and Diffusion Models

This paper proposes a reinforcement‑learning framework that simultaneously ensures safety, sample efficiency, and robustness, applying a contextual‑bandit perspective to ranking/recommendation systems and text‑to‑image diffusion models, and introduces novel algorithms for safe deployment, variance‑reduced off‑policy estimation, and a LOOP method for generative RL.

Diffusion Models · Robustness · Safety

0 likes · 5 min read

AI Algorithm Path

Oct 15, 2025 · Artificial Intelligence

Building a Flow Matching Model from Scratch: Theory Explained

This article walks through the theory behind flow‑matching generative models, contrasting them with diffusion models, detailing the velocity‑field formulation, training objective, and sampling procedure, and includes visual illustrations of the core concepts.

Diffusion Models · ODE · flow matching

0 likes · 8 min read

Data Party THU

Oct 13, 2025 · Artificial Intelligence

How BranchGRPO Accelerates and Stabilizes Diffusion Model Alignment

BranchGRPO introduces a tree‑structured branching, reward‑fusion, and lightweight pruning framework that dramatically speeds up diffusion and flow model training while delivering denser, more stable reward signals, achieving up to five‑fold faster convergence and higher alignment scores on image and video generation benchmarks.

BranchGRPO · Diffusion Models · Efficiency

0 likes · 10 min read

AI Algorithm Path

Oct 12, 2025 · Artificial Intelligence

Flow Matching vs Diffusion Models: Key Differences and Connections

This technical article provides a comprehensive comparison of diffusion models and flow matching, covering their intuitive explanations, underlying mathematics, training objectives, sampling efficiency, theoretical guarantees, practical examples, and code implementations to illustrate how each generative approach works.

Diffusion Models · Generative AI · flow matching

0 likes · 12 min read

Data Party THU

Oct 6, 2025 · Artificial Intelligence

Why Data, Not Architecture, Drives Locality in Diffusion Models

A recent MIT‑Toyota study shows that the locality observed in image diffusion models emerges from the statistical structure of training data rather than from architectural biases, and a simple linear denoiser can replicate this behavior, reshaping how we think about model design.

Data Statistics · Diffusion Models · U-Net

0 likes · 10 min read

Amap Tech

Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Diffusion Models · Multimodal AI

0 likes · 11 min read

AI2ML AI to Machine Learning

Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Diffusion Models · Multimodal AI · Stable Diffusion

0 likes · 7 min read

Bighead's Algorithm Notes

Sep 27, 2025 · Artificial Intelligence

Weekly Time-Series Paper Digest (Sep 20‑26, 2025)

This digest summarizes three recent arXiv papers that propose novel diffusion‑based generation, a channel‑independent convolution for multivariate forecasting, and a style‑guided diffusion framework, each demonstrating improved realism, coherence, and diversity of synthetic time‑series data through extensive experiments.

DS-Diffusion · Diffusion Models · IConv

0 likes · 8 min read

Kuaishou Large Model

Sep 24, 2025 · Artificial Intelligence

How Generative Reinforcement Learning is Revolutionizing Real-Time Bidding

The article explains the core challenges of real‑time bidding, reviews Kuaishou's evolution from PID to MPC to reinforcement learning, and introduces generative reinforcement‑learning methods (GAVE and CBD) that combine decision transformers or diffusion models with value‑guided exploration and score‑based return‑to‑go (RTG), achieving significant offline and online performance gains.

Diffusion Models · advertising algorithms · generative reinforcement learning

0 likes · 15 min read

AIWalker

Sep 17, 2025 · Artificial Intelligence

InfGen Enables Arbitrary-Resolution Image Generation: 4K Images in 7 Seconds, 10× Faster

InfGen introduces a resolution‑agnostic generation paradigm that replaces the VAE decoder in diffusion models, allowing any‑size image synthesis with up to ten‑fold speed gains, achieving 4K outputs in under 7 seconds while improving visual quality.

Diffusion Models · InfGen · arbitrary resolution

0 likes · 15 min read

Bighead's Algorithm Notes

Sep 12, 2025 · Artificial Intelligence

AI for Finance: Quantum Asset Clustering, Causal Market Troughs, Multimodal Forecasting, Diffusion SDEs

This article summarizes four recent AI‑driven finance papers: a quantum‑annealing asset clustering algorithm, a causal machine‑learning model for predicting market troughs, a multimodal large‑model approach to financial time‑series forecasting, and a diffusion‑model method for generating stochastic‑differential‑equation sample paths.

Asset Clustering · Causal ML · Diffusion Models

0 likes · 7 min read

Sohu Smart Platform Tech Team

Sep 12, 2025 · Artificial Intelligence

How AI is Revolutionizing Video Creation: From Text‑to‑Video to Real‑Time Editing

This article systematically explores the technical evolution, core principles, and emerging innovations of AI‑generated video, covering generation methods, GAN and diffusion models, transformer‑based DiT architectures, efficiency‑boosting NCR, audio‑visual V2A integration, and real‑world applications across media, education, and commerce.

AI video generation · Diffusion Models · GAN

0 likes · 25 min read

Bighead's Algorithm Notes

Sep 5, 2025 · Artificial Intelligence

Weekly Quantitative Finance Paper Digest (Aug 30 – Sep 5, 2025)

This digest reviews four recent AI‑driven finance papers: a robust MCVaR portfolio optimizer with ellipsoidal support and RKHS uncertainty, a PPO‑based adaptive weighting system for LLM‑generated alphas, an empirical comparison of price‑based, GICS‑based, and LLM‑embedding stock clustering, and a diffusion‑model approach that generates future financial chart images from current charts and text prompts.

Diffusion Models · large language models · portfolio optimization

0 likes · 9 min read

Data Party THU

Sep 3, 2025 · Artificial Intelligence

Exploring Multimodal Generative AI: A Tsinghua Tutorial at IJCAI 2025

This article introduces a 1.5‑hour tutorial presented by Tsinghua researchers at IJCAI 2025, covering the latest advances in multimodal generative AI, including multimodal large language models, diffusion models, post‑training generalization techniques, and unified understanding‑generation frameworks.

Diffusion Models · IJCAI 2025 · Multimodal AI

0 likes · 5 min read

AIWalker

Aug 19, 2025 · Artificial Intelligence

DynamicFace: Controllable High‑Quality Face Swapping for Images and Video

DynamicFace introduces a diffusion‑based framework that explicitly decouples identity, pose, expression, illumination and background using composable 3D facial priors, achieving superior identity preservation, motion consistency and visual fidelity in both image and video face‑swapping tasks.

3D facial priors · Diffusion Models · controllable generation

0 likes · 13 min read

Xiaohongshu Tech REDtech

Aug 19, 2025 · Artificial Intelligence

How Single Trajectory Distillation Boosts Diffusion Model Speed and Style Quality

The paper introduces Single Trajectory Distillation (STD), a novel training framework that aligns full PF‑ODE trajectories from a fixed noisy state, uses a Trajectory Bank to cut training cost, and adds an Asymmetric Adversarial Loss to markedly improve style consistency and aesthetic quality while accelerating image and video style‑transfer diffusion models.

AI acceleration · Diffusion Models · Style Transfer

0 likes · 14 min read

Data Party THU

Aug 15, 2025 · Artificial Intelligence

What’s Next for Visual Reinforcement Learning? A Comprehensive 2024‑2025 Survey

This article provides a critical, up‑to‑date overview of visual reinforcement learning, formalizes the problem, traces policy‑optimization evolution, categorizes over 200 recent works into four pillars, analyzes algorithms, reward design, benchmarks, and highlights open challenges and future research directions.

Diffusion Models · Multimodal AI · RLHF

0 likes · 7 min read

Baidu Geek Talk

Aug 11, 2025 · Artificial Intelligence

FLUX-Lightning Slashes Diffusion Inference to 4 Steps, Doubling Speed

FLUX-Lightning, introduced by PaddleMIX, combines phased consistency distillation, adversarial learning, distribution‑matching distillation, and reflow loss to reduce diffusion model inference to just four steps while preserving image quality, and leverages the CINN compiler to achieve over 30% speed gains on A800 GPUs, surpassing existing SOTA acceleration methods.

AI inference · CINN · Diffusion Models

0 likes · 21 min read

Data Party THU

Aug 9, 2025 · Artificial Intelligence

How SADA Boosts Diffusion Model Sampling Speed by Up to 1.8× Without Losing Quality

The paper introduces SADA (Stability‑guided Adaptive Diffusion Acceleration), a novel paradigm that dynamically allocates sparsity per token using a unified stability criterion, enabling efficient ODE‑based sampling for diffusion and flow‑matching models, achieving up to 1.8× speedup with negligible fidelity loss across SD‑2, SDXL, Flux, ControlNet and MusicLDM.

Diffusion Models · Generative AI · ODE

0 likes · 5 min read

AI Frontier Lectures

Jul 30, 2025 · Artificial Intelligence

DualReal: Seamless Identity and Motion Customization for Video Generation

DualReal introduces a novel adaptive joint training framework that simultaneously customizes subject identity and motion dynamics in video generation, overcoming the conflicts of traditional isolated approaches by using a dual-domain perception adapter and stage-fusion controller, achieving up to 31.8% improvement on CLIP‑I and DINO‑I metrics.

Diffusion Models · dual-domain adaptation · identity preservation

0 likes · 13 min read

Alimama Tech

Jul 23, 2025 · Artificial Intelligence

How Differentiable Solver Search Accelerates Diffusion Model Sampling

This article presents a differentiable solver search method that quickly finds high‑quality sampling paths for diffusion models, demonstrating significant FID improvements across Rectified‑Flow, DDPM/VP, and text‑to‑image models while requiring no model parameter changes.

AI · Diffusion Models · differentiable solver

0 likes · 20 min read

Kuaishou Tech

Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · Diffusion Models · autoregressive transformer

0 likes · 11 min read

AI Frontier Lectures

Jul 13, 2025 · Artificial Intelligence

How HarmoniCa Boosts Diffusion Model Speed with Joint Training‑Inference Caching

HarmoniCa, a new feature‑caching framework co‑designed by HKUST, Beihang University, and SenseTime, tackles diffusion model inference bottlenecks by aligning training and inference through Step‑Wise Denoising Training and an Image Error Proxy Objective, achieving up to 2× speedup while preserving image quality.

Diffusion Models · Performance Acceleration · feature caching

0 likes · 9 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to image classification, segmentation, and diffusion‑based generation tasks, dramatically speeding up DiT and SiT models while preserving strong visual representations.

Diffusion Models · VAE · ViT

0 likes · 13 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Boosts Image Generation and Understanding

The USP framework introduces masked latent modeling within a VAE space to pretrain ViT encoders, enabling seamless weight transfer to both image classification and diffusion‑based generation tasks, dramatically accelerating training while preserving strong performance across multiple benchmarks.

Diffusion Models · Vision Transformer · image generation

0 likes · 10 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Diffusion Models · benchmark · image restoration

0 likes · 14 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS

This article introduces VMBench, a perception‑aligned video motion generation benchmark with a five‑dimensional metric suite and meta‑guided prompt generation, and LD‑RPS, a zero‑shot unified image restoration framework using latent diffusion and recurrent posterior sampling, detailing their motivations, innovations, experiments, and future directions.

AI research · Diffusion Models · image restoration

0 likes · 14 min read

Kuaishou Large Model

Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s Kling team, leverages test‑time scaling to dramatically improve diffusion‑based image and video generation without additional training, using evolutionary search along the denoising trajectory and achieving state‑of‑the‑art results on SD2.1, Flux‑1‑dev and other models.

Diffusion Models · Evolutionary Search · image generation

0 likes · 8 min read

Tencent Technical Engineering

Jul 3, 2025 · Artificial Intelligence

Winning the NTIRE 2025 UGC Video Enhancement Challenge: A Progressive AI Framework

Tencent’s TEG team secured first place in the NTIRE 2025 UGC Video Enhancement competition by introducing a progressive, three‑stage AI framework that decomposes enhancement tasks into expert models for color correction, denoising, and temporal stability, incorporates advanced loss functions, extensive hardware‑level optimizations, INT8 quantization techniques, and outlines future diffusion‑based generative enhancements.

AI · Diffusion Models · Quantization

0 likes · 17 min read

Kuaishou Tech

Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Diffusion Models · Evolutionary Search

0 likes · 8 min read

Kuaishou Large Model

Jun 11, 2025 · Artificial Intelligence

12 Kuaishou Breakthrough Papers at CVPR 2025: Video Generation, Diffusion & Multimodal AI

CVPR 2025 in Nashville will feature 12 Kuaishou papers spanning large‑scale video datasets, quality assessment, 3D/4D reconstruction, controllable generation, diffusion scaling laws, multimodal simulation, and novel benchmarks, highlighting the company's cutting‑edge contributions to video AI research.

Diffusion Models · large-scale datasets

0 likes · 21 min read

DataFunTalk

Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · Autoregressive Models · Diffusion Models

0 likes · 70 min read

Amap Tech

Jun 5, 2025 · Artificial Intelligence

How MVPainter Achieves Accurate, High‑Detail 3D Texture Generation with Multi‑View Diffusion

MVPainter introduces a fully open‑source pipeline that generates high‑quality, PBR‑compatible 3D textures from a single reference image and a white model by leveraging multi‑view diffusion, geometric control, and a human‑aligned evaluation framework, dramatically improving texture fidelity, alignment, and detail.

3D texture generation · AI · Diffusion Models

0 likes · 10 min read

AntTech

Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

Diffusion Models · Multimodal AI · generative modeling

0 likes · 10 min read

AI Frontier Lectures

May 23, 2025 · Artificial Intelligence

How SuperEdit Boosts Instruction-Based Image Editing with Rectified Supervision

SuperEdit introduces rectified instruction generation and contrastive supervision to fix noisy supervision in instruction‑based image editing, achieving up to 9.19% performance gains on Real‑Edit benchmarks without extra model parameters or pre‑training, and releases all data and code publicly.

Diffusion Models · image editing · visual-language-models

0 likes · 15 min read

AI Frontier Lectures

May 19, 2025 · Artificial Intelligence

How SuperEdit Boosts Instruction-Based Image Editing with Rectified Supervision

SuperEdit introduces rectified instruction generation and contrastive supervision to fix noisy training signals in instruction‑based image editing, achieving up to 9.19% performance gains without extra parameters or pre‑training, as demonstrated on the Real‑Edit benchmark.

Diffusion Models · image editing · supervision

0 likes · 13 min read

AIWalker

May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Diffusion Models · autoregressive modeling · causal attention

0 likes · 16 min read

AI Algorithm Path

May 15, 2025 · Artificial Intelligence

Understanding Diffusion Models: Core Principles Explained

This article explains the fundamental principles of diffusion models, using physics and machine‑learning analogies to describe forward and reverse diffusion, the role of Gaussian noise, iteration trade‑offs, U‑Net architecture, and shared‑weight training for image generation.

Diffusion Models · Generative AI · U-Net

0 likes · 8 min read

AI Frontier Lectures

May 13, 2025 · Artificial Intelligence

How Diffusion Policy is Transforming Vision‑Based Robot Motion Learning

This article provides a comprehensive, step‑by‑step analysis of Diffusion Policy for robot visuomotor control, covering its motivation, task characteristics, model design, dataset preparation, training pipeline, inference procedure, experimental results, and open research questions.

Diffusion Models · machine learning · policy learning

0 likes · 63 min read

AIWalker

May 13, 2025 · Artificial Intelligence

PixelHacker: Diffusion‑Based Image Inpainting with Latent Class Guidance Beats SOTA

PixelHacker introduces a latent class guidance (LCG) paradigm that injects foreground and background embeddings into a diffusion model, training on 14 million image‑mask pairs and achieving state‑of‑the‑art structural and semantic consistency across Places2, CelebA‑HQ and FFHQ benchmarks.

Diffusion Models · PixelHacker · SOTA

0 likes · 16 min read

AIWalker

May 11, 2025 · Artificial Intelligence

Unified Multimodal Understanding and Generation: A 30K‑Word Survey of Recent Advances

This comprehensive survey reviews the rapid progress of multimodal understanding and text‑to‑image generation models, categorises existing unified architectures into diffusion‑based, autoregressive, and hybrid paradigms, analyses their tokenisation strategies, datasets and benchmarks, and highlights current challenges and future research directions.

Autoregressive Models · Diffusion Models · Multimodal AI

0 likes · 64 min read

AI Frontier Lectures

Apr 28, 2025 · Artificial Intelligence

How DP-Recon Uses Diffusion Models to Reconstruct 3D Scenes from Sparse Photos

DP-Recon leverages generative diffusion priors and a visibility‑guided SDS loss to achieve high‑fidelity, compositional 3D scene reconstruction from extremely sparse images, delivering superior geometry, texture, and text‑driven editing capabilities demonstrated on benchmark datasets and real‑world indoor scenarios.

3D reconstruction · AI · Diffusion Models

0 likes · 10 min read

AI Frontier Lectures

Apr 18, 2025 · Artificial Intelligence

DiffDenoise: Conditional Diffusion Transforms Medical Image Denoising

DiffDenoise introduces a three‑stage self‑supervised pipeline that combines a blind‑spot network, conditional diffusion modeling, and stabilized reverse diffusion sampling to dramatically improve medical image denoising performance on both synthetic and real datasets, while also offering a fast distilled version for practical deployment.

Diffusion Models · Image processing · medical imaging

0 likes · 10 min read

Alimama Tech

Apr 17, 2025 · Artificial Intelligence

PosterMaker: High-Quality Product Poster Generation with Accurate Text Rendering

PosterMaker leverages a ControlNet‑based TextRenderNet with character‑level visual features and a reward‑driven foreground‑extension detector to generate high‑quality product posters that accurately render Chinese text (over 90% sentence accuracy) while preserving product fidelity, and is already deployed in Alibaba’s AI creative tool.

Diffusion Models · Poster Generation · character-level features

0 likes · 18 min read

AIWalker

Apr 16, 2025 · Artificial Intelligence

Fast and Precise: FloED Sets New State‑of‑the‑Art in Video Restoration Over All Diffusion Models

FloED introduces a dual‑branch, flow‑guided diffusion framework that dramatically improves spatio‑temporal consistency and computational efficiency for video restoration, outperforming existing text‑guided diffusion methods on both object removal and background repair benchmarks.

Diffusion Models · Efficiency · FloED

0 likes · 16 min read

AIWalker

Apr 14, 2025 · Artificial Intelligence

Breaking the Binary: FlexIP Enables Both Identity Preservation and Personalized Editing

FlexIP introduces a dual‑adapter architecture and a dynamic weight‑gating mechanism that decouple identity preservation from personalized editing, allowing continuous control over image generation and outperforming prior SOTA methods in both fidelity and flexibility.

AI · Diffusion Models · dual-adapter

0 likes · 16 min read

AIWalker

Apr 10, 2025 · Artificial Intelligence

DCEdit: Precise Text-Guided Image Editing that Preserves Backgrounds

DCEdit introduces a precise semantic localization strategy and a dual-level control mechanism for text‑guided image editing, delivering superior background preservation and editing quality, as demonstrated on the new RW‑800 benchmark and extensive comparisons with state‑of‑the‑art diffusion models.

AI · Diffusion Models · benchmark

0 likes · 16 min read

AIWalker

Apr 7, 2025 · Artificial Intelligence

TurboFill: High‑Quality Image Inpainting in Just 4 Steps

TurboFill introduces a fast image‑inpainting model that trains a repair adapter on a few‑step text‑to‑image diffusion backbone, achieving state‑of‑the‑art results with only four diffusion steps while dramatically reducing computational cost.

Diffusion Models · TurboFill · computer vision

0 likes · 17 min read

DaTaobao Tech

Apr 7, 2025 · Artificial Intelligence

Flow Matching for Generative Modeling

Flow Matching reformulates generative modeling by learning a time‑dependent vector field that deterministically transports Gaussian noise to data, using a neural network trained with an analytically derived L2 loss, yielding simpler training, faster convergence, and deterministic sampling that matches or exceeds diffusion model quality.

AI · Diffusion Models · continuous normalizing flow

0 likes · 13 min read

AIWalker

Mar 27, 2025 · Artificial Intelligence

MagicColor: First Multi‑Instance AI Sketch‑Coloring System for Professional‑Grade Comics

MagicColor introduces a novel multi‑instance sketch‑coloring framework that uses a two‑stage self‑play training strategy, instance guidance, and edge‑aware pixel‑level color matching to automatically produce high‑quality, consistent colors for multiple line‑art instances, outperforming prior GAN and diffusion‑based methods.

AI · Diffusion Models · Multi-Instance

0 likes · 16 min read

Tencent Cloud Developer

Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large to smaller diffusion models, covering hard and soft labels, temperature scaling, and contrasting it with data distillation, while detailing techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and step reduction.

Diffusion Models · adversarial post-training · adversarial training

0 likes · 19 min read

AIWalker

Mar 23, 2025 · Artificial Intelligence

One-Click Removal & Seamless Integration: CycleFlow + Diffusion Prior Power OmniPaint

OmniPaint introduces a unified diffusion‑based framework that achieves physically consistent object removal and insertion by leveraging a pre‑trained FLUX‑1 diffusion prior, a progressive CycleFlow training pipeline, and a novel reference‑free CFD metric for high‑fidelity image editing.

CFD Metric · CycleFlow · Diffusion Models

0 likes · 17 min read

AIWalker

Mar 18, 2025 · Artificial Intelligence

How ImageRAG Boosts Text‑to‑Image Generation with Retrieval‑Augmented Generation

ImageRAG introduces a retrieval‑augmented generation framework that dynamically fetches relevant images to guide diffusion models, dramatically improving the synthesis of rare and fine‑grained concepts across multiple text‑to‑image systems, as demonstrated by extensive quantitative and user studies.

AI generation · Diffusion Models · ImageRAG

0 likes · 17 min read

AI Frontier Lectures

Mar 17, 2025 · Artificial Intelligence

Can Diffusion Models Outrun Traditional LLMs? Mercury Coder’s Speed & Architecture

The article analyzes Mercury Coder, a diffusion‑based language model that generates text and code in parallel, compares its speed and quality against traditional autoregressive LLMs like GPT‑4o‑mini using a ball‑collision benchmark, and discusses the underlying score‑entropy training, current limitations, and future multimodal potential.

AI performance · Diffusion Models · Mercury

0 likes · 8 min read
