Tagged articles

diffusion

36 articles · Page 1 of 1

Jun 22, 2026 · Artificial Intelligence

Why Dropping VAE and Private Data Boosts Text-to-Image Generation Performance

MiniT2I, a minimalist pixel-space text-to-image model that discards VAE, AdaLN, and private data, achieves 0.87 GenEval and 84.2 DPG-Bench scores with only 258 M parameters, demonstrating that a stripped-down architecture and public data can outperform larger, more complex systems.

AI research · MiniT2I · Transformer

0 likes · 8 min read

SuanNi

Jun 4, 2026 · Artificial Intelligence

Bernini: An Open‑Source AI Model that Masterfully Handles Diverse Video Editing Tasks

Bernini combines a multimodal large language model with a diffusion renderer in a semantic planner‑renderer architecture, adds segment‑aware 3D position encoding and chain‑of‑thought reasoning, and achieves state‑of‑the‑art results on a 300‑case benchmark, outperforming closed‑source competitors.

Benchmark · Bernini · LLM

0 likes · 11 min read

Machine Learning Algorithms & Natural Language Processing

May 25, 2026 · Artificial Intelligence

VeRL-Omni: A Universal RL Post‑Training Framework for Diffusion and Multimodal Generation Models

VeRL-Omni introduces a universal reinforcement‑learning post‑training framework that extends the verl and vLLM‑Omni stacks to support diffusion transformers, hybrid AR‑DiT, and unified understanding‑generation models, offering high‑throughput multimodal rollout, flexible reward engines, modular trainers, and broad hardware compatibility.

FlowGRPO · Multimodal Generation · RL

0 likes · 9 min read

Machine Heart

May 22, 2026 · Artificial Intelligence

Nvidia’s First Tri‑Mode LLM Boosts Token Throughput 4× and Promises Long‑Text Generation in Seconds

Nvidia introduces a tri‑mode large language model that can switch among autoregressive, diffusion and self‑speculation decoding, delivering up to four times higher token throughput, achieving state‑of‑the‑art accuracy on benchmarks, and showing significant speed gains on DGX Spark, RTX 6000 Pro and GB200 hardware.

LLM · NVIDIA · Token throughput

0 likes · 8 min read

Machine Heart

May 9, 2026 · Artificial Intelligence

BARD-VL Achieves New SOTA for Multimodal Diffusion Models via Autoregressive‑Diffusion Bridge

The BARD-VL framework bridges pretrained autoregressive vision‑language models to diffusion‑based VLMs, preserving or surpassing original performance while boosting decoding throughput up to three times, through progressive block merging, stage‑wise diffusion distillation, and engineering optimizations validated on multiple benchmarks.

BARD-VL · Benchmark · Efficiency

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

May 4, 2026 · Artificial Intelligence

How CVPR 2026 Is Redefining Visual Model Defaults in Generative AI

A review of CVPR 2026 papers shows a shift in visual generative AI from incremental performance gains within established frameworks to a systematic rewrite of default modeling assumptions, covering new guidance mechanisms, video generation architectures, direct image prediction, fine‑grained motion control, and dense semantic correspondence.

Generative AI · diffusion · human motion

0 likes · 13 min read

DaTaobao Tech

Apr 22, 2026 · Artificial Intelligence

How MNN‑Sana‑Edit‑V2 Brings Comic‑Style Image Editing to Your Phone in 15 Seconds

MNN‑Sana‑Edit‑V2, a collaborative effort between Taobao’s Meta team and Hangzhou University, combines a frozen Qwen3‑0.6B LLM, Learnable Query, Connector, Linear DiT and Deep Compression Autoencoder with 4/8‑bit quantization to run fully on mobile devices, delivering 512×512 comic‑style conversions in about 15 seconds—2.5× faster than cloud alternatives—while providing open‑source code, detailed training stages, and extensive performance benchmarks.

Edge deployment · Model Quantization · diffusion

0 likes · 13 min read

vivo Internet Technology

Mar 18, 2026 · Artificial Intelligence

How Ada-RefSR Eliminates Hallucinations in Single‑Step Diffusion Super‑Resolution

This article presents Ada-RefSR, a novel single‑step diffusion‑based reference super‑resolution framework that introduces a "Trust but Verify" paradigm, adaptive implicit correlation gating, and lightweight architecture to robustly suppress hallucinations and achieve state‑of‑the‑art performance on multiple benchmarks, while being suitable for mobile deployment.

Ada-RefSR · ICLR2026 · diffusion

0 likes · 10 min read

AIWalker

Mar 4, 2026 · Artificial Intelligence

Drifting Models Enable One‑Step Generation, Shattering Speed Records

The paper introduces Drifting Models, a new generative paradigm that moves the distribution evolution to the training phase, achieving true one‑step (1‑NFE) generation with state‑of‑the‑art ImageNet FID scores of 1.54 in latent space and 1.61 in pixel space, while eliminating the need for distillation or classifier‑free guidance.

Drifting Models · ImageNet · One-step Generation

0 likes · 24 min read

Data Party THU

Feb 15, 2026 · Artificial Intelligence

Why FireRed-Image-Edit Is the New Powerhouse in AI Image Editing

FireRed-Image-Edit, the latest open‑source image‑editing model from the Xiaohongshu Super Intelligence team, outperforms existing models on benchmarks with stronger instruction understanding, ID preservation and an efficient architecture, backed by its RedEdit Bench evaluation suite, a three‑stage training pipeline and a scalable data engine.

AI Image Editing · FireRed-Image-Edit · Open-source

0 likes · 8 min read

Baobao Algorithm Notes

Jan 14, 2026 · Artificial Intelligence

How GLM-Image Generates High‑Quality Text‑to‑Image on Huawei Ascend Chips

GLM-Image, a Chinese text‑to‑image model trained end‑to‑end on Huawei Ascend 800T A2 NPUs, combines an autoregressive decoder with a diffusion encoder, supports resolutions up to 2048×2048, and offers open‑source code, API access, and detailed prompts that demonstrate its strong layout and typography capabilities.

GLM-Image · Huawei Ascend · diffusion

0 likes · 12 min read

Meituan Technology Team

Jan 8, 2026 · Artificial Intelligence

Must‑Read AAAI 2026 Papers: Efficient Reasoning, Annealing, Multimodal Diffusion & More

This article curates eight AAAI 2026 papers authored by the Meituan research team, covering verifiable stepwise rewards for LLM reasoning, annealing strategies in large‑scale training, process reward models, competence‑difficulty sampling, high‑fidelity visual text rendering, counterfactual fusion, compress‑then‑rank reranking, and cross‑modal quantization for generative recommendation, with direct PDF links for each work.

AAAI2026 · Counterfactual · LLM

0 likes · 14 min read

Kuaishou Tech

Jan 8, 2026 · Artificial Intelligence

Top 12 Kuaishou Papers Accepted at AAAI 2026: Breakthroughs in Recommendation, Video Generation, and LLM Research

Kuaishou had 12 papers accepted at AAAI 2026, covering advances in search and recommendation systems, multi‑camera video generation, multimodal understanding, generative model fundamentals, video large language models, experimental design, and LLM latent‑space reasoning, with three of them selected as oral presentations.

AI · LLM · diffusion

0 likes · 22 min read

Kuaishou Tech

Sep 17, 2025 · Artificial Intelligence

How MIDAS Achieves Real‑Time Multimodal Digital‑Human Video Generation

The MIDAS framework introduced by the Kling Team combines autoregressive video generation with a lightweight diffusion denoising head to deliver real‑time, high‑quality digital‑human synthesis under multimodal control, achieving sub‑500 ms latency, 64× compression, and robust performance across multilingual dialogue, singing, and interactive world modeling tasks.

AI · Multimodal Generation · autoregressive

0 likes · 6 min read

AI Frontier Lectures

Jul 30, 2025 · Artificial Intelligence

How MetaQuery Bridges MLLMs and Diffusion Models for Superior Multimodal Generation

MetaQuery introduces learnable queries that connect a frozen multimodal LLM with diffusion models, enabling knowledge‑enhanced image generation, reconstruction, and editing while preserving state‑of‑the‑art multimodal understanding, and achieves new SOTA results across multiple benchmarks.

AI research · MLLM · MetaQuery

0 likes · 18 min read

AI Frontier Lectures

Jun 9, 2025 · Artificial Intelligence

How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing

The article introduces DiSA, a training‑free diffusion step annealing technique that dramatically speeds up autoregressive image generation by reducing diffusion steps in later generation phases while preserving high visual quality, and validates the method across several state‑of‑the‑art AR‑Diffusion models.

AI research · DiSA · autoregressive

0 likes · 16 min read

AI Algorithm Path

Jun 8, 2025 · Artificial Intelligence

Autoregressive vs Diffusion Language Models: Principles, Trade‑offs, and Future Directions

The article compares autoregressive and diffusion language models, detailing their mathematical foundations, training and inference pipelines, performance trade‑offs such as speed, coherence and diversity, and explores hybrid approaches and emerging research directions for more efficient and controllable text generation.

AI research · Language Models · Text Generation

0 likes · 17 min read

AIWalker

Mar 15, 2025 · Artificial Intelligence

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

SANA 1.5 introduces an efficient model‑growth pipeline, depth‑pruning, and inference‑time scaling that reuse a 1.6 B‑parameter foundation to train a 4.8 B model with 8× lower memory, 60 % less training time, and GenEval scores that rival or surpass much larger diffusion models.

Efficient Training · Inference Scaling · Model Scaling

0 likes · 17 min read

DaTaobao Tech

Mar 12, 2025 · Artificial Intelligence

Multimodal Automatic Layout Generation for E-commerce

The project develops a multimodal automatic layout generation system for e‑commerce by fine‑tuning the qwen‑vl‑7b vision‑language model with LoRA on poster and Taobao image‑layout data, employing diffusion‑based image generation and coordinate‑prediction methods to produce structured layouts that power poster, marketing image, and video‑cover creation with over 90% adoption, while exploring multi‑image, style‑aware, and iterative refinement extensions.

LLM · Multimodal AI · diffusion

0 likes · 12 min read

AIWalker

Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Language Modeling · Multimodal

0 likes · 20 min read

AIWalker

Feb 13, 2025 · Artificial Intelligence

How FlashVideo Turns Low‑Res Clips into 4K Video with Minimal Compute

FlashVideo introduces a two‑stage framework that first generates low‑resolution videos with strong prompt fidelity and then uses flow‑matching ODE trajectories to upscale to 4K quality in just four function evaluations, achieving top VBench‑Long scores while cutting generation time by up to five‑fold.

AI · Efficiency · FlashVideo

0 likes · 26 min read

AIWalker

Feb 6, 2025 · Artificial Intelligence

FluxSR: The First 12B‑Parameter Single‑Step Diffusion Model for Real‑World Super‑Resolution

FluxSR introduces a novel single‑step diffusion approach for real‑world image super‑resolution built on the 12‑billion‑parameter FLUX.1‑dev model, employing Flow‑Trajectory Distillation, TV‑LPIPS and attention‑diversity losses to achieve high fidelity, reduced artifacts, and lower memory and compute costs.

Flow Distillation · diffusion · image restoration

0 likes · 16 min read

AIWalker

Feb 5, 2025 · Artificial Intelligence

How SANA 1.5’s Efficient Linear Diffusion Transformer Sets a New SOTA in Text‑to‑Image Generation

The paper introduces SANA 1.5, an efficient linear diffusion transformer that scales training and inference compute via model growth, depth‑wise pruning, and inference‑time scaling, achieving a GenEval score of 0.80 and matching larger models while using far less resources.

AI · SANA · diffusion

0 likes · 23 min read

Alimama Tech

Dec 4, 2024 · Artificial Intelligence

AIGB: Generative Auto‑Bidding via Diffusion Modeling

AIGB, introduced by Alimama in 2023, reframes large‑scale ad‑auction auto‑bidding as a generative sequence task using diffusion models, achieving up to 5 % GMV gains, improved stability and interpretability, and is now commercialized, open‑sourced, and featured in a NeurIPS‑endorsed competition.

AI · auto-bidding · diffusion

0 likes · 12 min read

NewBeeNLP

Dec 2, 2024 · Artificial Intelligence

What Are Today’s Unified Generation-and-Understanding Multimodal Model Architectures?

This article surveys current unified generation-and-understanding multimodal large-model architectures, compares LLM-centric and LLM-plus-diffusion designs, extracts common insights, details large-scale training tricks from models like Emu3, Chameleon and Janus, and outlines open research directions for visual encoders.

Large Language Models · Multimodal · diffusion

0 likes · 5 min read

DaTaobao Tech

Nov 20, 2024 · Mobile Development

MNN-Transformer: Efficient On‑Device Large Language and Diffusion Model Deployment

MNN‑Transformer provides an end‑to‑end framework that enables large language and diffusion models to run efficiently on modern smartphones by exporting, quantizing (including dynamic int4/int8 and KV cache compression) and executing via a plugin‑engine runtime, achieving up to 35 tokens/s decoding and 2‑3× faster image generation compared with existing on‑device solutions.

LLM · MNN · Quantization

0 likes · 15 min read

DataFunTalk

May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGC · Multimodal · OPPO

0 likes · 18 min read

DataFunSummit

May 6, 2024 · Artificial Intelligence

Advances, Model Types, and Open Challenges of AI‑Generated Content (AIGC) with XiaoBu’s Image Generation Progress

This article reviews the definition, key metrics, and major model families of AI‑generated content, details XiaoBu’s recent breakthroughs in image generation, and discusses open research problems such as evaluation gaps, transformer limitations, and the need for richer multimodal intelligence representations.

AIGC · GAN · Prompt engineering

0 likes · 14 min read

JD Cloud Developers

Apr 25, 2024 · Artificial Intelligence

How AI Diffusion Models Revolutionize E‑commerce Ad Image Creation

This article presents JD Advertising's 2023 innovations that combine relation‑aware diffusion models, category‑aware background generation, and planning‑and‑rendering pipelines to automatically produce high‑quality, scalable, and personalized e‑commerce ad posters, addressing efficiency, cost, and creative limitations of manual design.

AI · Advertising · diffusion

0 likes · 18 min read

Baobao Algorithm Notes

Mar 22, 2024 · Artificial Intelligence

Unveiling Sora: How OpenAI Might Build Its Groundbreaking Text‑to‑Video Model

This article provides a detailed, step‑by‑step technical analysis of OpenAI's Sora text‑to‑video system, exploring its overall architecture, visual encoder‑decoder choices, Spacetime Latent Patch design, transformer‑based diffusion model, training strategies, and long‑term consistency mechanisms while referencing relevant research papers and open‑source techniques.

AI · Sora · diffusion

0 likes · 50 min read

Volcano Engine Developer Services

Mar 7, 2024 · Artificial Intelligence

How SDXL‑Lightning Generates High‑Quality Images in Just 2 Steps

SDXL‑Lightning, a new diffusion‑based text‑to‑image model from ByteDance, uses Progressive Adversarial Distillation to cut inference steps to as few as 2 while maintaining high resolution and fidelity, offering ten‑fold speed gains, open‑source access, and compatibility with SDXL, ControlNet, and ComfyUI.

AI acceleration · diffusion · model distillation

0 likes · 8 min read

Sohu Tech Products

Mar 6, 2024 · Artificial Intelligence

Analysis of OpenAI Sora: Data Engineering, Network Architecture, and World Model Implications

OpenAI’s Sora video model unifies image and video data into latent spacetime patches via a VAE, trains on original resolutions with GPT‑4‑expanded captions, employs a Diffusion Transformer backbone for patch‑wise denoising, and demonstrates 3D‑consistent, long‑term world‑model capabilities that hint at a unified computer‑vision paradigm and steps toward AGI.

AI research · OpenAI Sora · Transformer

0 likes · 9 min read

Architecture and Beyond

Feb 8, 2024 · Artificial Intelligence

Mastering AIGC: 15 Essential AI Terms and Key Technologies Explained

This article provides a comprehensive overview of core AI concepts, from basic definitions of AI, AGI, and AIGC to detailed explanations of GPUs, major generative models, leading AI products, and influential companies, helping readers quickly grasp the landscape of AI-generated content.

AI · AIGC · CLIP

0 likes · 24 min read

Model Perspective

Nov 19, 2023 · Fundamentals

How Diffusion Models Explain Everyday Phenomena and Environmental Risks

This article introduces the fundamental concepts and mathematical description of diffusion, explores its wide-ranging applications from daily life to environmental engineering, and demonstrates its use through a detailed ink‑in‑water example and a lake‑spill case study.

Fick's law · Physics · diffusion

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

May 26, 2023 · Artificial Intelligence

Unlock High‑Quality Chinese Image Generation with PAI‑Diffusion: New Features & Fine‑Tuning Guide

This article introduces the upgraded PAI‑Diffusion Chinese models, highlighting major improvements in image quality and style diversity, detailing lightweight fine‑tuning methods such as LoRA and Textual Inversion, showcasing controllable editing, scenario‑specific customization, and providing step‑by‑step usage instructions on popular platforms.

AI · LoRA · Textual Inversion

0 likes · 14 min read

Alibaba Cloud Developer

Apr 14, 2023 · Artificial Intelligence

Why Large Models Are Revolutionizing AI: From Foundations to AIGC

This article explores the concept and evolution of large foundation models, their transformative impact on AI-generated content, the underlying technologies such as transformers, diffusion, and CLIP, and discusses the challenges, emerging abilities, and future prospects of these models across multiple modalities.

AIGC · Foundation Models · GPT

0 likes · 32 min read
