Tagged articles
32 articles
Machine Heart
May 9, 2026 · Artificial Intelligence

BARD-VL Achieves New SOTA for Multimodal Diffusion Models via Autoregressive‑Diffusion Bridge

The BARD-VL framework bridges pretrained autoregressive vision‑language models to diffusion‑based VLMs, preserving or surpassing original performance while boosting decoding throughput up to three times, through progressive block merging, stage‑wise diffusion distillation, and engineering optimizations validated on multiple benchmarks.

BARD-VL · Multimodal · benchmark
9 min read
Machine Learning Algorithms & Natural Language Processing
May 4, 2026 · Artificial Intelligence

How CVPR 2026 Is Redefining Visual Model Defaults in Generative AI

A review of CVPR 2026 papers shows a shift in visual generative AI from incremental performance gains within established frameworks to a systematic rewrite of default modeling assumptions, covering new guidance mechanisms, video generation architectures, direct image prediction, fine‑grained motion control, and dense semantic correspondence.

Video Generation · diffusion · generative AI
13 min read
DaTaobao Tech
Apr 22, 2026 · Artificial Intelligence

How MNN‑Sana‑Edit‑V2 Brings Comic‑Style Image Editing to Your Phone in 15 seconds

MNN‑Sana‑Edit‑V2, a collaborative effort between Taobao’s Meta team and Hangzhou University, combines a frozen Qwen3‑0.6B LLM, Learnable Query, Connector, Linear DiT and Deep Compression Autoencoder with 4/8‑bit quantization to run fully on mobile devices, delivering 512×512 comic‑style conversions in about 15 seconds—2.5× faster than cloud alternatives—while providing open‑source code, detailed training stages, and extensive performance benchmarks.

Image Generation · Mobile AI · Model Quantization
13 min read
vivo Internet Technology
Mar 18, 2026 · Artificial Intelligence

How Ada-RefSR Eliminates Hallucinations in Single‑Step Diffusion Super‑Resolution

This article presents Ada-RefSR, a single‑step diffusion‑based reference super‑resolution framework that introduces a "Trust but Verify" paradigm, adaptive implicit correlation gating, and a lightweight architecture to suppress hallucinations and achieve state‑of‑the‑art performance on multiple benchmarks, while remaining suitable for mobile deployment.

Ada-RefSR · ICLR2026 · Image Restoration
10 min read
AIWalker
Mar 4, 2026 · Artificial Intelligence

Drifting Models Enable One‑Step Generation, Shattering Speed Records

The paper introduces Drifting Models, a new generative paradigm that moves the distribution evolution to the training phase, achieving true one‑step (1‑NFE) generation with state‑of‑the‑art ImageNet FID scores of 1.54 in latent space and 1.61 in pixel space, while eliminating the need for distillation or classifier‑free guidance.

Drifting Models · Generative Modeling · ImageNet
24 min read
Data Party THU
Feb 15, 2026 · Artificial Intelligence

Why FireRed-Image-Edit Is the New Powerhouse in AI Image Editing

FireRed-Image-Edit, the latest open‑source image‑editing model from the Xiaohongshu Super Intelligence team, outperforms existing models in instruction understanding and ID preservation with an efficient architecture, supported by its RedEdit Bench evaluation suite, a three‑stage training pipeline, and a scalable data engine.

AI Image Editing · FireRed-Image-Edit · Model Evaluation
8 min read
Baobao Algorithm Notes
Jan 14, 2026 · Artificial Intelligence

How GLM-Image Generates High‑Quality Text‑to‑Image on Huawei Ascend Chips

GLM-Image, a Chinese text‑to‑image model trained end‑to‑end on Huawei Ascend 800T A2 NPUs, combines an autoregressive decoder with a diffusion encoder, supports resolutions up to 2048×2048, and offers open‑source code, API access, and detailed prompts that demonstrate its strong layout and typography capabilities.

GLM-Image · Huawei Ascend · diffusion
12 min read
Meituan Technology Team
Jan 8, 2026 · Artificial Intelligence

Must‑Read AAAI 2026 Papers: Efficient Reasoning, Annealing, Multimodal Diffusion & More

This article curates eight AAAI 2026 papers authored by the Meituan research team, covering verifiable stepwise rewards for LLM reasoning, annealing strategies in large‑scale training, process reward models, competence‑difficulty sampling, high‑fidelity visual text rendering, counterfactual fusion, compress‑then‑rank reranking, and cross‑modal quantization for generative recommendation, with direct PDF links for each work.

AAAI2026 · Counterfactual · LLM
14 min read
Kuaishou Tech
Jan 8, 2026 · Artificial Intelligence

Top 12 Kuaishou Papers Accepted at AAAI 2026: Breakthroughs in Recommendation, Video Generation, and LLM Research

Kuaishou had 12 papers accepted at AAAI 2026, covering advances in search and recommendation systems, multi‑camera video generation, multimodal understanding, generative model fundamentals, video large language models, experimental design, and LLM latent‑space reasoning, with three papers selected for oral presentation.

LLM · Video Generation · ai
22 min read
Kuaishou Tech
Sep 17, 2025 · Artificial Intelligence

How MIDAS Achieves Real‑Time Multimodal Digital‑Human Video Generation

The MIDAS framework introduced by the Kling Team combines autoregressive video generation with a lightweight diffusion denoising head to deliver real‑time, high‑quality digital‑human synthesis under multimodal control, achieving sub‑500 ms latency, 64× compression, and robust performance across multilingual dialogue, singing, and interactive world modeling tasks.

Digital Human · Real-time Video · ai
6 min read
AI Frontier Lectures
Jun 9, 2025 · Artificial Intelligence

How DiSA Accelerates Autoregressive Image Generation with Diffusion Step Annealing

The article introduces DiSA, a training‑free diffusion step annealing technique that dramatically speeds up autoregressive image generation by reducing diffusion steps in later generation phases while preserving high visual quality, and validates the method across several state‑of‑the‑art AR‑Diffusion models.
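
The idea of spending fewer diffusion steps on later tokens can be sketched as a step schedule. This is a minimal illustration, not the article's method: the function name `annealed_diffusion_steps`, the linear decay, and the default budgets of 50 and 5 steps are all assumptions chosen for the example.

```python
def annealed_diffusion_steps(num_tokens: int,
                             max_steps: int = 50,
                             min_steps: int = 5) -> list[int]:
    """Hypothetical schedule: diffusion steps per autoregressive position.

    Early positions get the full budget (max_steps); the budget then
    decays linearly so the last position uses only min_steps.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if min_steps > max_steps:
        raise ValueError("min_steps must not exceed max_steps")
    if num_tokens == 1:
        return [max_steps]

    span = max_steps - min_steps
    # Integer arithmetic keeps the schedule exact and monotonically decreasing.
    return [max_steps - (i * span) // (num_tokens - 1)
            for i in range(num_tokens)]


if __name__ == "__main__":
    schedule = annealed_diffusion_steps(4)
    total = sum(schedule)
    baseline = 4 * 50  # every position using the full budget
    print(schedule, f"{total} steps vs {baseline} without annealing")
```

Because the schedule only changes how many denoising steps each position receives at sampling time, a scheme of this shape needs no retraining, which is consistent with the "training‑free" claim in the summary.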

AI research · DiSA · Image Generation
16 min read
AI Algorithm Path
Jun 8, 2025 · Artificial Intelligence

Autoregressive vs Diffusion Language Models: Principles, Trade‑offs, and Future Directions

The article compares autoregressive and diffusion language models, detailing their mathematical foundations, training and inference pipelines, performance trade‑offs such as speed, coherence and diversity, and explores hybrid approaches and emerging research directions for more efficient and controllable text generation.

AI research · Text Generation · Transformer
17 min read
AIWalker
Mar 15, 2025 · Artificial Intelligence

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

SANA 1.5 introduces an efficient model‑growth pipeline, depth‑pruning, and inference‑time scaling that reuse a 1.6 B‑parameter foundation to train a 4.8 B model with 8× lower memory, 60 % less training time, and GenEval scores that rival or surpass much larger diffusion models.

Inference Scaling · Model Scaling · diffusion
17 min read
DaTaobao Tech
Mar 12, 2025 · Artificial Intelligence

Multimodal Automatic Layout Generation for E-commerce

The project builds a multimodal automatic layout generation system for e‑commerce by fine‑tuning the qwen‑vl‑7b vision‑language model with LoRA on poster and Taobao image‑layout data. It uses diffusion‑based image generation and coordinate‑prediction methods to produce structured layouts that power poster, marketing image, and video‑cover creation with over 90% adoption, and it explores multi‑image, style‑aware, and iterative refinement extensions.

LLM · Multimodal AI · diffusion
12 min read
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Image Generation · Language Modeling
20 min read
AIWalker
Feb 13, 2025 · Artificial Intelligence

How FlashVideo Turns Low‑Res Clips into 4K Video with Minimal Compute

FlashVideo introduces a two‑stage framework that first generates low‑resolution videos with strong prompt fidelity and then uses flow‑matching ODE trajectories to upscale to 4K quality in just four function evaluations, achieving top VBench‑Long scores while cutting generation time by up to five‑fold.

FlashVideo · Video Generation · ai
26 min read
AIWalker
Feb 6, 2025 · Artificial Intelligence

FluxSR: The First 12B‑Parameter Single‑Step Diffusion Model for Real‑World Super‑Resolution

FluxSR introduces a novel single‑step diffusion approach for real‑world image super‑resolution built on the 12‑billion‑parameter FLUX.1‑dev model, employing Flow‑Trajectory Distillation, TV‑LPIPS and attention‑diversity losses to achieve high fidelity, reduced artifacts, and lower memory and compute costs.

Flow Distillation · Image Restoration · diffusion
16 min read
Alimama Tech
Dec 4, 2024 · Artificial Intelligence

AIGB: Generative Auto‑Bidding via Diffusion Modeling

AIGB, introduced by Alimama in 2023, reframes large‑scale ad‑auction auto‑bidding as a generative sequence task using diffusion models, achieving up to 5 % GMV gains along with improved stability and interpretability. It is now commercialized, open‑sourced, and featured in a NeurIPS‑endorsed competition.

Generative Models · Reinforcement Learning · ai
12 min read
NewBeeNLP
Dec 2, 2024 · Artificial Intelligence

What Are Today’s Unified Generation-and-Understanding Multimodal Model Architectures?

This article surveys current unified generation-and-understanding multimodal large-model architectures, compares LLM-centric and LLM-plus-diffusion designs, extracts common insights, details large-scale training tricks from models like Emu3, Chameleon and Janus, and outlines open research directions for visual encoders.

Large Language Models · Multimodal · diffusion
5 min read
DaTaobao Tech
Nov 20, 2024 · Mobile Development

MNN-Transformer: Efficient On‑Device Large Language and Diffusion Model Deployment

MNN‑Transformer provides an end‑to‑end framework that enables large language and diffusion models to run efficiently on modern smartphones by exporting, quantizing (including dynamic int4/int8 and KV cache compression) and executing via a plugin‑engine runtime, achieving up to 35 tokens/s decoding and 2‑3× faster image generation compared with existing on‑device solutions.

LLM · MNN · Mobile AI
15 min read
DataFunTalk
May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGC · Multimodal · OPPO
18 min read
DataFunSummit
May 6, 2024 · Artificial Intelligence

Advances, Model Types, and Open Challenges of AI‑Generated Content (AIGC) with XiaoBu’s Image Generation Progress

This article reviews the definition, key metrics, and major model families of AI‑generated content, details XiaoBu’s recent breakthroughs in image generation, and discusses open research problems such as evaluation gaps, transformer limitations, and the need for richer multimodal intelligence representations.

AIGC · GAN · Generative Models
14 min read
JD Cloud Developers
Apr 25, 2024 · Artificial Intelligence

How AI Diffusion Models Revolutionize E‑commerce Ad Image Creation

This article presents JD Advertising's 2023 innovations that combine relation‑aware diffusion models, category‑aware background generation, and planning‑and‑rendering pipelines to automatically produce high‑quality, scalable, and personalized e‑commerce ad posters, addressing efficiency, cost, and creative limitations of manual design.

Advertising · ai · diffusion
18 min read
Baobao Algorithm Notes
Mar 22, 2024 · Artificial Intelligence

Unveiling Sora: How OpenAI Might Build Its Groundbreaking Text‑to‑Video Model

This article provides a detailed, step‑by‑step technical analysis of OpenAI's Sora text‑to‑video system, exploring its overall architecture, visual encoder‑decoder choices, Spacetime Latent Patch design, transformer‑based diffusion model, training strategies, and long‑time consistency mechanisms while referencing relevant research papers and open‑source techniques.

Sora · ai · diffusion
50 min read
Volcano Engine Developer Services
Mar 7, 2024 · Artificial Intelligence

How SDXL‑Lightning Generates High‑Quality Images in Just 2 Steps

SDXL‑Lightning, a new diffusion‑based text‑to‑image model from ByteDance, uses Progressive Adversarial Distillation to cut inference steps to as few as 2 while maintaining high resolution and fidelity, offering ten‑fold speed gains, open‑source access, and compatibility with SDXL, ControlNet, and ComfyUI.

AI acceleration · diffusion · model distillation
8 min read
Sohu Tech Products
Mar 6, 2024 · Artificial Intelligence

Analysis of OpenAI Sora: Data Engineering, Network Architecture, and World Model Implications

OpenAI’s Sora video model unifies image and video data into latent spacetime patches via a VAE, trains on original resolutions with GPT‑4‑expanded captions, employs a Diffusion Transformer backbone for patch‑wise denoising, and demonstrates 3D‑consistent, long‑term world‑model capabilities that hint at a unified computer‑vision paradigm and steps toward AGI.

AI research · OpenAI Sora · Transformer
9 min read
Model Perspective
Nov 19, 2023 · Fundamentals

How Diffusion Models Explain Everyday Phenomena and Environmental Risks

This article introduces the fundamental concepts and mathematical description of diffusion, explores its wide-ranging applications from daily life to environmental engineering, and demonstrates its use through a detailed ink‑in‑water example and a lake‑spill case study.
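
For reference, the mathematical description of diffusion is usually stated as Fick's two laws. The block below is the standard textbook form rather than a quotation from the article; the symbols (c for concentration, D for a constant diffusion coefficient, J for flux, M for the amount released) are assumed here for illustration, and the last line is the idealized one‑dimensional point‑release solution that an ink‑in‑water example typically approximates.

```latex
% Fick's first law: flux runs down the concentration gradient
J = -D \,\nabla c

% Fick's second law (constant D): time evolution of concentration
\frac{\partial c}{\partial t} = D \,\nabla^{2} c

% 1D point release of amount M at x = 0, t = 0: a spreading Gaussian
c(x,t) = \frac{M}{\sqrt{4\pi D t}} \exp\!\left(-\frac{x^{2}}{4 D t}\right)
```

The Gaussian has variance 2Dt, so the width of the spreading patch grows with the square root of time.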

Fick's law · Physics · diffusion
10 min read
Alibaba Cloud Big Data AI Platform
May 26, 2023 · Artificial Intelligence

Unlock High‑Quality Chinese Image Generation with PAI‑Diffusion: New Features & Fine‑Tuning Guide

This article introduces the upgraded PAI‑Diffusion Chinese models, highlighting major improvements in image quality and style diversity, detailing lightweight fine‑tuning methods such as LoRA and Textual Inversion, showcasing controllable editing, scenario‑specific customization, and providing step‑by‑step usage instructions on popular platforms.

LoRA · Textual Inversion · ai
14 min read
Alibaba Cloud Developer
Apr 14, 2023 · Artificial Intelligence

Why Large Models Are Revolutionizing AI: From Foundations to AIGC

This article explores the concept and evolution of large foundation models, their transformative impact on AI-generated content, the underlying technologies such as transformers, diffusion, and CLIP, and discusses the challenges, emerging abilities, and future prospects of these models across multiple modalities.

AIGC · GPT · diffusion
32 min read