Tagged articles
158 articles
Page 2 of 2
AIWalker
Jan 13, 2025 · Artificial Intelligence

ArtCrafter: A Controllable, Diverse Style Transfer Framework from Tsinghua

ArtCrafter introduces a novel text‑image style transfer framework that leverages attention‑based style extraction, text‑image alignment enhancement, and explicit modulation to achieve controllable, diverse, and high‑fidelity visual results, outperforming existing methods in both qualitative and quantitative evaluations.

Attention Mechanism · Style Transfer · diffusion models
0 likes · 10 min read
AIWalker
Jan 12, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen introduces a compact 379M‑parameter diffusion model that produces 1024‑pixel text‑to‑image results in about 1.4 seconds on a mobile device, achieving competitive FID scores and outperforming much larger models through a series of architecture refinements, advanced training tricks, and multi‑level knowledge distillation.

Mobile AI · SnapGen · diffusion models
0 likes · 23 min read
DaTaobao Tech
Dec 30, 2024 · Artificial Intelligence

AI Research Highlights: AAAI 2025 & NeurIPS 2024 Breakthroughs in Image Generation

This article compiles recent AI research breakthroughs presented at AAAI 2025 and NeurIPS 2024, summarizing eight papers on multi‑condition image generation, mixed auto‑regressive models, hallucination mitigation in vision‑language models, quantized diffusion denoising, facial part swapping, language‑guided concept vectors, attribution consistency, and video virtual try‑on, with links to each work.

AAAI 2025 · AI research · Generative Models
0 likes · 13 min read
Bilibili Tech
Dec 24, 2024 · Artificial Intelligence

AniSora: An Integrated System for Anime Video Generation with Data Flywheel, Controllable Diffusion Models, and Evaluation Benchmark

AniSora combines a 10‑million‑pair anime text‑video dataset, a controllable diffusion‑transformer with temporal‑mask conditioning for text‑to‑video, interpolation and region‑guided animation, and a 948‑video benchmark, delivering industry‑leading character and motion consistency and already powering low‑cost dynamic‑comic production for multiple IPs.

AI_Animation · Dataset Benchmark · Temporal Masking
0 likes · 21 min read
DaTaobao Tech
Dec 16, 2024 · Artificial Intelligence

Reference Image Generation for Subject‑Driven Diffusion

This work presents a subject‑driven diffusion pipeline that injects multi‑scale reference features (ReferenceNet‑style) into high‑fidelity backbones such as SD‑XL and Flux, enabling zero‑shot, fine‑grained product consistency across diverse scenes and outperforming current fine‑tuned and zero‑shot methods while noting limits in category coverage and human interactions.

AI · DreamBooth · IP-Adapter
0 likes · 9 min read
AntTech
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: An End-to-End Audio‑Driven Semi‑Body Human Animation Framework

EchoMimicV2, an open‑source project from Ant Group's Alipay AI team, introduces an end‑to‑end audio‑driven framework that generates high‑quality semi‑body portrait videos by jointly coordinating audio, pose, and image inputs, while addressing challenges of condition complexity, model stability, and computational cost.

Digital Human · audio-driven animation · diffusion models
0 likes · 16 min read
Alimama Tech
Nov 27, 2024 · Artificial Intelligence

FlowDCN: Efficient Arbitrary-Resolution Image Generation via Groupwise Multi‑Scale Deformable Convolution

FlowDCN introduces Groupwise‑MSDCN, a sparse deformable convolution that replaces attention, enabling efficient arbitrary‑resolution image generation with linear complexity, fewer parameters and FLOPs, and achieving state‑of‑the‑art FID scores on ImageNet while requiring far fewer training steps.

Deformable Convolution · arbitrary resolution · diffusion models
0 likes · 12 min read
Alipay Experience Technology
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: High‑Quality Audio‑Driven Half‑Body Human Animation with Simple Inputs

EchoMimicV2 is an open‑source digital‑human framework that generates high‑quality half‑body animation videos from a single reference image, an audio clip, and a hand‑gesture sequence, addressing challenges of facial portrait limits, complex condition injection, and inference latency in audio‑driven animation.

AI research · Digital Human · Video Generation
0 likes · 18 min read
JD Tech
Nov 15, 2024 · Artificial Intelligence

Reliable Feedback Network (RFNet) for Improving Usable Advertising Image Generation

The paper proposes a multimodal Reliable Feedback Network (RFNet) and a consistency‑regularized fine‑tuning method (RFFT) that dramatically increase the proportion of usable advertising images generated by diffusion models while preserving visual appeal, and introduces the large‑scale RF1M dataset for training and evaluation.

RFNet · advertising images · diffusion models
0 likes · 9 min read
JD Retail Technology
Nov 14, 2024 · Artificial Intelligence

Improving Advertisement Image Generation with a Multimodal Reliable Feedback Network (ECCV 2024)

The paper introduces a Multimodal Reliable Feedback Network (RFNet) and a consistency‑condition regularization technique that together boost the usable rate of automatically generated advertisement images while preserving visual quality, supported by a new million‑image annotated dataset and extensive ECCV‑2024 experiments.

AI · ECCV2024 · advertisement generation
0 likes · 8 min read
JD Tech Talk
Nov 14, 2024 · Artificial Intelligence

Can Human Feedback Make Advertising Image Generation Reliable? Introducing RFNet

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to automatically evaluate and fine‑tune diffusion models, dramatically increasing the usable rate of e‑commerce advertising images while preserving visual quality.

Computer Vision · Human Feedback · RFNet
0 likes · 8 min read
JD Cloud Developers
Nov 14, 2024 · Artificial Intelligence

Boosting Advertising Image Generation Reliability with Human Feedback

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to dramatically improve the usability and visual quality of automatically generated e‑commerce advertising images while reducing manual inspection costs.

AI · Human Feedback · Reliability
0 likes · 9 min read
NewBeeNLP
Nov 11, 2024 · Artificial Intelligence

Inside MIT’s Deep Generative Models Course: Topics, Schedule, and Resources

MIT’s 6.S978 Deep Generative Models seminar, taught by Associate Professor He Kaiming, offers graduate students a 15‑week deep dive into VAEs, autoregressive models, GANs, diffusion techniques, and cross‑disciplinary applications, with detailed weekly topics, required assignments, and publicly available lecture PDFs.

Deep Generative Models · GAN · He Kaiming
0 likes · 5 min read
Tencent Cloud Developer
Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains—text, image, and audio generation, cross‑modal association, text‑guided image and audio synthesis, and supporting resources—detailing seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, MusicLM, and key papers that shaped each field.

AIGC · CLIP · Computer Vision
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Oct 16, 2024 · Artificial Intelligence

How VICTORIA Revolutionizes Multi‑Object Image Editing with Language‑Aware Diffusion

The VICTORIA algorithm, presented by Alibaba Cloud AI Platform PAI and South China University of Technology at ACM MM 2024, leverages linguistic dependency parsing to guide cross‑attention in Stable Diffusion, enabling accurate, training‑free multi‑object image editing while preserving spatial structure and achieving state‑of‑the‑art results on benchmark datasets.

AI research · Stable Diffusion · VICTORIA
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Oct 15, 2024 · Artificial Intelligence

How VICTORIA Boosts Text‑Guided Image Editing with Language‑Aware Diffusion

The VICTORIA algorithm, presented by Alibaba Cloud's PAI team at ACM MM2024, leverages linguistic dependency parsing and cross‑attention control to overcome multi‑object editing challenges in training‑free text‑guided image editing, delivering precise, structure‑preserving results across diverse scenes.

AI research · diffusion models · image manipulation
0 likes · 6 min read
Xiaohongshu Tech REDtech
Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI acceleration · Distillation · diffusion models
0 likes · 9 min read
Alimama Tech
Aug 16, 2024 · Artificial Intelligence

SPLAM: Sub‑Path Linear Approximation for Accelerating Diffusion Model Sampling

SPLAM (Sub‑Path Linear Approximation Model) accelerates diffusion‑model image synthesis by linearly approximating short sub‑paths of the probability‑flow ODE, allowing high‑quality generation in as few as four steps. It outperforms prior fast‑sampling methods on COCO benchmarks and is deployed in Alimama’s recommendation system.

AI image generation · SPLAM · diffusion models
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Aug 11, 2024 · Artificial Intelligence

Alibaba Cloud PAI’s Breakthroughs in Chinese Diffusion, Prompting, and LLM Knowledge Editing

Recent ACL 2024 papers from Alibaba Cloud’s PAI platform showcase open‑source Chinese diffusion models, an interactive multi‑turn prompt generator, a long‑tail knowledge‑aware retrieval‑augmented LLM approach, and a dynamic fusion network for sequential model editing, all integrated into cloud services.

AI research · Retrieval Augmented Generation · diffusion models
0 likes · 11 min read
Kuaishou Tech
Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (formerly 可图) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, derived capabilities such as prompt refinement and interactive generation, and a range of practical applications from AI‑powered live‑stream gifts to virtual try‑on, while also offering strategic advice for the domestic visual‑generation community.

AI applications · Kolors · Model Evaluation
0 likes · 27 min read
AntTech
Jul 24, 2024 · Artificial Intelligence

EchoMimic: An Open‑Source AIGC‑Driven Framework for 2D/3D Digital Human Generation

EchoMimic, an open‑source project from Ant Group, presents a flexible, audio‑ and pose‑driven digital human generation pipeline that combines 2D, 3D and AIGC techniques, reduces production costs, achieves real‑time inference, and includes a detailed architecture, related work analysis, and future research directions.

AIGC · Digital Human · audio-driven animation
0 likes · 18 min read
Kuaishou Large Model
Jun 27, 2024 · Artificial Intelligence

How I2V-Adapter Turns Images into Videos with Minimal Training

The article introduces I2V‑Adapter, a lightweight plug‑in for Stable Diffusion‑based video diffusion models that converts a single static image into a coherent video without altering the original T2V architecture, and details its design, frame‑similarity prior, experimental results, and real‑world applications.

AI · Computer Vision · I2V-Adapter
0 likes · 9 min read
Kuaishou Tech
Jun 26, 2024 · Artificial Intelligence

I2V-Adapter: A Lightweight Image‑to‑Video Adapter for Stable Diffusion Video Diffusion Models

The I2V-Adapter paper introduces a plug‑and‑play lightweight module that enables static images to be converted into dynamic videos using Stable Diffusion‑based text‑to‑video diffusion models without altering the original architecture or pretrained parameters, achieving competitive quality with far less training cost.

AI · Computer Vision · I2V-Adapter
0 likes · 8 min read
Huolala Tech
May 23, 2024 · Artificial Intelligence

How to Detect and Remove Moiré Patterns with AI and Diffusion Models

This article explains the nature of moiré patterns in digital imaging, reviews manual mitigation techniques, introduces direct and indirect AI‑based recognition methods—including traditional feature extraction and deep‑learning models such as CNNs and diffusion frameworks—and details practical applications and evaluation metrics used by Huolala.

AI · Computer Vision · Deep Learning
0 likes · 17 min read
DataFunTalk
May 3, 2024 · Artificial Intelligence

Advances, Challenges, and Industrial Practices in Text‑to‑Video Generation – From Diffusion Models to Sora

This article reviews the rapid progress of text‑to‑video generation, explains diffusion‑based video synthesis, outlines key technical challenges such as motion modeling, semantic alignment and quality, and presents Tencent’s solutions and real‑world applications, while also discussing future directions and the impact of OpenAI’s Sora model.

AI · Sora · Tencent
0 likes · 23 min read
Rare Earth Juejin Tech Community
May 1, 2024 · Artificial Intelligence

Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation

Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.

AI acceleration · RLHF · diffusion models
0 likes · 10 min read
JD Tech
Apr 29, 2024 · Artificial Intelligence

Relation-Aware Diffusion Models for Automated Poster Layout and Product Background Generation

This article presents JD Advertising's 2023 AI-driven framework that uses a relation‑aware diffusion model with visual‑text and geometric modules, combined with category‑common and personalized generators and a planning‑and‑rendering network, to automate high‑quality, scalable e‑commerce poster creation and background synthesis.

Multimodal AI · diffusion models · e-commerce advertising
0 likes · 18 min read
Xiaohe Frontend Team
Apr 21, 2024 · Artificial Intelligence

What’s New in Generative AI? VASA‑1, Llama‑3, Stable Diffusion 3 & More

The article reviews the latest breakthroughs in generative AI, including Microsoft’s VASA‑1 video synthesis model, Meta’s open‑source Llama‑3 large language model, Stability AI’s Stable Diffusion 3 API, Adobe’s integration of third‑party AI video tools into Premiere Pro, and a free image‑style‑recreation platform from Freepik, highlighting their technical details and potential applications.

AI tools · diffusion models · generative AI
0 likes · 13 min read
DaTaobao Tech
Mar 25, 2024 · Artificial Intelligence

Survey of AIGC Video Generation Algorithms

Since 2023, AI‑generated video research has expanded across six algorithmic categories—text‑to‑video, image‑to‑video, editing, style transfer, human motion, and long‑video generation—highlighting works such as CogVideo, Imagen Video, MagicVideo, ControlVideo, DCTNet, NUWA‑XL and OpenAI’s Sora, while analysis shows short‑clip diffusion models excel, editing remains costly, style transfer is efficient, and truly long, temporally consistent videos remain an open challenge.

AI · AIGC · Video Editing
0 likes · 13 min read
Architect
Feb 22, 2024 · Artificial Intelligence

Sora: OpenAI’s Text‑to‑Video Model – Principles, Impact, and Outlook

The article provides a comprehensive technical overview of OpenAI’s Sora text‑to‑video model, explaining its background, underlying diffusion‑Transformer architecture, key breakthroughs, potential industry impacts, success factors, limitations, and future prospects for AI‑generated video content.

AI · OpenAI · Sora
0 likes · 15 min read
Rare Earth Juejin Tech Community
Feb 19, 2024 · Artificial Intelligence

Technical Review of OpenAI's Sora Video Generation Model

This article reviews OpenAI's Sora video generation model, summarizing its technical report, key innovations such as patch-based visual tokens, compression networks, scaling transformers, language understanding, and discussing its capabilities, highlights, and current limitations in AI video synthesis.

AI · OpenAI · Sora
0 likes · 9 min read
21CTO
Feb 17, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Pushing Video Generation to New Frontiers

OpenAI’s Sora model demonstrates large‑scale text‑conditional video generation using a diffusion transformer that operates on spatiotemporal patches, supporting variable durations, resolutions, and aspect ratios while showcasing emergent simulation abilities, flexible sampling, and multimodal editing capabilities, though it still has notable limitations.

AI research · Sora · Transformer
0 likes · 19 min read
Architect
Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into up‑to‑one‑minute high‑definition videos, showcasing advanced diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI Safety · OpenAI · Sora
0 likes · 12 min read
Xiaohongshu Tech REDtech
Feb 9, 2024 · Artificial Intelligence

How InstantID Generates High‑Fidelity Holiday Portraits in 30 Seconds

InstantID is a plug‑in adapter that adds identity‑preserving capabilities to text‑to‑image diffusion models, allowing users to upload a single photo and, within 30 seconds, produce a Spring Festival‑styled portrait with accurate facial features, customizable prompts, and strong text control.

AI image generation · Hugging Face · InstantID
0 likes · 7 min read
Alimama Tech
Jan 24, 2024 · Artificial Intelligence

Hierarchical Masked 3D Diffusion Model for Video Outpainting

The Hierarchical Masked 3D Diffusion Model (M3DDM) introduces a masking‑based training strategy and cross‑attention with global video clips to achieve temporally consistent video outpainting, while a hybrid coarse‑to‑fine inference pipeline mitigates error accumulation, delivering state‑of‑the‑art results and deployment in Alibaba’s creative center.

3D diffusion · ACM MM2023 · AI video processing
0 likes · 12 min read
Baidu Geek Talk
Dec 19, 2023 · Industry Insights

Inside Baidu Search Innovation Contest: Winning AI Solutions Across Five Tracks

The second Baidu Search Innovation Contest attracted over 2,800 participants from 45 regions, featured five AI‑focused tracks, and highlighted champion teams that employed techniques such as LoRA‑fine‑tuned LLMs, vector‑intersection Top‑K search, GPU‑optimized algorithms, and diffusion‑based image generation to push the boundaries of search technology.

AI competition · GPU Optimization · LLM fine-tuning
0 likes · 12 min read
DaTaobao Tech
Dec 4, 2023 · Artificial Intelligence

AIGC Poster Generation Project: Methods and Optimizations

The AIGC Poster Generation Project employs Stable Diffusion enhanced with VAE, ControlNet, LoRA and other extensions to create product posters in four visual styles, exploring outpainting, inpainting, reference‑based diffusion and DreamBooth prototypes, and optimizes detail preservation, super‑resolution text, and masking to achieve over 90% detail fidelity, 95% success rate, and 3–5 second inference per image.

AIGC · ControlNet · Stable Diffusion
0 likes · 7 min read
JD Retail Technology
Nov 23, 2023 · Artificial Intelligence

Recent Advances in Advertising Recommendation Algorithms and Their Applications

This article reviews recent progress in advertising recommendation technologies, covering deep learning‑based ranking, sequence modeling, self‑supervised learning, online and reinforcement learning, multimodal recommendation, and fairness, and details four key breakthroughs—data‑driven incremental learning, dynamic group parameter modeling, bilateral interactive graph convolution, and a relation‑aware diffusion model for poster layout generation, along with experimental results and future challenges.

Deep Learning · Incremental Learning · advertising recommendation
0 likes · 25 min read
DataFunTalk
Nov 15, 2023 · Artificial Intelligence

Contextual Learning for Personalized Text‑to‑Image Generation

This article explains how contextual learning can enhance text‑to‑image models by incorporating example image‑text pairs, redesigning the UNet architecture, building large in‑context training datasets, and training the SuTI model to achieve fast, controllable, and high‑quality personalized image generation.

AI · contextual learning · diffusion models
0 likes · 24 min read
DataFunSummit
Nov 4, 2023 · Artificial Intelligence

AIGC Generation Models and Diffusion‑Based Planning for Embodied AI

This article explores powerful AIGC generation models and large language models like ChatGPT, detailing how diffusion models can be applied to robotic planning, introducing AdaptDiffuser, self‑evolving data generation, and embodied AI challenges, while summarizing recent research and practical implementations.

AIGC · AdaptDiffuser · Embodied AI
0 likes · 20 min read
Bilibili Tech
Nov 1, 2023 · Artificial Intelligence

Neural Radiance Fields and Generative Intelligent Media: Recent Advances and Applications

Professor Hu Qiang presented recent progress in Neural Radiance Fields—covering implicit/explicit representations, hybrid models, and solutions for dynamic scenes, cloud‑based and edge‑cloud rendering—while also reviewing generative AI advances such as diffusion‑based text‑to‑image/video/3D, LoRA fine‑tuning, and large‑scale story‑book datasets, highlighting applications in virtual‑real content, smart‑city modeling, and 6‑DoF e‑commerce displays.

3D reconstruction · NeRF · diffusion models
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Oct 24, 2023 · Artificial Intelligence

Accelerating Diffusion Model Sampling with OLSS: A Linear Subspace Approach

This article presents the OLSS (Optimal Linear Subspace Search) algorithm, a novel diffusion‑model sampling accelerator that models acceleration as a linear subspace expansion, unifies existing methods, introduces trainable scheduler coefficients solved via least‑squares, and demonstrates significant speed and quality gains on Stable Diffusion benchmarks.

AI · diffusion models · image generation
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Oct 23, 2023 · Artificial Intelligence

How the New OLSS Algorithm Supercharges Diffusion Model Sampling

The article announces that a paper on the Optimal Linear Subspace Search (OLSS) algorithm, written by researchers from Alibaba Cloud’s AI platform PAI and ECNU, was selected for CIKM 2023. It explains how OLSS accelerates diffusion‑model sampling by operating in higher‑dimensional linear subspaces and provides details of the paper and its visual results.

OLSS · diffusion models · generative AI
0 likes · 5 min read
AntTech
Sep 27, 2023 · Artificial Intelligence

DiffUTE: A Universal Multilingual Text Editing Diffusion Model for High-Fidelity Image Text Manipulation

The article presents DiffUTE, an end‑to‑end self‑supervised multilingual text‑editing diffusion model that leverages fine‑grained position and glyph guidance together with large language model control to achieve high‑quality, high‑fidelity text modifications in images, and demonstrates its effectiveness through extensive experiments and real‑world deployments at Ant Group.

AI · AIGC · Text Editing
0 likes · 8 min read
58UXD
Sep 21, 2023 · Artificial Intelligence

How AI is Redefining Design: From Midjourney to Stable Diffusion

This article explores the rise of AI‑driven design, examines real‑world projects such as AI‑Soup, virtual jewelry, and e‑commerce checkpoints, explains diffusion model fundamentals, compares major tools, and discusses the opportunities and risks for creators and businesses adopting generative AI.

AI design · creative workflow · diffusion models
0 likes · 13 min read
Alimama Tech
Aug 2, 2023 · Artificial Intelligence

Can AI Fully Automate Advertising Poster Creation and Video Outpainting?

This article reviews four ACM MM 2023 papers that introduce AI‑driven systems for automatic advertising poster generation, multimodal text‑image creation, few‑shot style‑guided visual captioning, and hierarchical 3D diffusion models for video outpainting, detailing their methods, datasets, and experimental results.

AI-generated design · Poster Automation · diffusion models
0 likes · 9 min read
Top Architect
May 8, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Architecture, Training, and Practical Applications

This article provides a comprehensive overview of Stable Diffusion, covering its latent diffusion architecture, training data and procedures, model components such as autoencoder, CLIP text encoder and UNet, as well as practical usage examples including text‑to‑image generation, image‑to‑image, inpainting, and advanced extensions like ControlNet and SD‑2.x.

AI image generation · Stable Diffusion · diffusion models
0 likes · 52 min read
Tencent Cloud Developer
Apr 10, 2023 · Artificial Intelligence

How Computers Generate Realistic Images: An In‑Depth Guide to AI Image Generation, Diffusion Models, ControlNet, LoRA and More

This guide explains how AI creates photorealistic images, tracing the shift from VAEs and GANs to diffusion models, detailing latent diffusion, ControlNet conditioning, CLIP text‑image alignment, and lightweight fine‑tuning methods like DreamBooth and LoRA, plus practical tips for higher‑resolution results.

AI image generation · ControlNet · LoRA
0 likes · 22 min read
Laiye Technology Team
Mar 3, 2023 · Artificial Intelligence

Survey of Text‑Controlled Image Generation Models: DALL·E‑2, Imagen, Stable Diffusion, and ControlNet

This article reviews the key components and design choices of recent text‑controlled image generation systems—including DALL·E‑2, Google Imagen, Stability AI's Stable Diffusion (a latent diffusion model), and the ControlNet extension—highlighting how diffusion models, text encoders, prior modules, super‑resolution, and conditioning mechanisms enable high‑quality, controllable visual synthesis.

AI · ControlNet · DALL-E-2
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Dec 12, 2022 · Artificial Intelligence

Unlocking Chinese Text-to-Image Generation with Alibaba’s PAI‑Diffusion Models

This article introduces Alibaba Cloud’s open‑source PAI‑Diffusion series, detailing its Latent Diffusion Model foundation, Chinese CLIP alignment, super‑resolution components, and showcases diverse artistic and real‑world text‑to‑image generation scenarios, while providing guidance on accessing the models via Alibaba Cloud AI Center, PAI‑DSW, and HuggingFace Space.

Alibaba Cloud · Chinese AI · diffusion models
0 likes · 11 min read
Code DAO
May 26, 2022 · Artificial Intelligence

Understanding Denoising Diffusion Probabilistic Models: Fundamentals and Process

This article explains the fundamentals of denoising diffusion probabilistic models, detailing the forward Gaussian noise injection, the reverse reconstruction via learned conditional densities, model architecture, loss functions, and experimental results on synthetic datasets, all supported by key research citations.

Generative Models · Markov chain · Neural Networks
0 likes · 8 min read